fdemmer / django-weasyprint

A Django class-based view generating PDF resposes using WeasyPrint
Other
352 stars 53 forks source link

Support for CDN statics #47

Closed n1ngu closed 2 years ago

n1ngu commented 3 years ago
  1. Thanks for this awesome software
  2. I am only using the custom URL fetcher outside any django views to imperatively render a pdf in a background process, that will later be emailed etc. So maybe I am missing something, please tell me so.

As explained in https://docs.djangoproject.com/en/3.0/howto/static-files/deployment/#serving-static-files-from-a-cloud-service-or-cdn statics might be served from an external domain. In this situation, URLs produced in templates due to blocks like

<img src="{% static 'logo.png' %}" />

will already start with a schema and will not be prefixed by file:// even if that is the explicit base_url, thus skipping the whole advantatge of your URL fetcher. URLs can neither be hardcoded like file:///static/example.png because their path won't fulfill the condition of matching settings.STATIC_URL.

Right now I am working around this with a custom URL fetcher like

def custom_url_fetcher(url, *args, **kwargs):
    if url.startswith(settings.STATIC_URL):
        file_path = url.replace(settings.STATIC_URL, '', 1)
        file_obj = open(find(file_path), 'rb')
        data = {
            'file_obj': file_obj
        }
        return data

    return django_weasyprint.utils.django_url_fetcher(
        url, *args, **kwargs)

but was wondering if you feel like this could belong to this package. I think it is safe to assume that any reference to a file starting with the full STATIC_URL is something that has actually been collected by django and thus can still be found by the configured STATICFILES_FINDERS with no network calls.

This would also help avoiding some issues with static file versioning (static files bundled with the running django installation not yet collected and published to an accessible CDN) and picky network configurations (e.g. statics being collected in a CI pipeline but webservers being constrained to whitelisted services that exclude the very CDN :woman_shrugging: ).

matthijskooijman commented 2 years ago

Looking at the default url fetcher, it actually looks like the code is already intended to resolve STATIC_URL in the way you suggest, but it is written in a way that only acts on file:// urls, not anything else. I suspect there's a bug there, it looks a bit like not all of the code should be inside the outer if.

In particular, the code looks roughtly like this:

if url.startswith('file:'):
        mime_type, encoding = mimetypes.guess_type(url)
        url_path = urlparse(url).path
        ... do stuff ...

        default_media_url = settings.MEDIA_URL in ('', get_script_prefix())
        if not default_media_url and url_path.startswith(settings.MEDIA_URL):
            ... do stuff ...

        elif settings.STATIC_URL and url_path.startswith(settings.STATIC_URL):
            path = url_path.replace(settings.STATIC_URL, '', 1)    
            ... do more stuff ...

This seems to only match file: urls, then replace the url_path variable with the actual path inside the url, and then match that path against MEDIA_URL and STATIC_URL. That seems weird, I would expect the code to match the original url passed against MEDIA_URL and STATIC_URL (i.e. match urls against urls, not a path against urls).

I'm also a bit surprised about how STATIC_URL is handled, replacing it with '' rather than something like STATIC_ROOT (or maybe something more complex, since in DEBUG runserver, you usually do not run collectstatic to collect files in STATIC_ROOT, so this should probably ask django to find the file (using django.contrib.staticfiles.finders.find(filename)) when in DEBUG mode? edit: I did not look well enough, this is already implemented.

Also, I suspect that it would be good to maybe give pdf_stylesheets a similar treatment (in DEBUG remove STATIC_ROOT and pass through django.contrib.staticfiles.finders.find()), so the default example of just using `settings.STATIC_ROOT + '/css/foo.css' works in runserver)?

n1ngu commented 2 years ago

[...] in DEBUG runserver, you usually do not run collectstatic to collect files in STATIC_ROOT, so this should probably ask django to find the file (using django.contrib.staticfiles.finders.find(filename)) when in DEBUG mode?

As long as static files are served from the same webserver as the django site, which is very common when in debug mode (although not mandatory), precisely the block code you pointed would take care of this with the data['file_obj'] = open(find(path), 'rb') instruction?

Also, I suspect that it would be good to maybe give pdf_stylesheets a similar treatment (in DEBUG remove STATIC_ROOT and pass through django.contrib.staticfiles.finders.find()), so the default example of just using `settings.STATIC_ROOT + '/css/foo.css' works in runserver)?

AFAIU css urls are resolved with the same url_fetcher by weasyprint so there is no need to make a distinction for css nor give a special treatment to the debug mode?

Aside from that, it's 100% what you said: urls should be matched against urls, not paths.

matthijskooijman commented 2 years ago

As long as static files are served from the same webserver as the django site, which is very common when in debug mode (although not mandatory), precisely the block code you pointed would take care of this with the data['file_obj'] = open(find(path), 'rb') instruction?

Oh, I totally missed that the path was passed to find() there. So you are right, ignore that part of my comment, it is indeed already implemented.

AFAIU css urls are resolved with the same url_fetcher by weasyprint so there is no need to make a distinction for css nor give a special treatment to the debug mode?

Ah, that would make sense, I had not tried that.

fdemmer commented 2 years ago

Hi @n1ngu, thank you for your kind words and patience :)

After lots of experimenting I came to the conclusion, that this should not be part of the code this package provides. As you wrote yourself, your usecase is to use the URL fetcher outside a Django view context, which is not the primary problem this package tries to solve. However it is a usecase you seem to have, so at least it should not stand in the way...

A few thoughts and suggestions: