Open jbiggar opened 4 years ago
Thanks, this is an interesting question. The short answer is: no, you're not using the tools incorrectly and you're not missing any config options; this is just a situation I didn't have in mind when I originally designed Whitenoise. In most deployments there's a relatively limited number of static files and they live alongside the rest of the application code which is usually fast to access.
A lazy initialisation option wouldn't actually solve the fundamental problem with autorefresh mode which isn't the overhead it adds to serving static files (an extra stat isn't the end of the world here) but the overhead it adds to every other request to the application where it has to hit the disk to confirm that there isn't a corresponding static file. Because Whitenoise needs to be able to serve files from arbitrary URLs (e.g. /robots.txt
, favicon.ico
) it operates on the entire URL space of the application, not just a /static
prefix.
Probably the best solution here would be to pre-generate and cache the self.files
dict which Whitenoise builds on initialisation. This contains some custom types so you'd need to use pickle rather than JSON, but I'm pretty sure everything in here would serialize and deserialize without issue. Obviously this adds a bit of complexity and an extra step to your build process, but I think it would solve the problem.
Thanks for the thoughtful response @evansd.
Do you think a build/deployment-time-regenerated and initialisation-time-loaded pickle file cache is something you'd want to support in the main project? We'll probably head down the path you suggest, but wouldn't bother with a full PR to the upstream project if that's not a direction you'd want to head.
I wouldn't be totally opposed to adding support for it, but it would depend on how much complexity it adds and, specifically, how well-isolated that complexity is from everything else. This feels like one of those situations where adding basic support which works but with a few rough edges would be relatively easy, whereas getting something really solid which handles all the edge cases would be a lot more work.
I'd certainly be happy to reference your use case in the documentation though and link to any code you have. I'd say, get something that works for you by subclassing the middleware and then we can decide from there whether it looks worth trying to include in the main project.
I would love to piggyback on a build/deployment-time-regenerated and initialisation-time-loaded pickle file cache as a mode that WhiteNoise supports 👀 .
I've put together an approach that seems to work in a Gist.
I put the pickle generation logic in a new Storage class to piggy-back off collectstatic
. The pickle file with Whitenoise metadata didn't feel that different conceptually from the manifest file created by the stock Django ManifestStaticFilesStorage
.
The inheritance chain wasn't totally clean:
WhiteNoiseMiddlewareFileLoader
subclass in a separate module (base.py
) that just implements that methodCompressedManifestStaticFilesStorage
meant a slightly odd parent/mixin class arrangement. I would guess that moving much of the CompressedManifestStaticFilesStorage
logic to a mixin class would take care of that.Hope you find it useful. Feel free to take what I wrote as a starting point. Feedback welcome, and let me know if there are things I can do to make it easier to incorporate into or at least work with the main project.
Just wanted to mention that I am interested in finding a solution to this issue. I am the maintainer of Tutor, a wrapper distribution around Open edX; Open edX is a very complex Django project which relies on many static files. With Tutor we made the decision a couple releases ago to use Whitenoise for serving static assets, and we've been happy with it. The only issue is that it takes a few seconds before the gunicorn server is actually able to serve requests because of the file caching in Whitenoise -- not a deal-breaker, but we could definitely use a build-time cache generation step when building our Docker images.
I am also hitting a similar issue as described by the OP using EFS with lots of static files
@jbiggar Did you use this solution in production? Are you happy with it?
@evansd Is the proposed implementation acceptable for inclusion with whitenoise?
@thenewguy We aren't (yet) using my suggested code in production, just test environments, though it is still our intention to use it in production.
Our experience has been that the resource overhead of setting WHITENOISE_AUTOREFRESH = True
in production (to avoid the startup overhead at the expense of per-request overhead) isn't all that bad, at least behind a reasonably stable caching CDN. YMMV, of course, and I'd definitely recommend enabling that with caution in any production environment.
I looked at this again and had an idea that I think solves all the points with very little code written against the Django middleware:
class WhiteNoiseMiddleware(WhiteNoise):
add_static_root_files = True
def __init__(self, get_response=None, settings=settings):
...
if self.add_static_root_files and self.static_root:
self.add_files(self.static_root, prefix=self.static_prefix)
...
class LazyWhiteNoiseMiddleware(WhiteNoiseMiddleware):
add_static_root_files = False
def can_lazy_load_url(self, url):
return url.startswith(self.static_prefix)
def process_request(self, request):
response = super().process_request(request)
if not response and not self.autorefresh and self.can_lazy_load_url(request.path_info):
static_file = self.find_file(request.path_info)
if static_file:
self.files[request.path_info] = static_file
response = self.serve(static_file, request)
return response
-- edit --
I've started an implementation here: https://github.com/evansd/whitenoise/compare/master...thenewguy:lazy
Some simple static serving tests in my local project work. Will continue to tinker. I intend to use the storage manifest to determining the return value from can_lazy_load_url()
so we do not attempt to load arbitrary paths
@evansd Any thoughts on this approach?
-- edit 2 --
I've started PR #275. Awaiting feedback to write tests and docs
Just wanted to leave an update that I've been running with the working concept in PR https://github.com/evansd/whitenoise/pull/275 for a little bit now and it seems to work well
If anyone wants to PR ServeStatic, I would support merging any PR that adds a management command to pre-populate the known files via the Django manifest.
EDIT: Ended up PRing this myself https://github.com/Archmonger/ServeStatic/pull/22
When multiple Django WSGI processes start in parallel accessing thousands of static files via network file system, it can take quite a while to seed the
WhiteNoise.files
static files cache, during which time the Django app can't yet service inbound requests. (For example, we have server instances with 6 Django processes running Whitenoise 4.1.3 accessing ~4,000 static files via EFS (AWS-flavored NFS) which takes roughly 45 seconds to finish scanning the staticfiles dirs and begin accepting requests.)I see the
WHITENOISE_AUTOREFRESH
setting to skip the cache initialization, but that's labeled not for production use, and clearly adds ongoing overhead, as it stats every file at request time.I wonder if there's another configuration option we're missing, if we're using the tools incorrectly, or if you've considered adding another mode, where the
WhiteNoise.files
cache is lazy loaded incrementally as each file is requested?