evansd / whitenoise

Radically simplified static file serving for Python web apps
https://whitenoise.readthedocs.io
MIT License
2.51k stars 148 forks source link

whitenoise's scantree slows down test suite performance #558

Open codingjoe opened 7 months ago

codingjoe commented 7 months ago

Python Version

3.12

Django Version

4.2

Package Version

5.3.0

Description

Hi there 👋 and happy 2024,

We are switching to service ESM, rather than bundling our dependencies. Mainly, because updating once dependency will invalidate the cache for a whole bundle.

While doing so, I noticed that our test suite drastically decreased in performance. The reason, whitenoise will scan the entire STATIC_ROOT directory on every single client request causing significant disk IO.

That's bearable on a MacBook Pro with 1 GB/s disk speed and a large disk cache. On an Ubuntu worker on GitHub actions, we saw a 10x performance decrease. Taking our 8 min. test suite to a crisp 80 minutes.

I would propose moving the files dictionary from the middleware instance to the module or the thread.

The current implementation does not affect regular request service, since Middleware instances are not recycled. However, they are when you use a test client.

Until this is addressed, the only way I see to mitigate this behavior is to remove the middleware in your test suite:

# settings.py
TEST = os.getenv('TEST')
if TEST:
    # https://github.com/evansd/whitenoise/issues/558
    MIDDLEWARE.remove("whitenoise.middleware.WhiteNoiseMiddleware")

Love your thoughts!

Cheers, Joe

merwok commented 7 months ago

In my experience, it’s usual to not add whitenoise to middleware for unit test settings.

codingjoe commented 7 months ago

In my experience, it’s usual to not add whitenoise to middleware for unit test settings.

Interesting, in that case, I little documentation would go a long way here.

adamchainz commented 7 months ago

There’s an FAQ already on avoiding the full scan by enabling WHITENOISE_AUTOREFRESH: WhiteNoise makes my tests run slow!.

Any PRs to surface this information better are very welcome. But since this isn’t a new issue, I’m going to close it.

adamchainz commented 7 months ago

In my experience, it’s usual to not add whitenoise to middleware for unit test settings.

I would not recommend this, the more drift you introduce between test and production environments, the greater the risk for uncaught errors in that gap.

merwok commented 7 months ago

That’s not how I read / apply that principle 🙂

Unit tests are for testing my code, as fast as possible.

Test / prod environments are deployed servers. I validate waitress, whitenoise, dns, email sending, etc. thanks to this test environment, which is very close to prod.

codingjoe commented 7 months ago

Hi @adamchainz, thanks for the prompt response. I am a bit puzzled, though. The docs day the feature defaults to the DEBUG setting. Which is False in a Django test suite default and ours too. Soooo, things might not be as straightforward as they seem. As I understand the code, the auto-refresh will only bypass the instance cache. However, since there is a new instance per request, this isn't really making a dent in a test scenario. However, I only spent one day on profiling and debugging, you are the expert on this code base. Cheers! Joe

adamchainz commented 6 months ago

I’m not a huge expert, I just pattern-matched the problem to the one I’ve seen before. New instances per test does sound like it will cause slowness, yes.

Yes, Django forces DEBUG to False during tests. The docs are trying to say that, by default, tests will have WHITENOISE_AUTOREFRESH set to False when it should really be True.

If you have any ideas for improvements, please do open a PR. Perhaps we should reopen this issue too...

codingjoe commented 6 months ago

Hi @adamchainz,

I took a deep dive into the middleware life cycles in Django's test suite and pytest-django. I can confirm, each client will instantiate a new middleware. However, not on each call. See also: https://github.com/django/django/blob/bc8471f0aac8f0c215b9471b594d159783bac19b/django/test/client.py#L169-L173

I opened a pull-request #562 with a test, that demonstrate the issue. The issue is resolved when the test doesn't fail.

So, yes, @adamchainz, I think it's best to reopen the issue. I will see if I can't find the time to cook up a patch.

Cheers! Joe

adamchainz commented 6 months ago

Great, thanks