WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License
240 stars 194 forks source link

Prevent Django Admin default queries on primary media tables in production #4344

Open AetherUnbound opened 4 months ago

AetherUnbound commented 4 months ago

Description

On several occasions now, we've run into production resource issues with our database due to Django Admin performing some default query against our (massive) primary media tables (e.g. #970).

Recent changes to the reporting admin (specifically #4254) have added new views that are also selecting the primary media tables in order to generate the UI. These are incredibly useful for local testing, but are actively harmful for production. We should alter the logic for these fields so that the automatic completion only occurs locally and does not occur in production. This will help prevent issues from occurring from simply visiting pages in Django Admin in production.

Going forward (per @sarayourfriend's comment https://github.com/WordPress/openverse/issues/4344#issuecomment-2146265927), we should create a custom object manager for the media models that ensures the following things:

This would prevent us from erroneously adding naive queries that could impact production in the future, even if we add the functionality for the query in a naive way.

sarayourfriend commented 4 months ago

Reopening to signify my request for clarification about the status of this issue and the solution implemented in https://github.com/WordPress/openverse/pull/4349#issuecomment-2121456363

AetherUnbound commented 3 months ago

@sarayourfriend can we close this issue, or did you want to keep it open until we can remove the functions that the associated issue created? I think #4386 maybe have also helped accomplish this.

sarayourfriend commented 3 months ago

I want to prevent select * from <media> without a limit. There is literally no use case for it in the API and it is trivial to cause a problem in production by doing that, even without knowing (e.g., Django admin forms).

I think it could be accomplished by creating a custom object manager for the media models that prevents selecting more than the maximum number of rows we need for search.

To me the problem isn't with the implementation of the Django admin forms we have now. The problem is that it's even possible. I want to make it impossible to cause this problem in production, not just something we try to cautiously avoid.

AetherUnbound commented 3 months ago

FWIW I think what's more destructive, even if a limit is provided, is select * from <media> order by <column>, since the order by clause is what causes very heavy operations on the database. Just noting that for the eventual work on this ticket. Agreed that a system-level block for that would be ideal!

sarayourfriend commented 3 months ago

Sounds good! Whatever the features we shouldn't use on those tables, the object manager can block them. Can you update the issue body with what you think those features that need to be blocked are? Is it just order by?