akolpakov / django-unused-media

Remove unused media files from Django project
MIT License

Support ignoring recent files #36

Closed: tari closed this 4 years ago

tari commented 4 years ago

There's a potential race between an application writing a new file and updating the database, such that a media file that has not yet been recorded in the database is treated as unused:

  1. Application writes file to media storage
  2. Scan finds file in storage
  3. Scan doesn't find file in database
  4. Application writes model referring to file to the database

The window between 1 and 4 will usually be short, but a very fast scan for unused files, or transaction-related delays in the database (for example, a transaction that has not yet committed), could make this more likely.


Supporting an option to ignore files created within some recent time period can make this problem arbitrarily unlikely. For instance, if any file less than one day old is ignored, it becomes virtually impossible, because it's hard to imagine a reasonable application taking a day to complete a database write. Making the interval a user-accessible option would also let users choose it for themselves, trading prompt pruning of unused files against the risk of data loss.

I'm unsure how feasible this is to do in a portable way, though; I haven't looked into it in any detail.
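One possible approach to the portability question, assuming the scan goes through Django's storage API: `Storage` defines `get_created_time(name)` and `get_modified_time(name)`, and backends that cannot supply a timestamp raise `NotImplementedError`. The sketch below tries the creation time, falls back to the modification time, and treats a file of unknown age as too new to delete. The helper names and the `InMemoryStorage` stand-in are hypothetical, not part of django-unused-media. Note that Django's `FileSystemStorage` derives the creation time from `os.path.getctime`, which on POSIX is the last metadata-change time; that is never earlier than creation, so it only makes a file look younger.

```python
from datetime import datetime, timedelta, timezone


def file_timestamp(storage, name):
    """Return a creation (or, failing that, modification) time, or None.

    `storage` is anything exposing Django-style get_created_time(name) /
    get_modified_time(name) methods.
    """
    for method_name in ("get_created_time", "get_modified_time"):
        method = getattr(storage, method_name, None)
        if method is None:
            continue
        try:
            return method(name)
        except (NotImplementedError, OSError):
            continue  # backend can't answer; try the next method
    return None


def is_old_enough(storage, name, min_age, now=None):
    """True only if `name` is known to be at least `min_age` old."""
    stamp = file_timestamp(storage, name)
    if stamp is None:
        return False  # unknown age: err on the side of keeping the file
    if now is None:
        # Django returns aware datetimes when USE_TZ=True, naive local ones otherwise.
        now = datetime.now(timezone.utc) if stamp.tzinfo else datetime.now()
    return now - stamp >= min_age


def deletable(storage, unused_names, min_age=timedelta(minutes=1), now=None):
    """Filter the scan's unused files down to those old enough to delete."""
    return [n for n in unused_names if is_old_enough(storage, n, min_age, now)]


class InMemoryStorage:
    """Hypothetical stand-in for a storage backend, for demonstration only."""

    def __init__(self, created_times):
        self._created = created_times

    def get_created_time(self, name):
        if name not in self._created:
            raise NotImplementedError("creation time unavailable")
        return self._created[name]


if __name__ == "__main__":
    now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
    storage = InMemoryStorage({
        "old.jpg": now - timedelta(days=2),
        "fresh.jpg": now - timedelta(seconds=10),
    })
    print(deletable(storage, ["old.jpg", "fresh.jpg", "unknown.jpg"], now=now))
    # prints ['old.jpg']
```

The one-day threshold from the example above would be `min_age=timedelta(days=1)`. If a caller passes `now`, it must match the awareness of the backend's timestamps, or the subtraction raises `TypeError`.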

tari commented 4 years ago

This seems similar to #8, but that issue covers an ordering like (2, 1, 3, 4), which is mitigated by taking care to scan the filesystem first. That mitigation does not handle this equally plausible (1, 2, 3, 4) ordering.
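A toy model, not the project's actual code, that replays the four numbered steps from the first comment in a chosen order and reports which files the scan would flag as unused. Storage and the database are plain sets, the scan compares a snapshot of each, and the file name is hypothetical.

```python
NEW_FILE = "uploads/new.jpg"  # hypothetical file name


def flagged_unused(*order):
    """Replay the numbered steps in `order`; return the files the scan would delete."""
    storage, database = set(), set()
    storage_snapshot, db_snapshot = set(), set()
    for step in order:
        if step == 1:    # application writes file to media storage
            storage.add(NEW_FILE)
        elif step == 2:  # scan lists files in storage
            storage_snapshot = set(storage)
        elif step == 3:  # scan reads file references from the database
            db_snapshot = set(database)
        elif step == 4:  # application writes model referring to the file
            database.add(NEW_FILE)
        else:
            raise ValueError(f"unknown step {step}")
    # Anything seen in storage but not referenced in the database is "unused".
    return storage_snapshot - db_snapshot


for order in [(1, 2, 3, 4), (2, 1, 3, 4), (3, 1, 4, 2), (1, 4, 2, 3)]:
    print(order, flagged_unused(*order))
```

Scanning storage before the database keeps a file written after the storage listing out of the candidate set, as (2, 1, 3, 4) shows, whereas a database-first scan such as (3, 1, 4, 2) flags it. For (1, 2, 3, 4) the file is already in storage but not yet in the database, so the order of the scan's own two steps cannot help; only the normal (1, 4, 2, 3) case is safe without an age check.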

akolpakov commented 4 years ago

That's true. The (1, 2, 3, 4) ordering is possible.

Actually, we could even add a default delay, for example 1 minute. That would dramatically reduce the chance of this case without affecting the purpose of the utility.

tari commented 4 years ago

Yeah, a 1-minute default delay seems reasonable. I still think an option (a command-line flag) to specify a custom minimum age for deletion would be good to have, though, since some applications may be designed such that long delays between writing a file and committing the objects that refer to it are likely.
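A sketch of how such a flag might be declared, assuming a hypothetical `--minimum-file-age` option measured in seconds with the 1-minute default suggested above. The flag name and units are illustrative, not necessarily what #38 adopted. It uses plain `argparse`; Django management commands receive an `argparse`-based parser in `add_arguments(self, parser)`, so the same `add_argument` call could go there.

```python
import argparse
from datetime import timedelta


def seconds_to_timedelta(value):
    """argparse type: parse a non-negative whole number of seconds into a timedelta."""
    try:
        seconds = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not an integer: {value!r}") from None
    if seconds < 0:
        raise argparse.ArgumentTypeError("must be zero or greater")
    return timedelta(seconds=seconds)


def build_parser():
    parser = argparse.ArgumentParser(description="Delete unused media files.")
    # Hypothetical flag: files younger than this are never deleted.
    parser.add_argument(
        "--minimum-file-age",
        type=seconds_to_timedelta,
        default=timedelta(minutes=1),
        metavar="SECONDS",
        help="skip files younger than SECONDS (default: 60)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--minimum-file-age", "86400"])
    print(args.minimum_file_age)  # 1 day, 0:00:00
```

Keeping the default as a `timedelta` means the value can be passed straight to an age check without further conversion; a negative or non-integer value makes `argparse` print an error and exit.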

akolpakov commented 4 years ago

@tari, #38: what do you think?