We'd like to save storage space and extend data retention time by intelligently retaining old GetterRun datasets if they are still in use.
This PR:
Adds a GetterRun.is_in_use() shortcut method to check if a GetterRun instance is in use by any Latest instance.
By default, prevents deleting a DataGetter instance and its data if it is still in use by a Latest instance.
Adds a --force-delete-in-use-data flag to manage.py delete_datagetter_data to allow deleting data that's still in use anyway (as per current behaviour).
Adds a --older-than-days flag to manage.py delete_datagetter_data, to allow us to delete all not-in-use GetterRuns older than N days.
I've added the --older-than-days flag because deleting only the single oldest GetterRun could leave us storing a lot of unnecessary data: if the oldest run is in use, it will not be deleted. If that oldest run is, say, a year old, the existing behaviour with --oldest will never delete anything newer than it either, even if e.g. 99% of runs are unused.
So if we want to retain the last 7 days in full, plus anything older that is still in use, we could replace the existing --oldest with --older-than-days 7.
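To illustrate the difference between the two selection modes, here is a minimal standalone sketch. It is not the code in this PR: the Run dataclass and both select_* functions are hypothetical stand-ins for GetterRun and the management command's queries, and the in_use field stands for whatever GetterRun.is_in_use() would return.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Run:
    """Hypothetical stand-in for a GetterRun row (not the real Django model)."""
    id: int
    created: datetime
    in_use: bool  # what GetterRun.is_in_use() would report


def select_oldest(runs, force=False):
    """Mimics --oldest: only the single oldest run is ever considered."""
    if not runs:
        return []
    oldest = min(runs, key=lambda r: r.created)
    # An in-use oldest run blocks deletion entirely unless forced.
    return [oldest] if force or not oldest.in_use else []


def select_older_than(runs, days, now, force=False):
    """Mimics --older-than-days N: every run past the cutoff is considered."""
    cutoff = now - timedelta(days=days)
    # In-use runs are skipped individually unless forced.
    return [r for r in runs if r.created < cutoff and (force or not r.in_use)]


now = datetime(2024, 1, 31)
runs = [
    Run(1, now - timedelta(days=365), in_use=True),   # old but still in use
    Run(2, now - timedelta(days=30), in_use=False),
    Run(3, now - timedelta(days=10), in_use=False),
    Run(4, now - timedelta(days=2), in_use=False),    # inside the 7-day window
]

print([r.id for r in select_oldest(runs)])
print([r.id for r in select_older_than(runs, 7, now)])
print([r.id for r in select_older_than(runs, 7, now, force=True)])
```

In this sketch the year-old in-use run stops the --oldest style selection from picking anything, while the cutoff-based selection still picks the unused runs 2 and 3 and leaves run 1 alone unless forced.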
Questions:
How long should we retain all data for? Keep it at the existing 31-ish days, or go down to say a week?
Do we want to add another command to data_run.sh to force-delete really old data, e.g. manage.py delete_datagetter_data --no-prompt --older-than-days 366 --force-delete-in-use-data, to fully get rid of anything older than a year?
Related to ODSC support ticket 44487.
Further note:
After discussion we've decided not to retain any not-in-use data, and to retain in-use data for 90 days. This PR also adds an --all-not-in-use flag and updates the command as run to implement this policy.
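The agreed policy can be read as a single per-run decision. The sketch below is an assumption about how the rule composes, not the command's actual implementation: should_delete and IN_USE_RETENTION_DAYS are hypothetical names, and how the real command combines --all-not-in-use with the other flags is not shown here.

```python
from datetime import datetime, timedelta

IN_USE_RETENTION_DAYS = 90  # retention period for in-use data, per the policy above


def should_delete(created, in_use, now, retention_days=IN_USE_RETENTION_DAYS):
    """Hypothetical helper: decide whether one run falls outside the policy."""
    if not in_use:
        # Not-in-use data is never retained, whatever its age.
        return True
    # In-use data is kept until it is older than the retention period.
    return created < now - timedelta(days=retention_days)


now = datetime(2024, 6, 1)
print(should_delete(now - timedelta(days=1), in_use=False, now=now))
print(should_delete(now - timedelta(days=89), in_use=True, now=now))
print(should_delete(now - timedelta(days=91), in_use=True, now=now))
```

Under these assumptions a run created exactly 90 days ago is still kept, because the comparison is strict.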