c3-time-domain / SeeChange

A time-domain data reduction pipeline (e.g., for handling images->lightcurves) for surveys like DECam and LS4
BSD 3-Clause "New" or "Revised" License

Handling file for un-committed objects #232

Closed by guynir42 1 month ago

guynir42 commented 3 months ago

Sometimes we save files to local storage but never commit the corresponding objects to the DB.

An example would be the aligned images. They may be saved to disk, but we don't need to keep them around after the pipeline or tests are done.

I suggest adding a __del__ method that would delete the files of any FileOnDiskMixin object that doesn't have a primary key ID (i.e., one that was never committed to the DB).

This doesn't cover all bases but at least would take out most of the orphan files, with minimal risk of accidentally deleting a file that should be saved.

The cleanup should go after the files in both local storage and the archive.
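To make the proposal concrete, here is a minimal sketch of what such a __del__ cleanup could look like. It is not the SeeChange implementation: `FileOnDiskSketch`, its `local_path` and `archive_path` attributes, and the `make_file` helper are all hypothetical, and the archive is modeled as a plain file path purely for illustration. It also relies on CPython running __del__ as soon as the last reference goes away.

```python
import os
import tempfile


class FileOnDiskSketch:
    """Hypothetical stand-in for FileOnDiskMixin; not the SeeChange class."""

    def __init__(self, local_path, archive_path=None):
        self.id = None  # assumption: set only once the object is committed
        self.local_path = local_path
        self.archive_path = archive_path  # assumption: archive modeled as a path

    def __del__(self):
        # Objects with a primary key are considered saved and are left alone.
        if self.id is not None:
            return
        for path in (self.local_path, self.archive_path):
            if path is None:
                continue
            try:
                os.remove(path)
            except OSError:
                pass  # already gone or not removable; never raise from __del__


def make_file(directory, name):
    """Hypothetical helper: write a small file and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("data")
    return path


workdir = tempfile.mkdtemp()
orphan_path = make_file(workdir, "aligned.fits")
saved_path = make_file(workdir, "saved.fits")

orphan = FileOnDiskSketch(orphan_path)
saved = FileOnDiskSketch(saved_path)
saved.id = 1  # pretend this object was committed to the DB

del orphan  # no primary key: its file is removed
del saved   # has a primary key: its file is kept

orphan_removed = not os.path.exists(orphan_path)
saved_kept = os.path.exists(saved_path)
```

The only guard against deleting a wanted file here is the `id is None` check, which is why the approach depends on every object that refers to a file carrying the same committed state.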

guynir42 commented 3 months ago

addressed by #233

guynir42 commented 3 months ago

So using __del__ doesn't work, because we have lots of cases where multiple objects refer to the same file. For instance, when we merge an Image into the session, the old object gets deleted and a new one is made in the session, so the old object's __del__ would remove a file that the new object still points to.
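The failure mode can be reproduced without a database. The sketch below is an illustration only: `FileHandle` and `fake_merge` are hypothetical names, and `fake_merge` merely mimics the one relevant effect of a merge, namely that a different Python object ends up referring to the same file.

```python
import os
import tempfile


class FileHandle:
    """Hypothetical object that deletes its file on __del__ if it has no ID."""

    def __init__(self, path, id=None):
        self.path = path
        self.id = id

    def __del__(self):
        if self.id is None:
            try:
                os.remove(self.path)
            except OSError:
                pass  # never raise from __del__


def fake_merge(obj, new_id):
    """Hypothetical stand-in for a merge: returns a NEW object for the same file."""
    return FileHandle(obj.path, id=new_id)


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "image.fits")
with open(path, "w") as f:
    f.write("data")

original = FileHandle(path)               # not committed yet, so id is None
merged = fake_merge(original, new_id=42)  # new object, same file

del original  # the old object still has id=None, so it deletes the shared file

file_survived = os.path.exists(path)  # False: merged.path now points at nothing
```

Avoiding this would need some form of per-file reference tracking shared across objects, which is more machinery than a simple __del__ hook.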

guynir42 commented 2 months ago

I'm leaning towards: not cleaning up any local files. We'd just have to decide that local storage is temporary storage by design and let the maintainers of each SeeChange instance periodically wipe that storage (e.g., when a pod dies).

If something is saved to the database and to the archive it should be safe from deletion.

This could leave somewhat bloated disk usage for people running the archive and local storage on the same machine, but that's not a big issue, I think.
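For instances where local storage outlives a pod, the periodic wipe could be as simple as the sketch below. This is an assumption about how a maintainer might do it, not something SeeChange provides: `wipe_old_files`, the one-week cutoff, and the directory layout are all hypothetical. It should only ever be pointed at the local storage root, never at the archive.

```python
import os
import tempfile
import time


def wipe_old_files(root, max_age_seconds, now=None):
    """Remove files under `root` last modified more than `max_age_seconds` ago."""
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if now - os.path.getmtime(path) > max_age_seconds:
                os.remove(path)
                removed.append(path)
    return removed


# Demonstration on a throwaway directory standing in for local storage.
local_root = tempfile.mkdtemp()
old_path = os.path.join(local_root, "old.fits")
new_path = os.path.join(local_root, "new.fits")
for p in (old_path, new_path):
    with open(p, "w") as f:
        f.write("data")

week = 7 * 24 * 3600
old_time = time.time() - 2 * week
os.utime(old_path, (old_time, old_time))  # backdate one file by two weeks

removed = wipe_old_files(local_root, max_age_seconds=week)
```

Anything that also exists in the database and the archive can be fetched again after such a wipe, which is what makes treating local storage as disposable acceptable.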