MannLabs / SPARCSpy


write function to cleanup "excess" data #11

Closed sophiamaedler closed 2 years ago

sophiamaedler commented 2 years ago

Once processing is finished, it is desirable to keep the final single-cell datasets, but a lot of the data written out along the way is no longer required in permanent storage (if needed, it could always be recalculated), takes up a lot of space, and keeps accumulating. One idea to deal with this would be to write a cleanup function for project directories that removes all intermediate results and results that are no longer relevant (e.g. input images, individual tile segmentations, etc.) to reduce storage space. We should be able to save over 50% of storage space just by removing input_image.h5, which we have saved in another location anyway. Thoughts?
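A minimal sketch of what such a cleanup function could look like. The directory layout and file names below (a `segmentation/input_image.h5` file and a `segmentation/tiles` shard folder inside the project directory) are illustrative assumptions, not the actual SPARCSpy project structure:

```python
import shutil
from pathlib import Path

def cleanup_project(project_dir, remove_input_image=True, remove_shards=True):
    """Delete intermediate results that can be recomputed from the raw inputs.

    NOTE: the paths below are hypothetical placeholders, not the real
    SPARCSpy layout. Returns the number of bytes freed.
    """
    project_dir = Path(project_dir)
    freed = 0

    if remove_input_image:
        # The copied input image is redundant if the original is kept elsewhere.
        input_image = project_dir / "segmentation" / "input_image.h5"
        if input_image.is_file():
            freed += input_image.stat().st_size
            input_image.unlink()

    if remove_shards:
        # Per-tile segmentation shards can be regenerated by re-running segmentation.
        shard_dir = project_dir / "segmentation" / "tiles"
        if shard_dir.is_dir():
            freed += sum(f.stat().st_size for f in shard_dir.rglob("*") if f.is_file())
            shutil.rmtree(shard_dir)

    return freed
```

Keeping the deletions opt-in via keyword flags would make it safe to expose this as a standalone command-line utility rather than tying it to the segmentation run itself.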

GeorgWa commented 2 years ago

Yes, we should definitely adopt procedures like this. If you are interested, I could include an option to delete input_image.h5 and all of the shard folders after segmentation. What do you think?

I think we can also delete a lot of processed data that is neither single-cell HDF5 training files nor microscopy images. Everything in between, especially older files, is no longer needed and can be reproduced if necessary.

sophiamaedler commented 2 years ago

I was thinking of adding a function like your viper-stats that can easily be run from the command-line terminal. I would personally keep it separate from the actual segmentation run, so that we can fall back on it specifically when needed.

I was working on a quick little script but still have a small issue to sort out. I should hopefully be able to push it soon.