cositools / cosipy

The COSI high-level data analysis tools
Apache License 2.0

Files are being saved to disc without the user knowing it #63

Closed israelmcmc closed 8 months ago

israelmcmc commented 1 year ago

Files should only be saved to disc when either:

  1. The user explicitly saves them with .write()
  2. The module asks for an explicit working path. In this case, preferably use a temporary folder by default.

israelmcmc commented 1 year ago

e.g. Data IO and imaging
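
A minimal sketch of the two rules above, to make the proposal concrete. The names `BinnedData` and `make_work_dir` are hypothetical and are not part of cosipy; the JSON format is only a stand-in for whatever format a real class would use.

```python
import json
import tempfile
from pathlib import Path


class BinnedData:
    """Hypothetical in-memory data product; not a real cosipy class."""

    def __init__(self, counts):
        # Constructing or processing the object never touches the disc.
        self.counts = list(counts)

    def write(self, path):
        """Save to disc only when the caller asks for it explicitly."""
        path = Path(path)
        path.write_text(json.dumps({"counts": self.counts}))
        return path


def make_work_dir(work_dir=None):
    """Return the caller's working path, or a fresh temporary folder by default."""
    if work_dir is None:
        return Path(tempfile.mkdtemp(prefix="cosipy_sketch_"))
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    return work_dir
```

With this layout, nothing is written unless the user calls `write()`, and a module that does need scratch space gets a temporary folder unless the user passes a path.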

ckarwin commented 9 months ago

It sometimes makes sense for the default behavior to write the output to file, e.g. every step of a standard Fermi analysis is saved to file. Would you prefer that the default behavior be not to write?

israelmcmc commented 9 months ago

I agree it makes sense for an analysis pipeline to save intermediate steps to disc. By pipeline I mean a sequence of steps that takes you from the data to the analysis results: reading the data, setting up the model, fitting, plots, checks, etc. Such a pipeline typically takes as input a path where intermediate or ancillary files are placed, which the user can change either through a command-line input or in a config file (as in the case of Fermi).

What I meant in this issue is something different. Some of the basic classes in the cosipy library (data IO and imaging, I think), which are going to be the building blocks of what will become the pipeline, are storing files to disc without the user requesting it, using hardcoded names and paths. This takes flexibility away from those building the pipeline and analysis scripts: they can no longer organize the files in the way that makes the most sense as a whole, or decide which files are worth storing and which are not, depending on the goals of the script and the available computing resources.

I opened this issue based on a couple of things I saw as problematic when I ran the mini DC:

So to summarize: yes, I would prefer that the basic classes have well-defined I/O methods and only create files if the user specifies it. But also yes, I agree that a pipeline (or something equivalent to GTAnalysis) can store output to files using the I/O methods from the basic classes.
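
As a rough illustration of that split, here is a sketch in which the building blocks only return objects and the pipeline layer alone decides what is stored and where. `FitResult`, `fit`, and `run_pipeline` are hypothetical names, not cosipy or GTAnalysis APIs, and the "fit" is a trivial placeholder.

```python
import json
from pathlib import Path


class FitResult:
    """Hypothetical building block: holds results in memory, writes on request."""

    def __init__(self, params):
        self.params = dict(params)

    def write(self, path):
        # Explicit I/O method; the only way this class creates a file.
        Path(path).write_text(json.dumps(self.params))


def fit(data):
    """Stand-in for a fitting step; it returns an object and creates no files."""
    return FitResult({"mean": sum(data) / len(data)})


def run_pipeline(data, output_dir):
    """Pipeline layer: takes the output path as input and chooses what to store."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    result = fit(data)
    result.write(output_dir / "fit_result.json")
    return result
```

A script that only needs the numbers can call `fit()` directly and leave the disc untouched, while a full pipeline run keeps its intermediate products under a path the user controls.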

hiyoneda commented 9 months ago

In the current image deconvolution, there are two cases in which many files are saved. One is the intermediate files of the reconstructed images (image, delta image, likelihood, etc.). I will soon add a parameter to determine whether they are saved. The second is when the coordinate conversion matrix is calculated. In this step, the dwell time map is calculated for each sky pixel. In the current implementation, the function for the dwell time map calculation also saves intermediate files automatically, so as many files as there are sky pixels are saved automatically. @saurabhmittal23 @Yong2Sheng can you also add a parameter to select whether intermediate files are saved in "get_dwell_map"?
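
One possible shape for such a parameter, as a sketch only: `compute_dwell_maps`, `save_intermediate`, and `output_dir` are hypothetical names, the per-pixel calculation is a placeholder, and the actual signature of "get_dwell_map" may differ.

```python
from pathlib import Path


def compute_dwell_maps(n_pixels, save_intermediate=False, output_dir=None):
    """Hypothetical per-pixel loop; the real get_dwell_map signature may differ."""
    if save_intermediate and output_dir is None:
        raise ValueError("output_dir is required when save_intermediate=True")

    maps = []
    for pixel in range(n_pixels):
        dwell_map = [float(pixel)]  # placeholder for the real calculation
        maps.append(dwell_map)
        if save_intermediate:
            # Files are written only on request, under a caller-chosen path.
            out = Path(output_dir)
            out.mkdir(parents=True, exist_ok=True)
            (out / f"dwell_map_{pixel}.txt").write_text(str(dwell_map[0]))
    return maps
```

Defaulting `save_intermediate` to `False` keeps the one-file-per-sky-pixel behavior available without making it the silent default.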