LSSTDESC / rail

Top level "umbrella" package for RAIL
MIT License

Standardize clean-up process for files generated when running a pipeline #23

Open OliviaLynn opened 1 year ago

OliviaLynn commented 1 year ago

Had a talk with Alex about the clean-up process for files generated when running a pipeline.

Storing some personal notes here while I work on this issue:

Existing clean-up scripts

Mid-run clean-up

sschmidt23 commented 1 year ago

For mid-run clean-up, isn't deleting files during a run somewhat in conflict with the philosophy of ceci? I think one of the main ideas of ceci is that each stage declares its inputs and outputs, and downstream stages expect the outputs of upstream stages to exist. If a run is interrupted (e.g. Perlmutter crashes on day 3 of a 10-day run of a pipeline with many stages), ceci can work out which stages have already run and pick up mid-stream to finish. If we delete intermediate files before everything finishes, that may no longer work.

I could be misremembering, maybe @joezuntz can say if my interpretation is correct?
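To make the concern concrete, here is a minimal sketch of the resume rule as it is described in this thread: a stage is re-run if any of its declared outputs is missing. It does not use ceci itself; `Stage`, `stages_to_rerun`, and the stage and file names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical pipeline stage: a name plus the files it reads and writes."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def stages_to_rerun(stages: list[Stage], existing: set[str]) -> list[str]:
    """Simple resume rule: re-run a stage if any declared output is missing."""
    return [s.name for s in stages if not all(out in existing for out in s.outputs)]


# Toy three-stage pipeline (stage and file names are illustrative only).
PIPELINE = [
    Stage("inform", ("train.hdf5",), ("model.pkl",)),
    Stage("estimate", ("test.hdf5", "model.pkl"), ("pz.hdf5",)),
    Stage("evaluate", ("pz.hdf5",), ("metrics.txt",)),
]

# Crash after "estimate" finished: only "evaluate" needs to run again.
after_crash = {"train.hdf5", "test.hdf5", "model.pkl", "pz.hdf5"}
print(stages_to_rerun(PIPELINE, after_crash))  # ['evaluate']

# Same crash, but model.pkl was cleaned up mid-run: "inform" is re-run too.
cleaned = after_crash - {"model.pkl"}
print(stages_to_rerun(PIPELINE, cleaned))  # ['inform', 'evaluate']
```

The second call shows the cost: deleting `model.pkl` after `estimate` has consumed it makes a resumed run redo `inform`, even though no pending stage needs that file.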

joezuntz commented 1 year ago

@sschmidt23 yes, that's right. When you launch ceci with the resume flag, it looks for missing files and uses that to decide what needs re-running. Having said that, if that's not the behaviour you want, we could add options to customize it. I was thinking about this anyway, to deal with the case where you don't want to overwrite existing files. We could add an option to avoid re-generating intermediate files if their descendants all exist.
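One possible reading of the proposed option, sketched under toy assumptions (hypothetical `Stage` and `stages_to_rerun`, illustrative file names; this is not ceci code): a missing file is regenerated only if it is a final product or if some stage that still has to run reads it. Walking the stages in reverse dependency order lets each decision rely on the decisions downstream of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical pipeline stage: a name plus the files it reads and writes."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def stages_to_rerun(stages: list[Stage], existing: set[str]) -> list[str]:
    """Re-run a stage only if one of its missing outputs is still needed.

    A missing output is "needed" if it is a final product (no stage reads it)
    or if at least one stage that reads it must itself run. `stages` must be
    listed in dependency (topological) order.
    """
    # Map each file to the stages that read it.
    consumers: dict[str, list[Stage]] = {}
    for s in stages:
        for f in s.inputs:
            consumers.setdefault(f, []).append(s)

    must_run: set[str] = set()
    # Walk backwards so every consumer is decided before its producer.
    for s in reversed(stages):
        for out in s.outputs:
            if out in existing:
                continue
            users = consumers.get(out, [])
            if not users or any(u.name in must_run for u in users):
                must_run.add(s.name)
                break
    return [s.name for s in stages if s.name in must_run]


# Toy three-stage pipeline (stage and file names are illustrative only).
PIPELINE = [
    Stage("inform", ("train.hdf5",), ("model.pkl",)),
    Stage("estimate", ("test.hdf5", "model.pkl"), ("pz.hdf5",)),
    Stage("evaluate", ("pz.hdf5",), ("metrics.txt",)),
]

# Pipeline finished, then intermediates were cleaned up: nothing to redo.
print(stages_to_rerun(PIPELINE, {"train.hdf5", "test.hdf5", "metrics.txt"}))  # []

# Crash before "evaluate", with model.pkl already cleaned up: only "evaluate" runs.
print(stages_to_rerun(PIPELINE, {"train.hdf5", "test.hdf5", "pz.hdf5"}))  # ['evaluate']
```

With this rule, a finished pipeline whose intermediates were cleaned up resolves to no work, while a missing intermediate that a pending stage still reads forces its producer to re-run.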

A few options within the existing framework, though, in case they're useful:

drewoldag commented 1 year ago

This probably makes more sense as a commissioning priority, not necessarily as a release v1.0 priority.

OliviaLynn commented 1 year ago

Moving this out of 1.0, per the comment above and the lack of objections.