Open OliviaLynn opened 1 year ago
For mid-run clean up, is deleting files during a run somewhat in conflict with the philosophy of ceci? I think one of the main ideas of ceci is to define inputs and outputs for each stage. Other stages expect those outputs to exist, and if things are interrupted (e.g. Perlmutter crashes on day 3 of a big 10-day run of a pipeline with many stages), ceci can figure out which stages have already run and pick up mid-stream to complete. If we delete intermediate files before everything finishes, that may no longer work.
I could be misremembering, maybe @joezuntz can say if my interpretation is correct?
@sschmidt23 yes, that's right. When you launch ceci with the resume flag, it looks for missing files and uses that to decide what needs re-running. Having said that, if that's not the behaviour that is useful to you, we could add options to customize it. I was thinking about this anyway, to deal with the case where you don't want to overwrite existing files. We could add an option to avoid re-generating intermediate files if their descendants all exist.
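To make the difference concrete, here is a rough toy model of the two behaviours. It is not ceci's actual implementation: the stage names, file names, the `stages_to_run` function, and the `skip_if_descendants_exist` option are all hypothetical, and it reads "descendants all exist" as "nothing downstream still needs the missing file".

```python
# Toy model only: stage names, file names, and this function are hypothetical
# and do not reflect ceci's real API or internals.
PIPELINE = {
    # stage name: the files it reads and the files it writes, in run order
    "stage_a": {"inputs": ["raw.hdf5"], "outputs": ["a.hdf5"]},
    "stage_b": {"inputs": ["a.hdf5"], "outputs": ["b.hdf5"]},
    "stage_c": {"inputs": ["b.hdf5"], "outputs": ["c.hdf5"]},
}


def stages_to_run(stages, existing, skip_if_descendants_exist=False):
    """Return the stages to re-run, given the set of files that exist."""

    def missing_outputs(name):
        return any(f not in existing for f in stages[name]["outputs"])

    def consumers(name):
        # Stages that read at least one output of `name`.
        outs = set(stages[name]["outputs"])
        return [n for n, s in stages.items() if outs & set(s["inputs"])]

    def needed(name):
        if not missing_outputs(name):
            return False  # outputs already on disk
        if not skip_if_descendants_exist:
            return True  # resume-style: any missing output means re-run
        kids = consumers(name)
        # Final stages have no consumers, so their outputs are always wanted.
        # Intermediate stages are re-run only if a consumer must itself run.
        return not kids or any(needed(k) for k in kids)

    return [name for name in stages if needed(name)]


# Intermediate files were cleaned up mid-run; only the final output remains.
on_disk = {"raw.hdf5", "c.hdf5"}
print(stages_to_run(PIPELINE, on_disk))
print(stages_to_run(PIPELINE, on_disk, skip_if_descendants_exist=True))
```

Under this model, plain resume would regenerate the deleted intermediates, while the proposed option would leave them alone as long as everything downstream is already complete.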
A few options within the existing framework, though, in case they're useful:
This probably makes more sense as a commissioning priority, not necessarily as a release v1.0 priority.
Moving this out of 1.0, per the previous comment and the lack of objections.
Had a talk with Alex about the clean-up process for files generated when running a pipeline.
Storing some personal notes here while I work on this issue:
Existing clean up scripts
Mid-run clean up