broadinstitute / pooled-cell-painting-profiling-recipe

:woman_cook: Recipe repository for image-based profiling of Pooled Cell Painting experiments
BSD 3-Clause "New" or "Revised" License
6 stars 4 forks source link

Overwrite existing files True/False #24

Closed ErinWeisbart closed 4 years ago

ErinWeisbart commented 4 years ago

It would be nice (though not necessary) to add in an additional setting that toggles True/False for overwriting existing files. Default behavior in the .yaml should be set to true. But if, for example, you got most of the way through a module before it errored or got shut down, or a single site was missing its input file and so failed to run, it would be nice not to have to re-process everything.

gwaybio commented 4 years ago

these issues have been very helpful as I'm marching towards version 0.1 release. I am working on this issue now.

I have one quick question @ErinWeisbart - what is your rationale for default being force_overwrite: true? My thinking is that the default should be false. My rationale here is as you say:

you got most of the way through a module before it errored or got shut down, or a single site was missing its input file and so failed to run, it would be nice not to have to re-process everything.

but we don't want the next run to overwrite everything already executed, correct? It would be good to confirm this logic with you, but this is not urgent, as something like this can be easily changed later.

ErinWeisbart commented 4 years ago

I was thinking true as it is the least likely to produce errors. If someone changes any config settings in between running parts of the data then if overwrite is true all of the data will be processed similarly. If false and they don't realize they've changed config settings then they'll end up with two halves processed dissimilarly which may lead to a downstream mess. Choosing not to overwrite existing files is removing a safety mechanism so it shouldn't be default.

gwaybio commented 4 years ago

💯 YES - ok, I agree totally. Stability > Efficiency for this codebase and our users. Thanks!

gwaybio commented 4 years ago

I am thinking about how a user might interact with force. Very soon, there are two ways to ways to invoke "force" (i.e. overwriting files):

A couple of decisions that need to be made:

  1. Which one takes priority?
    • Currently, I am thinking that the command line should take priority since it is harder to do unintentionally.
  2. Should we include a flag (or config option) to not overwrite
ErinWeisbart commented 4 years ago

Can you explain the rationale behind using any command line actions? Shouldn't we have everything in the config file? If I'm the target user, I wouldn't even consider that I might need/want to use command line flags as it seems the whole point of having such a thorough config file is that everything is tunable in a clear, easy manner.

If we do include both, I agree that command line should take priority. But my gut says don't use any command line flags.

gwaybio commented 4 years ago

Right now the only thing that is not in the config file is the name of the config file itself. The way that the recipe steps get this information is through a command line argument. For example:

preprocess_config="config/0.preprocess_sites_config.yaml"
python pooled-cell-painting-recipe/0.preprocess-sites/0.prefilter-features.py --config_file $preprocess_config

this is how each of the recipe steps are invoked in the weld. I can imagine a user having a set of different configuration files lying around that they can slot in by editing the weld.sh file. The alternative is to hardcode the config yaml somewhere else, but I prefer to hardcode it directly in sight of the user.

I am also viewing any additional flags as easter egg overrides. Meaning that a user can be completely blind to them, but an advanced user (one who understands the limitations of this action and the danger of loosening the weld) wishing to interact with the code more directly outside of the weld can.

gwaybio commented 4 years ago

addressed in #34