fix documentation of --output

wsnoble commented 1 year ago

In the command line documentation, the --output option is only for the sequencing phase and doesn't reflect what outputs are produced during training.

│    --output               -o  FILE                    The mzTab file to      │
│                                                       which results will be  │
│                                                       written.               │

wsnoble commented 1 year ago

Actually, this documentation is also inaccurate. If you use -o foo, then the file foo.mztab gets produced. So it's really asking for the root of the output filename.

I think that when we output checkpoint files, they should also include this root (if it's provided) as a prefix.

bittremieux commented 11 months ago

I think that when we output checkpoint files, they should also include this root (if it's provided) as a prefix.

I'm not sure about this. There's a separate argument to specify the checkpoint file name during training, namely --model (as well as to load an existing model file). How would you resolve these two values then?

wsnoble commented 10 months ago

Are you sure that this is the behavior? I think I usually just see output ckpt files with names like epoch=2-step=150000.ckpt.

In general, it seems like a bad idea to use the same option to specify both the input and the output filename.

I suggest that if the user provides --root foo then the above would be foo.epoch=2-step=150000.ckpt'. Currently, if the user specifiesfoo.ckpt`, how do the different output files get named? It seems like you'd have to have some logic to look for and strip off the ".ckpt" and then add in the epoch and step number to the name. Plus error handling if the user doesn't provide ".ckpt" at the end of the model name.

bittremieux commented 10 months ago

Good catch. The checkpoint files are actually still another factor. They're saved to the directory model_save_folder_path in the config, but their file name can't be specified currently, it's just the standard epoch and step counters.

So:

output specifies the root file name for the mzTab (depending on mode) and log (always) files.
model specifies the model used for inference, or, when training, the state from which to continue training.
During training, models are saved as checkpoints (default: 5 best). Only the directory for checkpoints can be specified, not the file name.

So what do we want to happen?

The instructions for output should clarify that this is the root file name for the mzTab and log files. Done in #276.
Current model behavior seems to be correct for inference.
During training, what do we want model to signify? The output name of the newly obtained model weights? What if training resumes? How do we differentiate the initial model from the newly trained model?

Did I get this right?

wsnoble commented 10 months ago

I would say that root should specify the root of the file name for all output files, including checkpoint files.

Good point about model being overloaded for input and output models. If I were doing this from scratch, I would probably use two distinct options, like model-in and model-out. But for backward compatibility maybe we just use model for input and use the root option to specify the output model name.

wsnoble commented 4 months ago

Note that there is relevant discussion here.

Lilferrit commented 4 months ago

I added a --root_ckpt_name option to the train sub-command that will add a root name to the checkpoint files, e.g. if the option is set as --root_ckpt_name foobar than the checkpoint filenames will have the format foobar.epoch=2-step=150000.ckpt. Otherwise the checkpoint files will have the original default format. The checkpoint files will still be saved to the model_save_folder_path set in the config file.

Lilferrit commented 4 months ago

I also just added an --overwrite_output boolean flag that is false by default. If this flag isn't set then the Casanovo CLI will raise an error if one of the output files already exists, otherwise it will overwrite it. All of these changes are on the model-out branch.

Noble-Lab / casanovo

fix documentation of --output #241