Open · Foxigod opened this issue 3 months ago
Hi @Foxigod, thanks for pointing this out! Indeed, the `ModelCheckpoint` callback that is created is not the default one: it has different values for `filename` and `monitor`.
In my view, this is actually the most commonly desired behaviour, so I would argue we should keep the functionality as is, but update the help message for this command if possible.
As for the possibility of other similar issues: all modifications to the core of the default `LightningCLI` are concentrated in `cli_tools.py`. After checking that module, there should be no other differences apart from the two new arguments, `predict_output_dir` and `out_dtype`, which are useful for inference.
Having said that, it is a useful reminder that modifications we make to the `LightningCLI` class (in this case a subclass), in particular when calling methods such as `set_defaults`, may require corresponding updates to the documentation and help messages.
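The mismatch described above can be illustrated with a small stdlib analogue (this is not terratorch code, just a sketch of the mechanism): a parser's help string is written once, but a later `set_defaults` call changes the effective default, so the help text and the actual behaviour drift apart.

```python
import argparse

# Not terratorch code: a minimal argparse sketch of the mechanism discussed
# above. The help string claims one default, but set_defaults() later
# changes the effective default without updating the help text.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument(
    "--monitor",
    default=None,
    help="quantity to monitor (default: None)",  # help text says None
)
parser.set_defaults(monitor="val/loss")  # effective default is now val/loss

args = parser.parse_args([])
print(args.monitor)                             # prints: val/loss
print("default: None" in parser.format_help())  # prints: True
```

The same pattern applies to `LightningCLI` subclasses: changing defaults after the fact is easy, but the inherited help text keeps advertising the old behaviour unless it is updated explicitly.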
**Describe the issue**

The help printout from the terratorch CLI appears to be inherited from PyTorch Lightning, but the functionality has been modified in terratorch to be inconsistent with that help text. Running

```
terratorch fit -h
```

produces a help message (screenshot omitted) which specifically states that a default `ModelCheckpoint` callback will be created if a `ModelCheckpoint` callback doesn't already exist in the trainer callbacks. In my experience with terratorch, this is not actually a "default" `ModelCheckpoint` callback: it monitors a specific quantity and therefore doesn't save a new checkpoint when that monitored quantity doesn't improve. Looking at the PyTorch Lightning docs, they do seem to instantiate a default instance of `ModelCheckpoint`. Since this does not match my experience with terratorch, I looked into the terratorch code and found the relevant configuration (screenshot omitted). Although I don't fully know how all these pieces interact, it is consistent with my experience: the `filename` is `{epoch}` and the `monitor` quantity is set to `val/loss`. The `filename` and `monitor` parameters are also consistent with the state-dict checkpoint that gets saved. So from my point of view, terratorch "blindly" inherits the documentation from PyTorch Lightning but then modifies the functionality to be inconsistent with said documentation.
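For reference, the help text's promise (the CLI-created callback is only added when the user has not supplied one) suggests a workaround: explicitly supplying a `ModelCheckpoint` in the trainer callbacks. The `class_path`/`init_args` form below is standard LightningCLI YAML syntax; the particular values are illustrative, not taken from terratorch.

```yaml
# Illustrative only: supplying a ModelCheckpoint explicitly in the trainer
# callbacks, so the CLI should not inject its own (non-default) one.
trainer:
  callbacks:
    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
      init_args:
        filename: "{epoch}"
        monitor: val/loss
        save_last: true
```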
**To Reproduce (optional, but appreciated)**

Run a fit with the `trainer.enable_checkpointing` option set to `True` and without any other `ModelCheckpoint` callback in the trainer callbacks. (This even seems to be the case for examples like sen1floods11.)

**Deployment information (optional)**

I installed terratorch by cloning the repo and running `pip3 install -e <cloned_directory>`. If I run `git rev-parse HEAD` I get the following output (screenshot omitted), which is this commit.
**Notes**

The issue I am raising here is not just about this specific part of the help printout from terratorch, but about the integrity of said printout as a whole: there might be similar issues with other parts that I haven't encountered.