We need the slices from `hparams.yaml` to initialize the model, but we are actually relying on PyTorch Lightning's `save_hyperparameters` to save them to the directory above the checkpoint. This works when the checkpoint is saved in the logger directory. However, if there is no logger, or the checkpoint is saved elsewhere, the slices become separated from the checkpoint.

https://github.com/AI4OPT/ML4OPF/blob/0dd79a5b7ba7e1826bd70d110969c0a986455b93/ml4opf/models/basic_nn/basic_nn.py#L164-L171
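For context, a hypothetical sketch of the loading pattern described above (not the actual code at the permalink, and assuming the default `<log_dir>/checkpoints/<name>.ckpt` layout): the slices are read from an `hparams.yaml` resolved relative to the checkpoint path, so the file is missing as soon as the checkpoint is written anywhere else.

```python
from pathlib import Path

import yaml


def load_slices_from_hparams(checkpoint_path: str) -> dict:
    """Hypothetical helper: locate hparams.yaml relative to the checkpoint."""
    ckpt = Path(checkpoint_path)
    # Default Lightning layout: <log_dir>/hparams.yaml and <log_dir>/checkpoints/<name>.ckpt,
    # so the yaml sits one directory above the checkpoint's folder.
    hparams_file = ckpt.parent.parent / "hparams.yaml"
    if not hparams_file.exists():
        raise FileNotFoundError(
            f"Expected {hparams_file}; it is only written there when the "
            "checkpoint is saved under the logger's log_dir."
        )
    with open(hparams_file) as f:
        return yaml.safe_load(f)
```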
The workaround is to log and checkpoint in the same directory, as done in `tests/test_models.py`:

https://github.com/AI4OPT/ML4OPF/blob/0dd79a5b7ba7e1826bd70d110969c0a986455b93/tests/test_models.py#L92-L98

That is, save checkpoints to `model.trainer.logger.log_dir / <directory>`.
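A plain-Lightning sketch of that workaround (the linked test code is the authoritative version; the `pytorch_lightning` namespace, the `checkpoints` subdirectory name, and `model.ckpt` are illustrative assumptions here):

```python
from pathlib import Path

from pytorch_lightning import Trainer


def save_checkpoint_under_logdir(trainer: Trainer, subdir: str = "checkpoints") -> Path:
    """Save a checkpoint below the logger's log_dir so it stays under hparams.yaml.

    The subdirectory name is illustrative; the point is that the .ckpt lands
    under trainer.logger.log_dir, where save_hyperparameters already wrote
    hparams.yaml.
    """
    ckpt_dir = Path(trainer.logger.log_dir) / subdir
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    ckpt_path = ckpt_dir / "model.ckpt"
    trainer.save_checkpoint(ckpt_path)
    return ckpt_path


# Usage after training, e.g. save_checkpoint_under_logdir(model.trainer),
# gives a layout like:
#   <log_dir>/hparams.yaml
#   <log_dir>/checkpoints/model.ckpt
```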
A temporary fix could save the path to the log dir in `config.json`. A better fix would be to (re-)save the yaml file in the checkpoint directory.
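A sketch of that better fix, assuming a helper along these lines wraps checkpoint saving; `save_checkpoint_with_hparams` is a hypothetical name, and Lightning's `save_hparams_to_yaml` could be swapped for a plain `yaml.safe_dump` of `dict(model.hparams)`:

```python
from pathlib import Path

from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.core.saving import save_hparams_to_yaml


def save_checkpoint_with_hparams(trainer: Trainer, model: LightningModule, ckpt_path: str) -> None:
    """Hypothetical helper: write the checkpoint and an hparams.yaml side by side."""
    ckpt_path = Path(ckpt_path)
    ckpt_path.parent.mkdir(parents=True, exist_ok=True)
    trainer.save_checkpoint(ckpt_path)
    # (Re-)save the hyperparameters (including the slices) in the checkpoint
    # directory, independent of where (or whether) a logger wrote its own copy.
    save_hparams_to_yaml(ckpt_path.parent / "hparams.yaml", model.hparams)
```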