FYI: I worked around it for now by adding the following to my LightningModule. It is a bit brittle, but seems to work.
from pathlib import Path

import yaml

def setup(self, stage: str):
    self._save_config()

def _save_config(self):
    # P: Currently, we do not log all hyperparameters:
    #    lightning.CLI saves the config.yaml only locally, not to W&B.
    #    self.save_hyperparameters() does not work well with YAML, see
    # S: Read saved local config and save it as hparams.
    #    You should disable any other `self.save_hyperparameters()`.
    # NOTE: This logs the RESOLVED config using YAML and CLI arguments.
    import torch.distributed as dist
    if dist.is_initialized() and dist.get_rank() != 0:
        # only save config with rank0
        return
    if self.trainer.fast_dev_run:
        # in fast_dev_run mode, loggers are replaced by DummyLogger
        return
    config_yaml_path = Path(self.logger.save_dir) / "config.yaml"
    assert config_yaml_path.exists()
    with open(config_yaml_path) as f:
        dct = yaml.safe_load(f)
    self.save_hyperparameters(dct)
    print("=== Saved config.yaml as hyperparameter")
cc @mauvilsa @awaelchli
I was not aware of this merging of parameters. I guess this always happens when both the model and the data module call save_hyperparameters. This should be easy to fix by excluding the special keys _class_path and _instantiator. Though it may be better to exclude all keys starting with _, since parameter names should not start with an underscore, and thus "Force User Decisions To Best Practices".
Ran into the same issue.
Simple temporary workaround:
In either LightningModule or LightningDataModule
similar issues: #9492
I created pull request #20221 to fix this.
Bug description
The minimal example below throws the error:
RuntimeError: Error while merging hparams: the keys ['_class_path'] are present in both the LightningModule's and LightningDataModule's hparams but have different values.
I thought this was supposed to work. I would really appreciate workaround tips (that also work with checkpointing) or a fix.
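To make the failure mode concrete, here is a simplified model of such a conflict check. It is a sketch and not Lightning's actual implementation: merge_hparams is a hypothetical function, and the class paths are made up. It assumes only what the error message states, namely that a key present in both dictionaries with different values aborts the merge.

```python
from typing import Any, Dict


def merge_hparams(model_hparams: Dict[str, Any], data_hparams: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two hparams dicts, refusing keys that are shared but differ in value."""
    shared = model_hparams.keys() & data_hparams.keys()
    conflicts = sorted(key for key in shared if model_hparams[key] != data_hparams[key])
    if conflicts:
        raise RuntimeError(
            f"Error while merging hparams: the keys {conflicts} are present in both "
            "dicts but have different values."
        )
    return {**model_hparams, **data_hparams}


# Both objects carry a `_class_path` entry, but naturally with different values.
model_hparams = {"_class_path": "my_pkg.MyModel", "lr": 1e-3}        # made-up values
data_hparams = {"_class_path": "my_pkg.MyData", "batch_size": 32}    # made-up values

try:
    merge_hparams(model_hparams, data_hparams)
    message = ""
except RuntimeError as err:
    message = str(err)

# Dicts without a conflicting key merge without error.
merged = merge_hparams({"lr": 1e-3}, {"batch_size": 32})
```

Because the model and the data module are different classes, a key such as `_class_path` can never agree between them, which is why the clash appears as soon as both save their hyperparameters.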
What version are you seeing the problem on?
How to reproduce the bug
Run the code below using e.g.
python --config config.yaml
Error messages and logs
Current environment
* CUDA:
    - GPU:
        - NVIDIA A10G
    - available: True
    - version: 12.1
* Lightning:
    - lightning: 2.4.0
    - lightning-utilities: 0.11.4
    - pytorch-lightning: 2.3.3
    - torch: 2.3.1
    - torchdata: 0.7.1
    - torchmetrics: 1.0.3
    - torchsummary: 1.5.1
    - torchvision: 0.18.1
* System:
    - OS: Linux
    - architecture:
        - 64bit
        - ELF
    - processor: x86_64
    - python: 3.11.9
    - release: 5.15.0-1066-aws
    - version: #72~20.04.1-Ubuntu SMP Thu Jul 18 10:41:27 UTC 2024
Relevant existing issues