Beginner ML question here: on my training run, the metrics/validation phase seems to fail at the end of the epoch. I've reduced the training sources to "vocals" and "other".
File "/home/arijr/miniconda3/envs/scnet/lib/python3.10/site-packages/lightning/pytorch/callbacks/model_checkpoint.py", line 382, in _save_topk_checkpoint
raise MisconfigurationException(m)
lightning.fabric.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='val/usdr') could not find the monitored key in the returned metrics: ['lr-Adam', 'train/loss', 'grad_2.0_norm_total', 'val/loss', 'usdr_other', 'usdr_vocals', 'usdr', 'epoch', 'step']. HINT: Did you call log('val/usdr', value) in the LightningModule?
It was actually a bug: the 'val/' prefix was not being added to the logged metric names (they were logged as 'usdr', 'usdr_vocals', 'usdr_other'), so at the end of the epoch ModelCheckpoint could not find 'val/usdr'. I fixed that, and now it should work!
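For anyone hitting the same error, here is a minimal sketch of the idea, not the actual change made in the repository. The helper name `add_prefix` and the metric values are hypothetical; the point is only that the key passed to `ModelCheckpoint(monitor=...)` has to match a logged key exactly.

```python
def add_prefix(metrics: dict[str, float], prefix: str = "val") -> dict[str, float]:
    """Return a copy of `metrics` with '<prefix>/' prepended to keys that lack it."""
    return {
        key if key.startswith(f"{prefix}/") else f"{prefix}/{key}": value
        for key, value in metrics.items()
    }


# Hypothetical metric values, using the key names from the error message above.
raw = {"usdr_other": 5.1, "usdr_vocals": 7.3, "usdr": 6.2}
logged = add_prefix(raw)

# The monitored key must be present among the logged keys.
monitor = "val/usdr"
assert monitor in logged

# Inside a LightningModule's validation hook, the prefixed dict could then be
# logged with something like:
#     self.log_dict(add_prefix(raw))
```

If you log metrics yourself, comparing the list in the error message against your `monitor` string is usually the quickest way to spot this kind of mismatch.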