Yann-CV opened this issue 5 months ago
To fix the checkpoint filename for your VAE always showing accuracy=0.00, make sure that every metric referenced in the `filename` template is actually logged during training. In the provided code, the filename template references "accuracy", but the monitored (and logged) metric is "training_loss", not "accuracy". When a template field has not been logged, Lightning fills it in with 0 instead of raising an error, which is why the filename always reads accuracy=0.00.
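To make the 0.00 easier to reason about, here is a simplified, hypothetical re-implementation of how the filename template gets filled in. `render_checkpoint_name` is an illustrative helper, not Lightning's API or its actual code; it assumes the default `auto_insert_metric_name=True` behaviour (each value is prefixed with its name) and that unlogged fields default to 0.

```python
import re


def render_checkpoint_name(template: str, metrics: dict) -> str:
    """Hypothetical stand-in for ModelCheckpoint's filename formatting."""
    # Collect the field names of "{name}" / "{name:fmt}" placeholders, in order, without duplicates.
    names = dict.fromkeys(re.findall(r"\{([^}:]+)(?::[^}]*)?\}", template))
    values = dict(metrics)
    for name in names:
        # A metric that was never logged is rendered as 0 rather than raising.
        values.setdefault(name, 0)
        # Prefix the value with its name: "{epoch:02d}" -> "epoch={epoch:02d}".
        template = template.replace("{" + name, name + "={" + name, 1)
    return template.format(**values) + ".ckpt"


# "accuracy" is in the template but was never logged, so it shows up as 0.00.
print(render_checkpoint_name("vae-mnist-{epoch:02d}-{accuracy:.2f}", {"epoch": 30}))
```

With a template that only references logged values, e.g. `{"epoch": 30, "training_loss": 123.4}`, the same helper yields `vae-mnist-epoch=30-training_loss=123.40.ckpt`.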
To fix the filename, either change the template to use the monitored metric "training_loss", or make sure "accuracy" is actually logged (and monitored, if it should decide which checkpoint is kept). Here is how to use "training_loss" in the filename:
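```python
from lightning.pytorch.callbacks import ModelCheckpoint  # or pytorch_lightning.callbacks, depending on your install

CHECKPOINT_CALLBACK = ModelCheckpoint(
    save_top_k=1,                # keep only the best checkpoint
    monitor="training_loss",     # metric that decides which checkpoint is "best"
    mode="min",                  # lower loss is better
    dirpath="/storage/ml/models",
    # Reference only logged metrics here, e.g. vae-mnist-epoch=30-training_loss=<value>.ckpt
    filename="vae-mnist-{epoch:02d}-{training_loss:.2f}",
)
```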
With this change, the filename shows the value of "training_loss", the metric that is actually being monitored [1].
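Whichever template you choose, each field in it (other than `epoch` and `step`, which the checkpoint callback supplies itself) has to match a name you log with `self.log(...)` in your LightningModule. A small pre-flight check can catch a mismatch before a long training run; `unlogged_template_fields` is a hypothetical helper written for this answer, not part of Lightning, and you supply the list of logged names yourself.

```python
import re
from typing import Iterable, List


def unlogged_template_fields(template: str, logged: Iterable[str]) -> List[str]:
    """Hypothetical check: template fields that no self.log call provides."""
    # Field names of "{name}" / "{name:fmt}" placeholders, deduplicated, in order.
    fields = dict.fromkeys(re.findall(r"\{([^}:]+)(?::[^}]*)?\}", template))
    # Assumption: "epoch" and "step" are filled in by the callback, not via self.log.
    known = set(logged) | {"epoch", "step"}
    return [name for name in fields if name not in known]


# The original template references "accuracy", which the VAE never logs.
print(unlogged_template_fields("vae-mnist-{epoch:02d}-{accuracy:.2f}", ["training_loss"]))
```

An empty list means every field in the template will be filled with a real logged value rather than 0.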
For the VAE, the model checkpoint filename contains an accuracy value that is always 0.00, e.g. `vae-mnist-epoch=30-accuracy=0.00.ckpt`.