Ching-Yee-Chan opened 1 month ago
Hi! Sorry for the delay in answering. Have you found anything?
The training was quite unstable even in our case, so the best hyperparameters may not be the ones we shipped, depending on your GPU configuration and batch size. I would suggest trying different learning rates and optimizers, and even different seeds, to see whether the problem has a deeper root than just bad luck.
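A minimal sketch of the kind of local LR/seed sweep suggested above, expressed as dora/Hydra command-line overrides. The override keys `optim.lr` and `seed`, and the values, are assumptions for illustration; check the solver config for the exact names before running.

```python
# Hypothetical sketch: enumerate dora run commands for a small
# LR x seed sweep on a single machine (no Slurm grid needed).
# Swap print() for subprocess.run(cmd) to actually launch them.
commands = []
for seed in (0, 1, 2):               # placeholder seeds
    for lr in (1e-4, 3e-4, 1e-3):    # placeholder learning rates
        commands.append(
            ["dora", "run",
             "solver=watermark/robustness",
             "dset=audio/voxpopuli",
             f"optim.lr={lr}",        # assumed override key
             f"seed={seed}"]          # assumed override key
        )

for cmd in commands:
    print(" ".join(cmd))
```

Running the sweep sequentially on one GPU is slow but avoids needing a cluster; each run gets its own dora signature, so checkpoints do not collide.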
I am trying to reproduce the training process from scratch on VoxPopuli-en. I kept all of your original hyperparameters, but the watermark-related losses stayed flat even after 30 epochs, while the g_loss quickly went negative. I noticed that since the watermark is added directly to the original waveform, there is a shortcut from the original audio to the watermarked audio. The model will therefore ignore the watermark and overfit the g_loss whenever the gradients of the watermark-related losses are comparatively small. So I tried to:
Settings:
I ran

```
dora run solver=watermark/robustness dset=audio/voxpopuli
```
on a single 48 GB GPU. Since I do not have access to a Slurm cluster, running a dora hyperparameter search may not be feasible. All hyperparameters follow `config/solver/watermark/default.yaml` except those mentioned above.

Any insights or suggestions on this problem would be appreciated.
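To make the shortcut failure mode concrete, here is a toy NumPy sketch (not the actual training code) of gradient descent on a sum of two losses whose gradient magnitudes differ by orders of magnitude; the scales and targets are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared parameter vector that both losses pull on.
theta = rng.normal(size=4)

def g_loss_grad(theta):
    # Strong gradient from the generator loss (optimum at theta = 0).
    return 1.0 * theta

def wm_loss_grad(theta):
    # Tiny gradient from the watermark loss (optimum at theta = 2).
    return 1e-4 * (theta - 2.0)

lr = 0.1
for _ in range(100):
    theta = theta - lr * (g_loss_grad(theta) + wm_loss_grad(theta))

# The generator objective has essentially converged...
print(np.allclose(theta, 0.0, atol=1e-2))                 # → True
# ...while the watermark objective has barely moved at all.
print(round(float(np.mean(np.abs(theta - 2.0))), 2))      # → 2.0
```

This mirrors what I observe: the optimizer follows the dominant g_loss gradient and the watermark-related losses stay flat, which is why rebalancing the loss weights (or detaching the shortcut path) seems necessary.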