Closed vivekiyer closed 3 years ago
We noticed that if we run training across multiple GPUs (with DDP enabled), the printed training loss appears to be incorrect and does not decrease monotonically with each epoch. The same model run on a single GPU shows a monotonically decreasing loss. I have attached sample losses from a multi-GPU run and a single-GPU run below. Any suggestions on where we should look to fix this?
results_multiplegpus.txt results_singlegpu.txt
I don't think there are any monotonicity guarantees for any of the stochastic gradient algorithms we use. In this case my assumption is that the single-GPU U-Net decreased monotonically by random chance. I doubt you would see monotonicity with any GPU arrangement for the VarNet model, which uses SSIM.
Thanks for the response and the comment. Appreciate it.
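One possible source of the discrepancy (an assumption, not confirmed from the attached logs): under DDP each rank computes the loss on its own data shard, so a log line that reports only one rank's local loss can fluctuate even while the globally averaged loss decreases. A minimal pure-Python sketch of that effect, with made-up numbers:

```python
# Hypothetical illustration (not from the fastMRI code): each "rank" sees a
# different data shard, so its local loss can go up in an epoch even when the
# cross-rank average (what an all-reduce with mean would report) goes down.

def global_loss(per_rank_losses):
    """Mean of the local losses, mimicking an all-reduce(mean) across ranks."""
    return sum(per_rank_losses) / len(per_rank_losses)

# Made-up local losses on 2 ranks over 3 epochs:
per_epoch = [
    [0.90, 0.70],  # epoch 0 -> global mean 0.80
    [0.50, 0.90],  # epoch 1 -> rank 1's local loss went UP; global mean 0.70
    [0.60, 0.60],  # epoch 2 -> rank 0's local loss went UP; global mean 0.60
]

for epoch, losses in enumerate(per_epoch):
    print(f"epoch {epoch}: per-rank {losses} -> global {global_loss(losses):.2f}")
```

If this is the cause and the training loop uses PyTorch Lightning, logging the loss with `self.log(..., sync_dist=True)` averages it across ranks before it is reported, which may make the printed multi-GPU curve comparable to the single-GPU one.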