NVIDIA / modulus

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
https://developer.nvidia.com/modulus
Apache License 2.0

🐛[BUG]: accumulated loss should be divided by accumulation steps to get the mean loss for wandb report #412

Open · yairchn opened this issue 6 months ago

yairchn commented 6 months ago

Version

0.5.0

On which installation method(s) does this occur?

Pip

Describe the issue

In the training loop, an accumulated loss is computed additively here across all steps in `num_accumulation_rounds`. This loss should be divided by `num_accumulation_rounds` so that the mean, rather than the sum, of the loss is reported to wandb: `loss_accum += loss / num_accumulation_rounds`
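
For illustration, here is a minimal sketch of a gradient-accumulation loop with the proposed fix. The names (`model`, `loss_fn`, `batches`) and the setup are assumptions made for this example and are not taken from the Modulus training loop itself:

```python
import torch

# Minimal sketch: gradient accumulation with mean-loss logging.
# All names below are illustrative, not from the Modulus code base.
torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

num_accumulation_rounds = 4
batches = [(torch.randn(8, 4), torch.randn(8, 1))
           for _ in range(num_accumulation_rounds)]

optimizer.zero_grad()
loss_accum = 0.0
for x, y in batches:
    loss = loss_fn(model(x), y)
    # Scale the backward pass so the accumulated gradient matches a
    # single large batch, and accumulate the *mean* loss for logging.
    (loss / num_accumulation_rounds).backward()
    loss_accum += loss.item() / num_accumulation_rounds  # mean, not sum
optimizer.step()

# loss_accum is now the mean loss over the accumulation rounds and can
# be reported directly, e.g. wandb.log({"loss": loss_accum}).
print(f"mean loss over {num_accumulation_rounds} rounds: {loss_accum:.4f}")
```

Without the division, the value logged to wandb scales with `num_accumulation_rounds`, so curves from runs with different accumulation settings are not comparable.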

Minimum reproducible example

No response

Relevant log output

No response

Environment details

No response