NVIDIA / modulus

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
https://developer.nvidia.com/modulus
Apache License 2.0

Add online validation loss with wandb #413

Closed: yairchn closed this 3 months ago

yairchn commented 3 months ago

Modulus Pull Request

Description

This PR adds a validation loss computation to the training loop. The new loss is recorded both in the report stats and in wandb. While working on this PR I opened an issue about wandb loss saving, #412, which this PR handles as well.

Also closes https://github.com/NVIDIA/modulus/issues/416
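For readers who want a picture of what an online validation loss looks like, here is a minimal sketch. It is not the code from this PR: the toy model `y = w * x`, the function names, the `validate_every` parameter, and the `training_loss` / `validation_loss` keys are all assumptions made for illustration. Logging goes through a plain callable so the sketch runs without wandb installed.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical batch layout for this sketch: (inputs, targets).
Batch = Tuple[Sequence[float], Sequence[float]]


def mean_squared_error(w: float, batch: Batch) -> float:
    """MSE of the toy model y = w * x on a single batch."""
    xs, ys = batch
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def validation_loss(w: float, batches: Sequence[Batch]) -> float:
    """Sample-weighted mean loss over all validation batches (no weight update)."""
    total = sum(mean_squared_error(w, b) * len(b[0]) for b in batches)
    count = sum(len(b[0]) for b in batches)
    return total / count


def train(
    train_batches: Sequence[Batch],
    valid_batches: Sequence[Batch],
    log: Callable[[Dict[str, float], int], None],
    steps: int,
    validate_every: int,
    lr: float = 0.05,
) -> float:
    """Gradient descent on w; every `validate_every` steps also log a validation loss."""
    w = 0.0
    for step in range(1, steps + 1):
        batch = train_batches[(step - 1) % len(train_batches)]
        xs, ys = batch
        # Training loss is measured on the current batch before the update.
        stats = {"training_loss": mean_squared_error(w, batch)}
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        if step % validate_every == 0:
            # Validation uses held-out batches and the freshly updated weight.
            stats["validation_loss"] = validation_loss(w, valid_batches)
        log(stats, step)
    return w
```

In a real run the `log` argument could be something like `lambda stats, step: wandb.log(stats, step=step)`, so that the training and validation curves share the same step axis; whether the PR wires it up this way is not stated in this thread.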

Checklist

Dependencies

mnabian commented 3 months ago

/blossom-ci

yairchn commented 3 months ago

I am testing this PR with a small training run in an interactive session, logging to this WANDB project

[Image: Screenshot 2024-04-08 at 4 31 41 PM]

mnabian commented 3 months ago

Reviewed the most recent changes. LGTM

mnabian commented 3 months ago

/blossom-ci

mnabian commented 3 months ago

@yairchn the black formatting check failed

jleinonen commented 3 months ago

LGTM - left some optional improvements but these can be deferred to later PRs too.

mnabian commented 3 months ago

/blossom-ci

mnabian commented 3 months ago

/blossom-ci

mnabian commented 3 months ago

/blossom-ci

mnabian commented 3 months ago

/blossom-ci