NVIDIA / modulus

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
https://developer.nvidia.com/modulus
Apache License 2.0
977 stars 232 forks

Fea unified recipe documentation #378

loliverhennigh closed 7 months ago

loliverhennigh commented 7 months ago

Added documentation and updated the download scripts to make it easier to get up and running with the unified recipe. Following the instructions should allow a user to start training on almost any machine with a GPU that has more than 8 GB of memory.

There is one caveat in this PR. I have configs set up to train on a small chunk of data. This works; however, I am getting NaNs from the AFNO model after a few epochs. I believe this is caused by instability in AFNO combined with the smaller dataset. Following this PR, I will be getting the SFNO configs in. My plan is to switch the minimal example to SFNO, which should be more stable.

loliverhennigh commented 7 months ago

/blossom-ci

loliverhennigh commented 7 months ago

> This looks great to me! Very lucid explanations. Added a few minor comments. Also, would be good to make an entry to master README about this here: https://github.com/NVIDIA/modulus/tree/main/examples#weather for higher visibility.

This is a great idea. I will do this in the SFNO config PR, though, since that should mark the point at which we are ready to have people use it.
