Framework providing Pythonic APIs, algorithms, and utilities to be used with Modulus core to physics-inform model training, as well as higher-level abstractions for domain experts
Modulus Pull Request
Description
Closes #47
The workaround adds a time delay between the warmup steps and the start of graph capture, giving the NCCL watchdog enough time to clean up its pending work. The fix was stress-tested against the FPGA example and passed 20 out of 20 runs.
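For reviewers, the ordering this workaround relies on can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `warmup_then_capture`, its parameters, and the default delay value are hypothetical names and numbers chosen for the example, and the warmup and capture steps are passed in as plain callables so the sketch runs without a GPU.

```python
import time
from typing import Callable


def warmup_then_capture(
    warmup_step: Callable[[], None],
    capture: Callable[[], None],
    warmup_steps: int = 3,        # assumed count, not taken from the PR
    delay_seconds: float = 1.0,   # assumed delay, not taken from the PR
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run warmup iterations, pause, then start graph capture.

    The pause is the workaround: it leaves time for background
    clean-up (here, the NCCL watchdog) to finish before capture begins.
    """
    # Warmup iterations run eagerly, outside of any graph capture.
    for _ in range(warmup_steps):
        warmup_step()

    # Fixed delay between the last warmup step and the capture.
    sleep(delay_seconds)

    # In a real training loop this callable would wrap the CUDA graph
    # capture of one training iteration.
    capture()
```

The `sleep` parameter is injectable only so the ordering can be checked in a test without actually waiting; in normal use the default `time.sleep` applies.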
Checklist
Dependencies