Open jakubMitura14 opened 1 month ago
You need to start Julia with mpiexec:
https://github.com/LuxDL/Lux.jl/tree/main/examples/ImageNet#distributed-data-parallel-training
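A minimal sketch of what that could look like, assuming the NCCL backend and a script saved as check_workers.jl (a hypothetical file name). The package list and the launch command in the comments are assumptions about the setup, not taken from this thread:

```julia
# check_workers.jl (hypothetical file name)
# Assumed launch command, one process per GPU:
#   mpiexec -n 2 julia --project=. check_workers.jl
# (MPI.jl also provides an `mpiexecjl` wrapper that can be used instead.)

# Assumption: MPI.jl and NCCL.jl must be loaded for NCCLBackend to be usable,
# and LuxCUDA provides the CUDA support.
using Lux, LuxCUDA, MPI, NCCL

# Set up the distributed backend; call this once per process.
DistributedUtils.initialize(NCCLBackend)
backend = DistributedUtils.get_distributed_backend(NCCLBackend)

# Each process reports its own rank and the total number of processes.
rank = DistributedUtils.local_rank(backend)
nworkers = DistributedUtils.total_workers(backend)
println("local_rank = $rank, total_workers = $nworkers")
```

When the same script is started with plain julia instead of mpiexec, only one process exists, which would be consistent with local_rank being 0 and total_workers being 1.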
For now it still does not work, but I need to dig deeper into MPI first. Thanks for the guidance!
Hello, I have 2 GPUs, as shown by nvidia-smi.
Then I try
and local_rank evaluates to 0 while total_workers evaluates to 1. This seems incorrect, if I understand the idea correctly.
I use Lux v1.1.0, CUDA v5.5.2, and Julia 1.10.