Closed by EiffL 3 years ago
Myriam just told me that the TF 2.4.1 module with NCCL 2.8 and CUDA 10.2 should be ready by the beginning of next week (a compilation takes 5 hours :-( ). On Monday, I’ll go through the installation steps that you’ve described, François. CUDA 11 should be available on JZ by the end of next week. If there is enough time before Day 1, I’ll go through the installation steps again with CUDA 11. We’ll probably use srun rather than horovodrun, but that’s not a big deal to modify.
Have a nice weekend!
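In case it helps when switching from horovodrun to srun, here is a minimal sketch for checking that srun starts the expected tasks. It assumes the job runs under Slurm and only reads the standard `SLURM_PROCID`, `SLURM_NTASKS` and `SLURM_LOCALID` environment variables; the helper name `slurm_task_info` is hypothetical, and nothing here is specific to the Jean-Zay module or to Horovod itself.

```python
import os


def slurm_task_info(env=None):
    """Return the rank layout Slurm exposes to each task launched by srun.

    Falls back to a single-process layout when the variables are absent,
    for example when the script is run outside of a Slurm job.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("SLURM_PROCID", 0)),         # global task index
        "size": int(env.get("SLURM_NTASKS", 1)),         # total number of tasks
        "local_rank": int(env.get("SLURM_LOCALID", 0)),  # task index on this node
    }


if __name__ == "__main__":
    info = slurm_task_info()
    print("task {rank} of {size} (local rank {local_rank})".format(**info))
```

Launching this script with srun should print one line per task; if every task reports rank 0 of 1, the Slurm variables are probably not reaching the processes.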
Thank you @kimchitsigai! I got a notification from Myriam that the NCCL environment had finished cooking. I just tried to compile against it and it seems to work nicely :-) I've slightly updated the instructions in the GETTING_STARTED.
I think the getting started materials are looking good. Now we just need to figure out how to collect profiling information correctly; we'll cover this in #2.
I am starting to add scripts and examples, and to document the procedure to get set up on Jean-Zay in this file: https://github.com/DifferentiableUniverseInitiative/IDRIS-hackathon/blob/main/GETTING_STARTED.md
@kimchitsigai feel free to add or suggest anything that would be useful to document here to help people get started on the machine and/or with horovod.