Open pmarguinaud opened 2 months ago
Hi Philippe
Yes, this is expected. It is rather unlikely that we can get reproducible results with different NPROC, because of the batched FFTs and especially the batched GEMMs. The GEMMs run on multiple layers at once, so the order of the floating-point operations, and therefore the rounding, depends on the exact number of layers per rank.
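To illustrate the general effect, here is a minimal Python sketch. It is a toy example and not the actual transform code: `blocked_sum` and the input values are hypothetical, chosen only to show that regrouping a floating-point sum into a different number of blocks can change the result, since floating-point addition is not associative.

```python
def blocked_sum(values, nblocks):
    """Sum `values` in `nblocks` contiguous blocks, then add the partial sums.

    Hypothetical helper: it only mimics how a different number of items
    per rank regroups the additions. Assumes len(values) % nblocks == 0.
    """
    size = len(values) // nblocks
    partials = []
    for b in range(nblocks):
        acc = 0.0
        # Plain left-to-right accumulation within one block.
        for x in values[b * size:(b + 1) * size]:
            acc += x
        partials.append(acc)
    total = 0.0
    # Combine the per-block partial sums, again left to right.
    for p in partials:
        total += p
    return total


# Values chosen so that rounding differs between groupings
# (the spacing between adjacent doubles near 1e16 is 2.0).
values = [1e16, 1.0, -1e16, 1.0]

one_block = blocked_sum(values, 1)   # all additions in sequence
two_blocks = blocked_sum(values, 2)  # two partial sums, then combined

print(one_block, two_blocks)  # prints "1.0 0.0"
```

The explicit loops are deliberate: the built-in `sum()` may use compensated summation for floats in recent Python versions, which would hide the effect.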
What is the use-case here? Is this a production requirement, or a debugging requirement? Depending on this, I would recommend
Any thoughts?
Hello Lukas,
Thank you for these explanations.
Currently we regularly check the reproducibility of our models (ARPEGE & AROME), and it proves quite useful when we need to debug the model, as we can reduce the number of nodes and still reproduce a problem.
It is also something we require when writing specifications for buying a new machine.
Apparently, everything in ARPEGE but the spectral transforms is reproducible when the number of MPI tasks changes.
But I am not the only one who decides on these matters, so I will discuss this with other Météo-France colleagues.
I would also be curious to hear ECMWF's opinion on this matter.
As Lukas mentioned, this has been the case for some time due to the batched maths. My thoughts on this:
Two more points that should help down the line for this
Apparently, changing NPROC changes the numerical results when running on NVIDIA accelerators.
Is this expected? If so, is it being investigated?
I can provide a small test case if necessary.