Open kostrzewa opened 1 year ago
@urbach Do you perhaps remember why rank reordering was allowed at the time? (almost 18 years ago :) )
It seems that the HPE engineers were able to find the culprit for our problems on LUMI-G and I think it might be as simple as switching this to `0`.
https://github.com/etmc/tmLQCD/blob/443a08ff341590d8c3509a4ed4e06330418f71fa/mpi_init.c#L216
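For reference, a minimal sketch of what disabling reordering implies, assuming the linked line sets the `reorder` argument that ends up in `MPI_Cart_create(comm_old, ndims, dims, periods, reorder, &comm_cart)`. With `reorder = 0` the MPI standard guarantees that each process keeps its rank from the old communicator, and Cartesian topologies are always numbered in row-major order, so the rank-to-coordinate mapping becomes fully predictable. The helper below is hypothetical (it is not part of tmLQCD) and only reproduces that row-major mapping in plain C, without calling MPI:

```c
#include <assert.h>

/* Hypothetical helper, not part of tmLQCD: convert a rank into Cartesian
 * coordinates using row-major ordering, i.e. the ordering the MPI standard
 * prescribes for Cartesian topologies. With reorder = 0 the rank passed in
 * is simply the unchanged rank from the parent communicator. */
void cart_coords_row_major(int rank, int ndims, const int dims[], int coords[])
{
  assert(ndims > 0 && rank >= 0);
  /* Walk from the last (fastest-running) dimension to the first. */
  for (int i = ndims - 1; i >= 0; i--) {
    assert(dims[i] > 0);
    coords[i] = rank % dims[i];
    rank /= dims[i];
  }
  /* Anything left over means rank was >= the product of all dims. */
  assert(rank == 0);
}
```

For a 2 x 3 process grid this places rank 4 at coordinates (1, 1). With `reorder = 1` the library is free to permute ranks, so no such fixed relation to the launcher's rank placement can be assumed.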
No, I don't remember this anymore. It could be that I introduced this for domain decomposition. I'd say let's try setting this to '0'!
Thanks. Yes, we'll have to do a number of test runs on various machines to make sure that it doesn't break anything elsewhere...
I have a suspicion that it might have been relevant on the torus networks on the BG/L and /P and in particular later on for the /Q (we never changed it since 2005 though). I don't know what kind of network the IBM p690 at JSC was configured with. Maybe it was relevant there already?
> Thanks. Yes, we'll have to do a number of test runs on various machines to make sure that it doesn't break anything elsewhere...
Yes, agreed!
> I have a suspicion that it might have been relevant on the torus networks on the BG/L and /P and in particular later on for the /Q (we never changed it since 2005 though). I don't know what kind of network the IBM p690 at JSC was configured with. Maybe it was relevant there already?
No, I never programmed for the network of the p690 directly. Blue Gene might be...