Closed: danrosen25 closed this issue 1 month ago
Hi @theurich
I opened an issue for this. After looking through all the code I couldn't determine the actual point of failure. It could be an issue with the Clang v15 + gcc v14 + OpenMPI v5 combination's MPI_Wtime, which might be returning inconsistent wall times; that may be the reason the precision value exists in the first place. wtimedelay shouldn't even return until wtime@2 - wtime@1 >= 3 seconds, so another call to wtime should guarantee that wtime@3 is greater than wtime@2. It's strange that it isn't.
I don't have a fix for this but I wanted to note the issue and ask for suggestions.
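For reference, here is a minimal sketch of how a busy-wait delay built on MPI_Wtime might look. This is only an illustration of the expected semantics, not the actual VMK::wtimedelay code:

// Sketch only: assumes MPI_Init has already been called on this process.
#include <mpi.h>

void wtimedelay_sketch(double delay) {
  double t0 = MPI_Wtime();   // wtime@1
  double t1 = MPI_Wtime();   // wtime@2
  // Spin until at least `delay` seconds have elapsed according to MPI_Wtime.
  while (t1 - t0 < delay) {
    t1 = MPI_Wtime();
  }
  // If MPI_Wtime is monotonic, any later reading (wtime@3) must be >= t1,
  // so an observed elapsed time shorter than `delay` should be impossible.
}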
One more thought: I don't like the busy wait in VMK::wtimedelay. It burns computational cycles while waiting. The only reason I can think of for using it over std::this_thread::sleep_for(std::chrono::milliseconds(int(delay * 1000))) is that you don't want to lose anything in the cache?
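For comparison, a sleep-based variant would look something like the sketch below (hypothetical, not what ESMF does). It gives the core back to the OS instead of spinning:

// Hypothetical alternative, not used in ESMF: sleep instead of spinning.
#include <chrono>
#include <thread>

void sleepdelay_sketch(double delay) {
  // Blocks for at least `delay` seconds without burning CPU cycles;
  // the OS may wake the thread a bit late, but never early.
  std::this_thread::sleep_for(std::chrono::milliseconds(int(delay * 1000)));
}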
Meeting notes: the time checking goes through the MPI library, and we only started seeing this after upgrading OpenMPI, so the problem could be in wtime or underneath wtime. Bill: also uses OpenMPI v5.0.0_1 but doesn't see this. Action: Dan proposes going back to OpenMPI v4 to see if there is a difference between v4 and v5. Gerhard prefers not to use sleep.
Currently we don't think there is any issue in ESMF behind these failures; they are possibly caused by OpenMPI.
Regarding the busy wait: this call is only used in tests. It is used to simulate busy work on the CPU, so it makes sense for it to be a busy wait.
Bill: discovered that whether wtime is global is an MPI attribute (MPI_WTIME_IS_GLOBAL); the clock is not guaranteed to be synchronized across processes. Gerhard: that is fine, the timing here is all local.
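For what it's worth, whether MPI_Wtime is synchronized across processes can be queried through the predefined MPI_WTIME_IS_GLOBAL attribute. A minimal standalone check might look like this (illustrative only, not part of the test):

#include <cstdio>
#include <mpi.h>

int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);
  int flag = 0;
  int *is_global = nullptr;
  // Predefined attribute: nonzero means MPI_Wtime is synchronized across ranks.
  MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_WTIME_IS_GLOBAL, &is_global, &flag);
  if (flag) {
    std::printf("MPI_WTIME_IS_GLOBAL = %d\n", *is_global);
  } else {
    std::printf("MPI_WTIME_IS_GLOBAL attribute not set\n");
  }
  MPI_Finalize();
  return 0;
}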
One thing I noticed about the failing combo is that it uses the --oversubscribe option, and this test uses 8 MPI procs, which I think is the most used by any test (there are a handful of others that also use 8), so I assume this is one of the tests that gets oversubscribed. I can't think of how that would impact this test, but is it possible that oversubscribing the MPI tasks causes issues with the logic in this test?
@billsacks @theurich
Adding --oversubscribe changes the default bind policy to -bind-to none. As far as I can tell this shouldn't be an issue, because MPI_Wtime should be maintained per MPI task even if the OS does a context switch and then switches back. Right?
I don't at all have a full picture of what's going on with this test, so I'm not sure. Your point about changing the bind policy to -bind-to none made me wonder if there could be something funny going on with slight offsets between the clocks on different processors, which could maybe cause some odd behavior given the subtraction of a baseline time. But I can't think of how that would impact the test, and it seems weird that there would be a non-trivial difference in clock times between the different processors anyway.
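One quick way to eyeball such offsets would be a standalone program like the sketch below (illustrative only; it assumes all ranks leave the barrier at roughly the same instant):

#include <cstdio>
#include <mpi.h>

int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  // All ranks leave the barrier at roughly the same wall-clock instant, so
  // large differences in the printed values hint at per-rank clock offsets.
  MPI_Barrier(MPI_COMM_WORLD);
  double t = MPI_Wtime();
  std::printf("rank %d: MPI_Wtime = %27.17f\n", rank, t);
  MPI_Finalize();
  return 0;
}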
I was thinking about this while playing around with this test program:
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
  double mytime;
  MPI_Init(NULL, NULL);
  /* Print MPI_Wtime right after initialization to see the timer's starting value. */
  mytime = MPI_Wtime();
  printf("%27.17f\n", mytime);
  /* Busy-wait until MPI_Wtime reports at least 1 second, then print the value observed. */
  while (mytime < 1) {
    mytime = MPI_Wtime();
  }
  printf("%27.17f\n", mytime);
  MPI_Finalize();
  return 0;
}
Actually, my original thought in putting that together was wondering if MPI_Wtime might be giving some huge value on the GitHub runner, such that there is very little precision maintained when subtracting times. That felt like a long shot, though, and at least on my machine MPI_Wtime gives 0 values at the start of the program. But then I noticed that, when I ran it oversubscribed (e.g., with 36 MPI tasks on my 10-processor laptop), some tasks gave end times of more than 1.01 seconds. This isn't surprising to me, since there's an extra delay from the OS swapping tasks in and out, but it just made me start to wonder how this oversubscription might be impacting this test. I can't envision how it would lead to a delay time of less than 3 seconds, though: it seems like it would lead to longer-than-expected delay times.
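As a side note on the precision question, the spacing between adjacent double values at a few magnitudes of seconds can be checked with a quick standalone program (not ESMF code):

#include <cmath>
#include <cstdio>

int main() {
  // Spacing between adjacent doubles at a few magnitudes of "seconds".
  // Even at ~1e9 s a double still resolves better than a microsecond, so
  // precision loss alone seems unlikely to swallow a 3-second delay.
  const double times[] = {1.0, 1.0e6, 1.0e9};
  for (double t : times) {
    std::printf("t = %14.1f s, spacing = %.3e s\n",
                t, std::nextafter(t, 2.0 * t) - t);
  }
  return 0;
}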
@billsacks My thoughts are along the same lines: the time delay should be greater than 3 seconds. There could truly be a bug in OpenMPI. I read through the MPI_Wtime documentation and the clock should be monotonic except for certain settings of the opal timer. Also, binding isn't recommended with --oversubscribe. https://www.open-mpi.org/doc/v4.1/man3/MPI_Wtime.3.php
@billsacks @theurich I reran every single failed Development Test using OpenMPI v4.1.6 and they each passed. It's possible that MPI_Wtime has a bug in v5.0.5, or that v4.1.6 simply runs slower and therefore passes. Either way, I'm closing this issue as resolved. https://github.com/esmf-org/esmf/actions/workflows/development-tests.yml
ESMF_CompTunnelUTest fails intermittently because the time delay measured between ESMF_GridCompFinalize and ESMF_GridCompWait for an ESMF_SYNC_NONBLOCKING call comes out to less than 3 seconds. This doesn't make sense, since the finalize subroutine calls ESMF_VMWtimeDelay with a 3-second delay.
https://github.com/esmf-org/esmf/blob/9379ba6a7ea05a870c3c320be556aa76b23df470/src/Superstructure/Component/tests/ESMF_CompTunnelUTest.F90#L1255
https://github.com/esmf-org/esmf/blob/9379ba6a7ea05a870c3c320be556aa76b23df470/src/Superstructure/Component/tests/ESMF_CompTunnelUTest.F90#L1267
https://github.com/esmf-org/esmf/blob/9379ba6a7ea05a870c3c320be556aa76b23df470/src/Superstructure/Component/tests/ESMF_CompTunnelUTest.F90#L1283
Originally posted by @danrosen25 in https://github.com/esmf-org/esmf/issues/294#issuecomment-2342192503