esmf-org / esmf

The Earth System Modeling Framework (ESMF) is a suite of software tools for developing high-performance, multi-component Earth science modeling applications.
https://earthsystemmodeling.org/

Incorrect time delay ESMF_CompTunnelUTest - Mac OS w OpenMPI #295

Closed · danrosen25 closed this issue 1 month ago

danrosen25 commented 1 month ago

ESMF_CompTunnelUTest fails intermittently because the measured time delay between ESMF_GridCompFinalize and ESMF_GridCompWait for an ESMF_SYNC_NONBLOCKING call comes out at less than 3 seconds. This doesn't make sense, since the finalize subroutine calls ESMF_VMWtimeDelay for 3 seconds.

20240910 215612.746 INFO             PET4  FAIL  Check time delay on non-blocking Component D Finalize/wait, ESMF_CompTunnelUTest.F90, line 1283:  Incorrect time delay
20240910 215612.746 INFO             PET4  delayTime (wait) =    2.9923109999999973

https://github.com/esmf-org/esmf/blob/9379ba6a7ea05a870c3c320be556aa76b23df470/src/Superstructure/Component/tests/ESMF_CompTunnelUTest.F90#L1255

https://github.com/esmf-org/esmf/blob/9379ba6a7ea05a870c3c320be556aa76b23df470/src/Superstructure/Component/tests/ESMF_CompTunnelUTest.F90#L1267

https://github.com/esmf-org/esmf/blob/9379ba6a7ea05a870c3c320be556aa76b23df470/src/Superstructure/Component/tests/ESMF_CompTunnelUTest.F90#L1283

Originally posted by @danrosen25 in https://github.com/esmf-org/esmf/issues/294#issuecomment-2342192503
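
To make the failure mode concrete, here is a minimal C++ sketch of the timing check as I read it. The real test is Fortran; `delayed_finalize` is a made-up stand-in that collapses the nonblocking finalize/wait pair into a single blocking call which busy-waits on MPI_Wtime, roughly what I understand ESMF_VMWtimeDelay to do underneath:

```cpp
#include <mpi.h>
#include <cstdio>

// Hypothetical stand-in for the component finalize: busy-wait on MPI_Wtime
// for 'seconds', roughly what ESMF_VMWtimeDelay is described as doing.
static void delayed_finalize(double seconds) {
  double t0 = MPI_Wtime();
  while (MPI_Wtime() - t0 < seconds) { /* spin */ }
}

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  double t_start = MPI_Wtime();          // timestamp before the finalize call
  delayed_finalize(3.0);                 // finalize is expected to take >= 3 s
  double delay = MPI_Wtime() - t_start;  // timestamp after the wait returns

  // The unit test expects the measured delay to be at least the full 3 s;
  // the failure above reports ~2.992 s, i.e. slightly under 3.
  std::printf("delayTime (wait) = %.10f\n", delay);

  MPI_Finalize();
  return (delay >= 3.0) ? 0 : 1;
}
```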

danrosen25 commented 1 month ago

Hi @theurich, I opened an issue for this. After looking through all the code I couldn't determine the actual point of failure. It could be an issue with the Clang v15 + gcc v14 + OpenMPI v5 combination's MPI_Wtime, which may be returning an inconsistent wall time; that may be why the precision tolerance exists in the first place. wtimedelay shouldn't even return until wtime@2 - wtime@1 >= 3 seconds, so another call to wtime should guarantee that wtime@3 is greater than wtime@2. It's weird that it doesn't.

I don't have a fix for this but I wanted to note the issue and ask for suggestions.

One more thought: I don't like the busy wait in VMK::wtimedelay. It burns computational cycles while waiting. The only reason I can think of for using it over std::this_thread::sleep_for(std::chrono::milliseconds(int(delay * 1000))); is that you don't want to lose anything in cached memory?
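
For comparison, here is a rough sketch of the two flavors, assuming wtimedelay is essentially a spin loop on MPI_Wtime (this is not the actual VMK implementation, just my mental model of it):

```cpp
#include <mpi.h>
#include <chrono>
#include <cstdio>
#include <thread>

// Busy-wait flavor: the loop cannot exit until MPI_Wtime() has advanced by at
// least 'delay' past the starting timestamp, so any timestamp taken afterwards
// should also be >= start + delay, provided the clock is monotonic.
void delay_busy_wait(double delay) {
  double start = MPI_Wtime();
  while (MPI_Wtime() - start < delay) { /* burn cycles */ }
}

// Sleeping flavor: yields the CPU to the OS scheduler instead of spinning;
// the wakeup may come a bit later than requested, but not earlier.
void delay_sleep(double delay) {
  std::this_thread::sleep_for(
      std::chrono::milliseconds(static_cast<int>(delay * 1000)));
}

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  double t0 = MPI_Wtime();
  delay_busy_wait(1.0);
  double t1 = MPI_Wtime();
  delay_sleep(1.0);
  double t2 = MPI_Wtime();
  std::printf("busy-wait: %.6f s, sleep: %.6f s\n", t1 - t0, t2 - t1);
  MPI_Finalize();
  return 0;
}
```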

anntsay commented 1 month ago

Time checking goes through the MPI library, and we only started seeing this after upgrading OpenMPI, so the problem could be in wtime or underneath wtime. Bill: also uses OpenMPI v5.0.0_1 but doesn't see this. Action: Dan proposes going back to OpenMPI v4 to see whether there is a difference between v4 and v5. Gerhard prefers not to use sleep.

Currently we don't think these failures indicate any issue in ESMF; they are possibly caused by OpenMPI.

theurich commented 1 month ago

One more thought: I don't like the busy wait in VMK::wtimedelay. It burns computational cycles while waiting. The only reason I can think of for using it over std::this_thread::sleep_for(std::chrono::milliseconds(int(delay * 1000))); is that you don't want to lose anything in cached memory?

This call is only used in tests. It is there to simulate busy work on the CPU, so it makes sense for it to be a busy wait.

anntsay commented 1 month ago

Bill: discovered that wtime has a "global" attribute; the clock is not guaranteed to be synchronized across processes. Gerhard: it is fine; it is all local here.
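
For reference, whether MPI_Wtime is synchronized across ranks can be queried through the standard MPI_WTIME_IS_GLOBAL attribute; a small stand-alone check (not part of the test code) would look roughly like this:

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // MPI_WTIME_IS_GLOBAL is a predefined attribute on MPI_COMM_WORLD; if it is
  // set and nonzero, MPI_Wtime is synchronized across all processes. If it is
  // zero or unset, the clocks are only guaranteed to be locally consistent.
  int *is_global = nullptr;
  int flag = 0;
  MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_WTIME_IS_GLOBAL, &is_global, &flag);

  if (rank == 0) {
    if (flag)
      std::printf("MPI_WTIME_IS_GLOBAL = %d\n", *is_global);
    else
      std::printf("MPI_WTIME_IS_GLOBAL attribute is not set\n");
  }

  MPI_Finalize();
  return 0;
}
```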

billsacks commented 1 month ago

One thing I noticed about the failing combo is that it uses the --oversubscribe option, and this test uses 8 MPI procs, which I think is the most used by any test (there are a handful of others that also use 8), so I assume that this is one of the tests that's oversubscribed. I can't think of how that would impact this test, but is it possible that oversubscribing the MPI tasks causes issues with the logic in this test?

danrosen25 commented 1 month ago

@billsacks @theurich Adding --oversubscribe changes the default bind policy to -bind-to none. As far as I can tell this shouldn't be an issue, because MPI_Wtime should be maintained for an MPI task even if the OS does a context switch and then switches back. Right?
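
A tiny experiment along these lines illustrates the assumption: MPI_Wtime measures wall-clock time, so it keeps advancing even while the task is off the CPU (here forced with a deliberate sleep rather than an actual context switch under oversubscription):

```cpp
#include <mpi.h>
#include <chrono>
#include <cstdio>
#include <thread>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  // Put the process to sleep so the OS deschedules it, loosely mimicking a
  // context switch under --oversubscribe. MPI_Wtime is wall-clock time, not
  // CPU time, so the interval still covers the time spent off the CPU.
  double t0 = MPI_Wtime();
  std::this_thread::sleep_for(std::chrono::seconds(1));
  double t1 = MPI_Wtime();

  std::printf("elapsed = %.6f s (expected >= 1.0)\n", t1 - t0);

  MPI_Finalize();
  return 0;
}
```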

billsacks commented 1 month ago

Adding --oversubscribe changes the default bind policy to -bind-to none. As far as I can tell this shouldn't be an issue, because MPI_Wtime should be maintained for an MPI task even if the OS does a context switch and then switches back. Right?

I don't at all have a full picture of what's going on with this test, so I'm not sure. Your point about changing the bind policy to -bind-to none made me wonder if there could be something funny going on with slight offsets between the clocks on different processors, which could maybe cause some odd behavior given the subtraction of a baseline time. But I can't think of how that would impact the test, and it seems weird that there would be a non-trivial difference in clock times between the different processors anyway.

I was thinking about this while playing around with this test program:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[]){

   double mytime;

   MPI_Init(NULL,NULL);

   mytime = MPI_Wtime();

   printf("%27.17f\n", mytime);

   /* Busy-wait until MPI_Wtime reports at least 1 second; this assumes the
      clock starts near zero, as it does on my machine. */
   while (mytime < 1) {
      mytime = MPI_Wtime();
   }

   printf("%27.17f\n", mytime);

   MPI_Finalize();

   return 0;
}

Actually, my original thought in putting that together was wondering if MPI_Wtime might be giving some huge value on the GitHub runner, such that there is very little precision maintained when subtracting times. That felt like a long shot, though, and at least on my machine MPI_Wtime gives 0 values at the start of the program. But then I noticed that, when I ran it oversubscribed (e.g., with 36 processors on my 10-processor laptop), some tasks gave end times of more than 1.01 seconds. This isn't surprising to me, since there's an extra delay from the OS swapping tasks in and out, but it just made me start to wonder how this oversubscription might be impacting this test. I can't envision how it would lead to a delay time of less than 3 seconds, though: it seems like it would lead to longer-than-expected delay times.

danrosen25 commented 1 month ago

@billsacks My thoughts are along the same lines: the time delay should be greater than 3 seconds. There could truly be a bug in OpenMPI. I read through the MPI_Wtime documentation and the clock should be monotonic except under certain settings of the opal timer. Also, binding isn't recommended with --oversubscribe. https://www.open-mpi.org/doc/v4.1/man3/MPI_Wtime.3.php
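
If someone wants to probe the monotonicity claim directly on the failing runner, a diagnostic along these lines (not part of the test suite; names are mine) would flag any backward step in MPI_Wtime:

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Sample MPI_Wtime in a tight loop for ~3 seconds and report any call that
  // returns a smaller value than the previous one.
  double start = MPI_Wtime();
  double prev = start;
  long backward = 0;
  while (prev - start < 3.0) {
    double now = MPI_Wtime();
    if (now < prev) {
      ++backward;
      std::printf("rank %d: clock stepped back by %.9f s\n", rank, prev - now);
    }
    prev = now;
  }

  std::printf("rank %d: backward steps observed: %ld\n", rank, backward);

  MPI_Finalize();
  return 0;
}
```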

danrosen25 commented 1 month ago

@billsacks @theurich I reran every failed Development Test using OpenMPI v4.1.6 and they all passed. It's possible that MPI_Wtime has a bug in v5.0.5, or that v4.1.6 simply runs slower and therefore passes. Either way, I'm closing this issue as resolved. https://github.com/esmf-org/esmf/actions/workflows/development-tests.yml