DavidHuber-NOAA opened this issue 7 months ago
@DavidHuber-NOAA Thanks a lot for all your efforts on this! Which branch can now be used to reproduce this issue, so that the system experts and other netCDF/HDF experts can reproduce and investigate it?
@TingLei-NOAA I will create one, thanks!
@DavidHuber-NOAA Thanks a lot!
An update on digging using Dave's hercules/netcdff_461 branch on Hercules. My current focus is to find any possible issues in the fv3reg GSI I/O code. So far, the changes include the fix Ed proposed and changing `check( nf90_open(filenamein,nf90_write,gfile_loc,comm=mpi_comm_read,info=MPI_INFO_NULL) )` to `check( nf90_open(filenamein,ior(nf90_write,nf90_mpiio),gfile_loc,comm=mpi_comm_read,info=MPI_INFO_NULL) )`, along with some other changes. These have not resolved the issue. A new finding: when more MPI processes (e.g., 20 or 130) are used, the job succeeds, which might indicate/confirm that the "hdf error" comes from more intensive per-process parallel I/O activity when fewer MPI processes are used.
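For context, a minimal self-contained sketch of the parallel-open pattern under discussion (the filename and communicator here are placeholders, not the actual GSI values; this assumes netCDF-Fortran built with MPI-IO support):

```fortran
! Minimal sketch of the parallel-open change discussed above; the filename
! and communicator are placeholders, not the actual GSI code.
program parallel_open_sketch
  use mpi
  use netcdf
  implicit none
  integer :: ierr, ncid
  call MPI_Init(ierr)
  ! ior(nf90_write, nf90_mpiio) explicitly requests MPI-IO parallel access;
  ! nf90_write alone leaves the access mode ambiguous on some builds.
  call check( nf90_open('fv3_dynvars.nc', ior(nf90_write, nf90_mpiio), ncid, &
                        comm=MPI_COMM_WORLD, info=MPI_INFO_NULL) )
  call check( nf90_close(ncid) )
  call MPI_Finalize(ierr)
contains
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      print *, trim(nf90_strerror(status))
      call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
    end if
  end subroutine check
end program parallel_open_sketch
```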
I updated the description and title of this issue, as the apparent cause is not the upgrade of netCDF-Fortran to v4.6.1 but rather the implementation of the I_MPI_EXTRA_FILESYSTEM=1 flag.
@TingLei-NOAA The HDF5 failed tests were mostly false positives. They were largely the result of warning messages being printed into the log files that the HDF5 ctests then compared against expected logs. The warning messages were all about unused I_MPI* flags. There were a couple of out-of-memory failures as well, but I don't think those had anything to do with the I_MPI_EXTRA_FILESYSTEM flag.
Second, no, this is not required on the other systems. I_MPI_EXTRA_FILESYSTEM is a new flag implemented by Intel that does not exist in versions 18 through 2021.5.x (Hercules is running 2021.9.0); in those versions, native filesystem support is automatic and cannot be disabled. Interestingly, this flag used to exist in older versions of Intel MPI (version 15 and earlier).
@DavidHuber-NOAA Thanks a lot! Will you report your findings in the Hercules help ticket? I will follow up with some code details (the issue always occurred in my 4-MPI-process cases) and see if the system administrators have any clues.
Yes, I will do that.
Firstly, great work @DavidHuber-NOAA , this was a lot to figure out.
If there is to be a refactor of the netCDF code, may I suggest that you start with some unit testing, which can then be used to verify correct behavior on new platforms? That is, start by writing unit tests which, when run on any platform, will indicate whether the parallel I/O code is working. This will allow debugging of I/O problems without involving the rest of the code.
I'm happy to help if this route is taken.
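A standalone unit test along the lines suggested above might look like the following sketch (all names here are hypothetical; it assumes netCDF-Fortran built with MPI-IO support). Each rank writes one row of a 2-D variable collectively, reads it back, and verifies the values:

```fortran
! Sketch of a standalone parallel-I/O unit test: each MPI rank writes one
! row of a 2-D variable, reads it back, and verifies the values.
! Hypothetical names throughout; assumes parallel-enabled netCDF-Fortran.
program test_parallel_io
  use mpi
  use netcdf
  implicit none
  integer, parameter :: nx = 4
  integer :: ierr, rank, nprocs, ncid, varid, dimids(2)
  real :: wbuf(nx), rbuf(nx)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  wbuf = real(rank)

  ! netCDF-4 format is required for parallel I/O through HDF5.
  call check( nf90_create('test_par.nc', ior(nf90_netcdf4, nf90_mpiio), ncid, &
                          comm=MPI_COMM_WORLD, info=MPI_INFO_NULL) )
  call check( nf90_def_dim(ncid, 'x', nx, dimids(1)) )
  call check( nf90_def_dim(ncid, 'y', nprocs, dimids(2)) )
  call check( nf90_def_var(ncid, 'data', nf90_real, dimids, varid) )
  call check( nf90_enddef(ncid) )

  ! Collective access: all ranks take part in each put/get together.
  call check( nf90_var_par_access(ncid, varid, nf90_collective) )
  call check( nf90_put_var(ncid, varid, wbuf, start=(/1, rank+1/), count=(/nx, 1/)) )
  call check( nf90_close(ncid) )

  ! Reopen read-only in parallel and verify this rank's row.
  call check( nf90_open('test_par.nc', ior(nf90_nowrite, nf90_mpiio), ncid, &
                        comm=MPI_COMM_WORLD, info=MPI_INFO_NULL) )
  call check( nf90_inq_varid(ncid, 'data', varid) )
  call check( nf90_get_var(ncid, varid, rbuf, start=(/1, rank+1/), count=(/nx, 1/)) )
  call check( nf90_close(ncid) )

  if (any(rbuf /= real(rank))) call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  if (rank == 0) print *, 'parallel I/O test passed'
  call MPI_Finalize(ierr)
contains
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      print *, trim(nf90_strerror(status))
      call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
    end if
  end subroutine check
end program test_parallel_io
```

Run under a few different MPI process counts (e.g., `mpiexec -n 4` and `-n 20`), such a test could reproduce the process-count sensitivity described earlier without involving the rest of the GSI.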
Also, if a refactor is considered, you might consider switching to PIO. It offers a lot of great features for parallel I/O. Using netCDF parallel I/O directly is much more work than letting PIO do the heavy lifting. Let me know if you would like a presentation on PIO and how to use it.
@edwardhartnett Do you have any comments/suggestion on my question in the hercules ticket following @DavidHuber-NOAA 's update on his findings? I attached my question below:
In my 4-MPI-process run, when `export I_MPI_EXTRA_FILESYSTEM=1` is set, it always fails after line XXX and on line YYY within a certain loop:

```fortran
do ...
   ....
   call check( nf90_get_var(gfile_loc,ugrd_VarId,work_bu,start=u_startloc,count=u_countloc) ) !XXX
   call check( nf90_get_var(gfile_loc,vgrd_VarId,work_bv,start=v_startloc,count=v_countloc) ) !YYY
   ........
end do
```

From Dave's findings, it seems the MPI library does some optimization for these two lines and causes the HDF error. Thanks.
An update: I now have a code that moves the u/v I/O outside of the do loop, and it seems to work with I_MPI_EXTRA_FILESYSTEM=1; namely, it has succeeded in all 4 runs so far. I will prepare a clean and verified branch incorporating all recent changes, including the dimension change for the start and count parameters that @edwardhartnett proposed.
An update: it is now believed that, with PR https://github.com/NOAA-EMC/GSI/pull/698 and appropriately tuned parameters in the job script (to give enough memory to the low-level parallel netCDF I/O with MPI optimization), the issue is resolved. More details: it seems that, for the current code, memory can still play a role in causing this or similar issues. First, there is a relevant update on the netCDF output issue on Hera: https://github.com/NOAA-EMC/GSI/issues/697 (see the latest update). Second, on Hercules, I found that for hafs_3denvar_hybens_hiproc_updat with ppn=20 and node=2, the original HDF error appeared again; with ppn=10 and node=4, the GSI ran smoothly. So it seems the code refactoring/changes, including the use of nf90_collective, all help avoid trouble in the low-level parallel I/O processes under MPI I/O optimization, while memory usage also plays an important role and needs to be taken care of in addition to the code refactoring.
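For illustration, the collective-access change mentioned above can be sketched as follows. This is an illustrative fragment, not the actual GSI code: the identifiers follow the loop quoted earlier in the thread, and the loop bounds and start/count setup are hypothetical.

```fortran
! Sketch: mark the u/v wind variables for collective parallel access once,
! before the read loop, so every rank participates in each nf90_get_var
! together. Identifiers (gfile_loc, ugrd_VarId, ...) follow the snippet
! quoted earlier; loop bounds and start/count setup are placeholders.
call check( nf90_var_par_access(gfile_loc, ugrd_VarId, nf90_collective) )
call check( nf90_var_par_access(gfile_loc, vgrd_VarId, nf90_collective) )
do k = 1, nlevs
   ! ... set u_startloc/u_countloc and v_startloc/v_countloc for level k ...
   call check( nf90_get_var(gfile_loc, ugrd_VarId, work_bu, start=u_startloc, count=u_countloc) )
   call check( nf90_get_var(gfile_loc, vgrd_VarId, work_bv, start=v_startloc, count=v_countloc) )
end do
```

With collective access, all ranks must reach each read together, which removes the independent-request pattern that appeared to trigger the HDF error.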
A summary of what we have learned on this issue. This investigation was carried out by "us", including @DavidHuber-NOAA and @edwardhartnett, with help from Peter Johnson through the Hercules help desk and from @RussTreadon-NOAA. It is important to note that the insights below represent my current perspective on the matter; feedback from collaborators will refine these findings further, and I hope a finalized summary reflecting our collective consensus can be shared subsequently.

Summary: I_MPI_EXTRA_FILESYSTEM enables/disables "native support for parallel file systems".

1) Issue overview: The problem, identified while enabling native support for parallel file systems, is believed to stem from issues within the low-level netCDF/HDF parallel I/O operations that interact with this "native support" feature. The recently refactored GSI fv3reg code (PR #698) has significantly reduced the frequency of these issues, though it has not entirely eliminated them.

2) Alternative solutions: While alternative approaches, such as using a different MPI library, were considered, it was decided to revert to leaving the feature disabled, for several reasons: the issue might be specific to the Hercules system, suggesting a platform-dependent problem, and it is probable that future software updates on Hercules will inherently resolve it. Should the issue appear on other systems, indicating a more general problem in the interaction between parallel netCDF operations and MPI's native support for parallel file systems, a recommended and more comprehensive solution would be to adopt the Parallel I/O (PIO) library, as suggested by @edwardhartnett.
Thank you @TingLei-NOAA for the summary.
One clarification:
I am not an investigator on this issue. My silence should not be interpreted as agreement or disagreement. My silence reflects the fact that I am not actively working on this issue.
Two comments:
@RussTreadon-NOAA Thanks for your clarification; I will update the summary accordingly. On your point 1: I agree, and as described in that version of the summary, I plan to wait and see whether that actually happens. If it does, I'd prefer to use PIO if it remains difficult to sort out what happens at that level of parallel I/O. On your point 2: a possible reason is that in the global parallel netCDF I/O, each I/O call always operates on the same variable (e.g., a 3D field) while different MPI processes access different parts of it. In the regional parallel I/O, in addition to accessing different parts of a variable, the GSI fv3reg code also accesses different variables; the latter exercises more "capabilities" of the system.
@TingLei-NOAA and @DavidHuber-NOAA : shall we keep this issue open or close it?
We can leave this open. I am working on building the GSI on Hercules with Intel and OpenMPI to provide @TingLei-NOAA with an alternative MPI provider to see if the issue lies in the GSI code or Intel MPI. I successfully compiled the GSI with this combination today, but need to make a couple tweaks before handing it over to Ting.
Thank you @DavidHuber-NOAA for the update.
@TingLei-NOAA and @DavidHuber-NOAA , what is the status of this issue?
The expert at the RDHPCS helpdesk recently worked on this. My finding is that, using the current compiler on Hercules, the issue disappears. We will see whether this observation is confirmed or corrected by their follow-up.
Thanks @TingLei-NOAA for the update. Hopefully this issue can be closed soon.
Hercules is unable to handle parallel I/O when the GSI is compiled with spack-stack v1.6.0. The only obvious difference between v1.6.0 and v1.5.1 is netcdf-fortran, which was upgraded from v4.6.0 to v4.6.1. When attempting parallel reads/writes, netCDF/HDF5 errors are encountered. The cause of the failure appears to be the use of the I_MPI_EXTRA_FILESYSTEM=1 flag, which enables native support for parallel file systems. Turning on netCDF debugging options reveals the following HDF5 traceback:

This may be a Lustre issue on that system, but if that's the case, it is perplexing that it only occurs with this implementation of netcdf-fortran. A large number of HDF5 MPI ctests fail (both v1.14.3 and v1.14.0) on both Hercules and Orion, so it's not clear whether this could be a lower-level library issue that only Hercules is sensitive to. On closer examination, these 'failures' are mostly caused by warning messages about certain I_MPI* flags being ignored.