HPSCTerrSys / TSMP

Terrestrial Systems Modelling Platform (TSMP or TerrSysMP)
https://www.terrsysmp.org/

Update files TSMP1 for Jedi and Gnu (note: build for clm3 + parflow cpu) #257

Open AGonzalezNicolas opened 3 weeks ago

AGonzalezNicolas commented 3 weeks ago

I added changes to TSMP1 so that it compiles on JEDI. The build uses the GNU compilers (Intel is not available on JEDI) and succeeded for the combination "clm3 + ParFlow CPU":

./build_tsmp.ksh -c clm3-pfl -m JEDI -O Gnu

Currently, it is set up for Stages2024.

The GNU fixes are based on: https://gitlab.jsc.fz-juelich.de/HPSCTerrSys/tsmp-internal-development-tracking/-/issues/65

The GNU fixes were made on this branch: https://github.com/HPSCTerrSys/TSMP/tree/gnu_update

ParFlow needed to be updated too: "Remove multiple definitions of AMPS_CPU_TICKS_PER_SEC to be gcc compatible" https://github.com/HPSCTerrSys/parflow/tree/gnu_fix

DCaviedesV commented 3 weeks ago

What is the behaviour if the gnu_fix ParFlow branch is not used? I'm surprised a GCC fix is necessary, since the ParFlow CI/CD workflow uses GCC. Maybe @chartick can also remind us...

AGonzalezNicolas commented 3 weeks ago

I just tried to build it again (on JEDI) using the latest ParFlow version (remote URL: https://github.com/parflow/parflow.git, commit: 95058cb704337479839ba30b1dfaf22a3d5ab36b, tag: v3.13.0-35-g95058cb7) and it didn't work. The error occurs after the definition of AMPS_CPU_TICKS_PER_SEC (see lines 1063 to end in the file err_all).

To solve this issue, @chartick commented out AMPS_CPU_TICKS_PER_SEC in the ParFlow version updated for GNU.

err_all_131124-102419.txt log_all_131124-102419.txt

DCaviedesV commented 3 weeks ago

Is it strictly a GCC-related issue? Could it be an MPI-related issue? I guess the ParFlow CI uses OpenMPI and not ParaStationMPI. The errors I see in err_all_131124-102419.txt are not very verbose:

[pfsimulator/amps/test/src/CMakeFiles/test1.dir/build.make:103: pfsimulator/amps/test/src/test1] Error 1

They only indicate that the build fails in some tests.
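One way to tell a duplicate-symbol link error apart from an ordinary failing test target is to scan the error log for the linker's "multiple definition" messages alongside the make "Error N" lines. The following is a minimal Python sketch of that idea, not part of TSMP or ParFlow. The function name `summarize_build_log` is hypothetical, and the sample log is illustrative only: its linker line and object file names are invented to show the usual GNU ld message shape and are not taken from the attached err_all file.

```python
import re

# GNU ld usually reports duplicate symbols as:
#   ... multiple definition of `NAME'; ... first defined here
# Both ASCII and typographic quote styles are accepted.
MULTI_DEF = re.compile(r"multiple definition of [`'‘](?P<symbol>\w+)['’]")

# GNU make reports a failed rule as: [makefile:line: target] Error N
MAKE_ERR = re.compile(r"\[(?P<rule>[^\]]+)\] Error (?P<code>\d+)")


def summarize_build_log(text):
    """Collect duplicate symbols and failed make targets from a build log."""
    symbols = []
    targets = []
    for line in text.splitlines():
        dup = MULTI_DEF.search(line)
        if dup and dup.group("symbol") not in symbols:
            symbols.append(dup.group("symbol"))
        err = MAKE_ERR.search(line)
        if err:
            # Keep only the target, dropping the "makefile:line" prefix.
            targets.append(err.group("rule").split(": ")[-1])
    return {"duplicate_symbols": symbols, "failed_targets": targets}


# Illustrative log: the first two lines are invented for this sketch,
# only the last line's rule text appears in the thread.
SAMPLE_LOG = """\
/usr/bin/ld: libamps.a(a.c.o):(.bss+0x0): multiple definition of `AMPS_CPU_TICKS_PER_SEC'; test1.c.o:(.bss+0x0): first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [pfsimulator/amps/test/src/CMakeFiles/test1.dir/build.make:103: pfsimulator/amps/test/src/test1] Error 1
"""

if __name__ == "__main__":
    print(summarize_build_log(SAMPLE_LOG))
```

If the real log shows a non-empty list of duplicate symbols, the failing test targets are probably a consequence of the link error rather than of the tests themselves. GCC 10 and later default to -fno-common, which turns duplicate tentative definitions into link errors; whether that is the cause here is an assumption that the log would need to confirm.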

Is there any chance that it is an ARM-related issue? That is, does the build also fail on the x86 CPUs of JUWELS/JURECA?

If this is a real issue, we need to make a pull request upstream into the ParFlow master, and this needs to be clearly documented.

chartick commented 3 weeks ago

I did it some time ago, and I don't remember everything, but it should be in my leaving document.

If I remember correctly, the issues were only on GPU with a specific preconditioner. AMPS_CPU_TICKS_PER_SEC is a performance indicator or something.

DCaviedesV commented 3 weeks ago

Thanks @chartick. For completeness, this is @chartick's comment which was in his leaving doc: "gnu_fix removes some performance measurements so that the compilation of ParFlow runs through with newer versions of GCC."

AGonzalezNicolas commented 3 weeks ago

Using the latest version of ParFlow, the build fails on JURECA and JUWELS with the same error. If I use the ParFlow version with the gnu_fix, it builds on both (JURECA and JUWELS).

DCaviedesV commented 3 weeks ago

Could you try with OpenMPI, for completeness?