Goddard-Fortran-Ecosystem / pFUnit

Parallel Fortran Unit Testing Framework

Segfault with nvfortran #363

Closed havogt closed 2 years ago

havogt commented 2 years ago

Hi! I'd like to understand the status of nvfortran support.

The release notes of v4.2.3 claim support for nvfortran 22.3; however, the discussion in #337 indicated that some things were still missing.

When I run a unit test executable (e.g. from the demos repository), it segfaults with nvfortran 22.3. The segfault occurs at this line: https://github.com/Goddard-Fortran-Ecosystem/gFTL/blob/012c864bbd28a07c3921d4293e184ef173cd7c12/include/v1/templates/vector_impl.inc#L217.

I can provide an environment that reproduces the problem, but I can't invest more time in debugging it myself. Any help is greatly appreciated.

Here are the instructions to reproduce it in a Docker container:

docker pull ghcr.io/gridtools/gridtools-base:nvhpc-22.3
docker run -it ghcr.io/gridtools/gridtools-base:nvhpc-22.3 bash
apt-get update
apt-get -y install m4
git clone https://github.com/Goddard-Fortran-Ecosystem/pFUnit.git
cd pFUnit
mkdir build && cd build
cmake ..
make -j$(nproc) install
cd
git clone https://github.com/Goddard-Fortran-Ecosystem/pFUnit_demos.git
cd pFUnit_demos/Trivial
mkdir build && cd build
cmake .. -DPFUNIT_DIR=/pFUnit/build/installed/PFUNIT-4.4/cmake
make
./my_tests
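
A crash of this kind can be told apart from an ordinary test failure by the exit status of the executable. The following is a minimal sketch, not part of the original report: `classify_status` is a hypothetical helper (it is not provided by pFUnit), and it assumes a POSIX shell on Linux, where a process killed by a signal is reported as 128 plus the signal number (SIGSEGV is signal 11, so 139).

```shell
# Hypothetical helper, not part of pFUnit: interpret the exit status of a test run.
classify_status() {
  status="$1"
  if [ "$status" -eq 0 ]; then
    echo "passed"
  elif [ "$status" -gt 128 ]; then
    # The shell reports death by signal as 128 + signal number
    # (SIGSEGV is 11 on Linux, giving 139).
    echo "killed by signal $((status - 128))"
  else
    echo "failed with status $status"
  fi
}

# Assumed usage, from pFUnit_demos/Trivial/build:
#   ./my_tests; classify_status $?
```

An ordinary failing run exits with a small non-zero status, so a "killed by signal 11" result points at a crash in the executable rather than at a failing assertion.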

(Note that the issue I am trying to fix in #362 is not reproducible in the Docker container; that one only occurs on my local machine.)

tclune commented 2 years ago

Sorry about that - my release notes were poorly worded. The release fixed bugs in pFUnit itself that were only detected by nvfortran (an internal development version of the compiler). Basically, the compilers I have access to all support ISO_REAL_128, but nvfortran does not, so it exposed errors in some of my fpp logic.

Having said that, the very latest nvfortran (22.7) supposedly will compile pFUnit, but my NVIDIA contact has not yet attempted to run the self tests, so I would not be particularly optimistic that things are quite there yet. NVIDIA is working to get the research code (GEOS) that my team supports to build with their compiler, and these GFE layers are part of the software stack for GEOS. Their latest compiler still breaks on some constructs in GEOS, which is why they've not yet circled back to look at run-time issues.

Short summary: not there yet, but there is reason to hope that the situation will be fixed in the next 1-2 months.

havogt commented 2 years ago

Thanks for the feedback! And thanks for your effort to get this working!