Unit testing with pFUnit

timfelle commented 4 months ago

I have managed to set up pFUnit on my local PC and am attempting to get the unit tests up and running. However, there seems to be one that keeps failing on me.

The point_interpolation check keeps failing. It seems to be written as an MPI-based test, but it is only executed on a single processor. There seems to be an issue with the underlying device allocation in space_t. However, the space and device unit tests do not fail.

Any ideas what might be happening?

Error message

```
1/1 Test #26: neko_point_interpolation_parallel ...***Failed 2.79 sec
[ctest] Start: .
--------Gather-Scatter--------
end:
[ctest] Start: .
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
[ctest] Backtrace for this error:
#0 0x7fa1f6dc1960 in ???
#1 0x7fa1f6dc0ac5 in ???
#2 0x7fa1f6ad051f in ???
    at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3 0x7fa1ea602a2c in ???
#4 0x7fa1ea7feef9 in ???
#5 0x7fa1ea539248 in ???
#6 0x7fa1ea61e154 in ???
#7 0x7fa1f96480f8 in ???
#8 0x7fa1f96187d1 in ???
#9 0x7fa1f96715ab in ???
#10 0x556fdc685ba8 in device_memcpy_common
    at device/device.F90:404
#11 0x556fdc6875de in __device_MOD_device_memcpy_r1
    at device/device.F90:223
#12 0x556fdc63f6b4 in __space_MOD_space_init
    at sem/space.f90:285
#13 0x556fdc634706 in __point_interpolation_parallel_MOD_point_interpolation_test_interpolation
    at /home/tife/Projects/neko-top/external/neko/tests/point_interpolation/point_interpolation_parallel.pf:132
#14 0x556fdc6392c7 in __pf_mpitestmethod_MOD_runmethod
    at /home/tife/Projects/neko-top/build/_deps/pfunit-src/src/pfunit/core/MpiTestMethod.F90:88
#15 0x556fdc63c7a4 in __pf_mpitestcase_MOD_runbare
    at /home/tife/Projects/neko-top/build/_deps/pfunit-src/src/pfunit/core/MpiTestCase.F90:100
#16 0x556fdc6b6fb5 in __pf_testcase_MOD_runbare_surrogate
    at /home/tife/Projects/neko-top/build/_deps/pfunit-src/src/funit/core/TestCase.F90:146
#17 0x556fdc7558df in __pf_testresult_MOD_run
    at /home/tife/Projects/neko-top/build/_deps/pfunit-src/src/funit/core/TestResult.F90:237
#18 0x556fdc63c894 in __pf_mpitestcase_MOD_run
    at /home/tife/Projects/neko-top/build/_deps/pfunit-src/src/pfunit/core/MpiTestCase.F90:82
#19 0x556fdc6b884a in __pf_testsuite_MOD_run
    at /home/tife/Projects/neko-top/build/_deps/pfunit-src/src/funit/core/TestSuite.F90:108
#20 0x556fdc6b884a in __pf_testsuite_MOD_run
    at /home/tife/Projects/neko-top/build/_deps/pfunit-src/src/funit/core/TestSuite.F90:108
#21 0x556fdc7570bd in __pf_testrunner_MOD_runwithresult
    at /home/tife/Projects/neko-top/build/_deps/pfunit-src/src/funit/core/TestRunner.F90:139
#22 0x556fdc758907 in __pf_testrunner_MOD_run
    at /home/tife/Projects/neko-top/build/_deps/pfunit-src/src/funit/core/TestRunner.F90:117
#23 0x556fdc73dbba in __funit_MOD_generic_run
    at /home/tife/Projects/neko-top/build/_deps/pfunit-src/src/funit/FUnit.F90:118
#24 0x556fdc63a116 in __pfunit_MOD_run
    at /home/tife/Projects/neko-top/build/_deps/pfunit-src/src/pfunit/pFUnit.F90:108
#25 0x556fdc639260 in funit_main_
    at /home/tife/Projects/neko-top/build/_deps/pfunit-src/src/pfunit/pfunit_main.F90:16
#26 0x556fdc62fa0b in MAIN__
    at /home/tife/Projects/neko-top/build/tests/neko_point_interpolation_parallel_driver.F90:82
#27 0x556fdc62fa0b in main
    at /home/tife/Projects/neko-top/build/tests/neko_point_interpolation_parallel_driver.F90:53
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit code.
Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
```

System information

```
Windows:
  Edition      : Windows 11 Enterprise
  Version      : 23H2
  Installed on : 25/09/2023
  OS build     : 22631.3155
  Experience   : Windows Feature Experience Pack 1000.22684.1000.0

WSL:
  Distributor ID : Ubuntu
  Description    : Ubuntu 22.04.4 LTS
  Release        : 22.04
  Codename       : jammy

Compilers:
  gfortran : GNU Fortran (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
  nvcc     : NVIDIA (R) Cuda compiler driver, release 12.3, V12.3.103

Hardware:
  CPU type   : 11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz
  Bcknd type : Accelerator (CUDA)
  Dev. name  : NVIDIA RTX A3000 Laptop GPU
```

timofeymukha commented 4 months ago

You might have caught something here, because I don't think we run the checks on the device.

timfelle commented 4 months ago

Hmm alright interesting. I'll investigate a bit more.

timfelle commented 4 months ago

It does complete fine on my tablet, which does not have a GPU.

njansson commented 4 months ago

Looking at the failing tests, it seems that this one is missing a call to device_init.

njansson commented 4 months ago

The best way to avoid this is to add generic setup and teardown routines, as in device_math:

https://github.com/ExtremeFLOW/neko/blob/69653aa2f0935ce737fbf6c63b5fba7efcb82727/tests/device_math/device_math_parallel.pf#L13-L34
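For readers without the linked file at hand, the setup/teardown pattern can be sketched roughly as below. This is an illustrative sketch only, not the contents of device_math_parallel.pf. The module and routine names are hypothetical. It also assumes that Neko provides a `device_finalize` counterpart to `device_init` in its `device` module, and a `NEKO_BCKND_DEVICE` flag in `neko_config` that equals 1 for accelerator builds. Check both against the Neko sources before relying on them.

```fortran
! Hypothetical pFUnit (.pf) test module, processed by the pFUnit preprocessor.
module example_device_parallel
  use pfunit
  ! Assumed Neko names; verify against the Neko sources.
  use neko_config, only : NEKO_BCKND_DEVICE
  use device, only : device_init, device_finalize
  implicit none

contains

  ! Runs before each test: initialise the device layer on accelerator builds,
  ! so that later device allocations and memcpy calls have a valid context.
  @before
  subroutine example_setup(this)
    class (MpiTestMethod), intent(inout) :: this

    if (NEKO_BCKND_DEVICE .eq. 1) then
       call device_init
    end if
  end subroutine example_setup

  ! Runs after each test: release the device layer again.
  @after
  subroutine example_teardown(this)
    class (MpiTestMethod), intent(inout) :: this

    if (NEKO_BCKND_DEVICE .eq. 1) then
       call device_finalize
    end if
  end subroutine example_teardown

  ! A test that would otherwise touch device memory without initialisation.
  @test(npes=[1])
  subroutine example_test(this)
    class (MpiTestMethod), intent(inout) :: this

    ! The test body (for example, a call to space_init) goes here.
  end subroutine example_test

end module example_device_parallel
```

Keeping the initialisation in `@before`/`@after` routines means every test in the module gets a valid device context, rather than relying on each test body to remember the call.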

timfelle commented 4 months ago

Yep, that did it. I have opened a PR with the corrections. It might still be useful to revisit the tests and make them behave more like what you describe for the device_math one, but for now it is a quick fix.

Let's keep this issue open so we remember to do it more thoroughly.

njansson commented 3 months ago

Should we keep this for v0.8, or postpone the fixes until 0.9?

timfelle commented 3 months ago

When do we plan to do the 0.8 release?

njansson commented 3 months ago

The current target is a feature freeze at the end of April and a release in mid-May.

timfelle commented 3 months ago

I'll look through it before then.

ExtremeFLOW / neko

Unit testing with pFUnit #1155