DLR-AMR / t8code

Parallel algorithms and data structures for tree-based adaptive mesh refinement (AMR) with arbitrary element shapes.
https://dlr-amr.github.io/t8code/
GNU General Public License v2.0

Issues with t8_shmem #1109

Open DamynChipman opened 3 months ago

DamynChipman commented 3 months ago

When running make test and some of the tutorials, I see repeated errors and warnings about t8_shmem. One of the tests fails, and every tutorial example that runs in parallel prints a warning. Apart from the failed test, all of the tutorials appear to complete successfully despite the warning. I get these issues when building and running both on my laptop and on a Linux cluster.

The main warning I see is the following:

[t8] WARNING: Trying to used shared memory but intranode and internode communicators are not set. You should call t8_shmem_init before initializing a shared memory array.

I guess my question/issue is: does this affect accuracy or performance?

I am reviewing t8code for JOSS: https://github.com/openjournals/joss-reviews/issues/6887

For reference, here is some information on building, testing, and running the tutorials:

Build Info

cmake -B build-main -S . -DCMAKE_INSTALL_PREFIX=./build-main/local -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx

Testing

cd build-main
make -j
make -j test

See the attached test-output.txt for the output of make -j test ARGS="--rerun-failed --output-on-failure"

test-output.txt

Tutorials

Here is the output for the 3rd tutorial:

➜  tutorials git:(main) ✗ mpirun -n 4 ./t8_step2_uniform_forest
[libsc] This is libsc 2.8.5.999
[t8] This is t8 2.0.0-396-g758cb9903
[t8] CPP                      /opt/homebrew/bin/mpicxx
[t8] CPPFLAGS                 -Wall
[t8] CC                       /opt/homebrew/bin/mpicc
[t8] CFLAGS                   -Wall
[t8] LDFLAGS                  
[t8] LIBS                     Not available with CMake builds
[t8]  [step2] 
[t8]  [step2] Hello, this is the step2 example of t8code.
[t8]  [step2] In this example we build our first uniform forest and output it to vtu files.
[t8]  [step2] 
[t8]  [step2] Constructed coarse mesh with 2 prism trees.
[t8] WARNING: Trying to used shared memory but intranode and internode communicators are not set. You should call t8_shmem_init before setting the shmem type.
[t8] WARNING: Trying to used shared memory but intranode and internode communicators are not set. You should call t8_shmem_init before initializing a shared memory array.
[t8] WARNING: Trying to used shared memory but intranode and internode communicators are not set. You should call t8_shmem_init before setting the shmem type.
[t8] WARNING: Trying to used shared memory but intranode and internode communicators are not set. You should call t8_shmem_init before initializing a shared memory array.
[t8] WARNING: Trying to used shared memory but intranode and internode communicators are not set. You should call t8_shmem_init before setting the shmem type.
[t8] WARNING: Trying to used shared memory but intranode and internode communicators are not set. You should call t8_shmem_init before initializing a shared memory array.
[t8] Constructed uniform forest with 1024 global elements.
[t8]  [step2] Created uniform forest.
[t8]  [step2] Refinement level:         3
[t8]  [step2] Local number of elements:     256
[t8]  [step2] Global number of elements:    1024
[t8]  [step2] Wrote forest to vtu files:    t8_step2_uniform_forest*
[t8]  [step2] Destroyed forest.

holke commented 3 months ago

Hi @DamynChipman, thank you for reporting the issue. Are you by any chance using a Mac with an M1 or M2 processor, possibly with OpenMPI? We recently noticed that this combination seems to have issues with the MPI shared memory implementation.

Shared memory not being active does not cause any loss of accuracy, but it does increase memory usage. For us, though, shared memory only really pays off when running on a cluster with more than 1000 CPUs.

DamynChipman commented 3 months ago

Yeah, my laptop is an M1 MacBook and I have OpenMPI installed. However, I ran into the same warnings and the same failed test when building, testing, and running on a Linux cluster as well.

Sounds good, if something else shows up, I'll let you all know, thanks!

holke commented 3 months ago

Thank you.

I want to keep the issue open anyway so that we do not forget about it. We should investigate and address these warnings in the future.

maelk3 commented 2 months ago

The combination of OpenMPI and libsc's CMake build system seems to be the culprit for the warnings. Libsc's CMake build system uses the function check_symbol_exists to check for the symbol MPI_COMM_TYPE_SHARED in the header file mpi.h. MPICH defines this symbol as a macro, but OpenMPI defines it as part of an anonymous enum, which check_symbol_exists does not detect. Thus the compile definition SC_ENABLE_MPICOMMSHARED is missing, which causes the warnings.
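To illustrate the macro-versus-enum difference described above, here is a minimal C sketch. It is not libsc or MPI code: the DEMO_* names are hypothetical stand-ins for the two header styles. It also only shows the preprocessor side of the problem, namely that #ifdef sees a macro but not an enum constant. The actual check_symbol_exists test is more involved, but the CMake documentation does state that enum values are not recognized by it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two header styles. These names are
 * illustrative only and do not come from any real mpi.h. */
#define DEMO_MACRO_COMM_TYPE_SHARED 1      /* macro style */
enum { DEMO_ENUM_COMM_TYPE_SHARED = 1 };   /* anonymous enum style */

/* Returns 1 if the preprocessor can see the macro-style symbol. */
int macro_symbol_visible(void)
{
#ifdef DEMO_MACRO_COMM_TYPE_SHARED
  return 1;
#else
  return 0;
#endif
}

/* Returns 1 if the preprocessor can see the enum-style symbol.
 * Enum constants are unknown to the preprocessor, so this returns 0. */
int enum_symbol_visible(void)
{
#ifdef DEMO_ENUM_COMM_TYPE_SHARED
  return 1;
#else
  return 0;
#endif
}

/* Prints what the preprocessor sees for each style. */
void run_demo(void)
{
  /* Both symbols are equally usable in ordinary C code. */
  assert(DEMO_ENUM_COMM_TYPE_SHARED == DEMO_MACRO_COMM_TYPE_SHARED);
  printf("macro visible to #ifdef: %d\n", macro_symbol_visible());
  printf("enum  visible to #ifdef: %d\n", enum_symbol_visible());
}
```

Calling run_demo() from a main function reports the macro as visible and the enum constant as not visible, even though both compile fine in regular code. A configure-time check that relies on this kind of detection would therefore conclude that the symbol is missing for an enum-style header.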

holke commented 2 months ago

Next step: Update the sc version to develop.