CompFUSE / DCA

DCA++
BSD 3-Clause "New" or "Revised" License

[WIP] Adios2 main dca debugging #237

Closed PDoakORNL closed 3 years ago

PDoakORNL commented 3 years ago

This brings adios2 up to date with current master, but I expect it won't build on Daint; I'm working that out.

weilewei commented 3 years ago

For documentation purposes: on Summit, gcc/8.1.1 with cuda/10 does not work; I got the error below. The error goes away when I update CUDA to 11 and use gcc/8.1.1 and magma/2.5.4-cuda11.1.

CMake Error at /autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/cmake-3.18.2-cirtl5oah4d6bequfcoji6jbetertrna/share/cmake-3.18/Modules/CMakeTestCUDACompiler.cmake:52 (message):
  The CUDA compiler

    "/sw/summit/cuda/10.1.243/bin/nvcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /gpfs/alpine/proj-shared/cph102/weile/dev/src/adios2/DCA-2/build/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/gmake cmTC_a16eb/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_a16eb.dir/build.make CMakeFiles/cmTC_a16eb.dir/build
    gmake[1]: Entering directory `/gpfs/alpine/cph102/proj-shared/weile/dev/src/adios2/DCA-2/build/CMakeFiles/CMakeTmp'
    Building CUDA object CMakeFiles/cmTC_a16eb.dir/main.cu.o
    /sw/summit/cuda/10.1.243/bin/nvcc      -c /gpfs/alpine/proj-shared/cph102/weile/dev/src/adios2/DCA-2/build/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_a16eb.dir/main.cu.o
    /autofs/nccs-svm1_sw/summit/gcc/8.1.1/include/c++/8.1.1/type_traits(347): error: identifier "__ieee128" is undefined

    /autofs/nccs-svm1_sw/summit/gcc/8.1.1/include/c++/8.1.1/bits/std_abs.h(101): error: identifier "__ieee128" is undefined

    /autofs/nccs-svm1_sw/summit/gcc/8.1.1/include/c++/8.1.1/bits/std_abs.h(102): error: identifier "__ieee128" is undefined

A similar error has been reported here: https://github.com/LLNL/blt/issues/341

weilewei commented 3 years ago

The version of DCA with ADIOS2 support compiles and runs.

However, how do I view G4 through ADIOS2? More documentation is needed.

I want to compare the distributed G4 with the non-distributed one to verify correctness.
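A minimal sketch of the comparison I have in mind, independent of how G4 is actually read back. It assumes each rank holds one contiguous slice of the flattened G4 and that the slices concatenate in rank order, which may not match the real distribution. The names `gatherSlices`, `maxAbsDiff`, and `g4Matches` are hypothetical and are not part of DCA++ or ADIOS2.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Complex = std::complex<double>;

// Concatenate per-rank slices, in rank order, into one flat array.
// Assumption: a contiguous, rank-ordered block distribution of G4.
std::vector<Complex> gatherSlices(const std::vector<std::vector<Complex>>& slices) {
  std::vector<Complex> full;
  for (const auto& slice : slices)
    full.insert(full.end(), slice.begin(), slice.end());
  return full;
}

// Largest element-wise absolute difference between two flat arrays.
double maxAbsDiff(const std::vector<Complex>& a, const std::vector<Complex>& b) {
  if (a.size() != b.size())
    throw std::invalid_argument("G4 arrays differ in size");
  double max_diff = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    max_diff = std::max(max_diff, std::abs(a[i] - b[i]));
  return max_diff;
}

// True if the reassembled distributed G4 agrees with the reference
// (non-distributed) G4 to within the given absolute tolerance.
bool g4Matches(const std::vector<Complex>& reference,
               const std::vector<std::vector<Complex>>& slices, double tol) {
  return maxAbsDiff(reference, gatherSlices(slices)) <= tol;
}
```

The tolerance would have to be chosen by hand, since accumulation order can differ between the two runs and exact equality is probably too strict.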

weilewei commented 3 years ago

tp_accumulator_particle_hole_test build failed:

[ 59%] Built target tp_accumulator_gpu_test
[ 59%] Linking CXX executable tp_accumulator_particle_hole_test
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-8.1.1/libpng-1.6.34-whgrengqivmmm75oeeiwgsczqddqoh7i/lib/libpng16.so.16: undefined reference to `inflateValidate@ZLIB_1.2.9'
/usr/bin/ld: link errors found, deleting executable `tp_accumulator_particle_hole_test'
collect2: error: ld returned 1 exit status
make[2]: *** [test/unit/phys/dca_step/cluster_solver/shared_tools/accumulation/tp/tp_accumulator_particle_hole_test] Error 1
make[1]: *** [test/unit/phys/dca_step/cluster_solver/shared_tools/accumulation/tp/CMakeFiles/tp_accumulator_particle_hole_test.dir/all] Error 2
make: *** [all] Error 2

PDoakORNL commented 3 years ago

Working on this again today.

PDoakORNL commented 3 years ago

test this please

PDoakORNL commented 3 years ago

test this please

PDoakORNL commented 3 years ago

This brings relatively large changes to the way functions distributed over ranks are treated. It also adds limited ADIOS2 support to master.