cp2k / dbcsr

DBCSR: Distributed Block Compressed Sparse Row matrix library
https://cp2k.github.io/dbcsr/
GNU General Public License v2.0
135 stars 46 forks source link

WIP: Add GPU aware MPI support in cannon algorithm #647

Closed gsitaram closed 7 months ago

gsitaram commented 1 year ago

If CUDA or HIP backend is enabled, --WITH-DBCSR-G2G CMake option enables the following:

Requirements:

Need help with the following:

jenkins-cscs commented 1 year ago

Can one of the admins verify this patch?

codecov[bot] commented 1 year ago

Codecov Report

Attention: 64 lines in your changes are missing coverage. Please review.

Comparison is base (5d92807) 67.0% compared to head (b76c8bd) 66.8%. Report is 129 commits behind head on develop.

:exclamation: Current head b76c8bd differs from pull request most recent head 0c37025. Consider uploading reports for the commit 0c37025 to get more accurate results

Files Patch % Lines
src/mm/dbcsr_mm_common.F 0.0% 55 Missing :warning:
src/mm/dbcsr_mm.F 25.0% 9 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## develop #647 +/- ## ========================================= - Coverage 67.0% 66.8% -0.2% ========================================= Files 105 105 Lines 29121 29188 +67 ========================================= + Hits 19521 19523 +2 - Misses 9600 9665 +65 ``` | [Flag](https://app.codecov.io/gh/cp2k/dbcsr/pull/647/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cp2k) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/cp2k/dbcsr/pull/647/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cp2k) | `66.8% <4.4%> (-0.2%)` | :arrow_down: | | [with-blas](https://app.codecov.io/gh/cp2k/dbcsr/pull/647/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cp2k) | `66.8% <4.4%> (-0.2%)` | :arrow_down: | | [with-libxsmm](https://app.codecov.io/gh/cp2k/dbcsr/pull/647/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cp2k) | `66.2% <4.7%> (-0.2%)` | :arrow_down: | | [with-mpi](https://app.codecov.io/gh/cp2k/dbcsr/pull/647/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cp2k) | `66.9% <4.4%> (-0.2%)` | :arrow_down: | | [with-openmp](https://app.codecov.io/gh/cp2k/dbcsr/pull/647/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cp2k) | `65.7% <4.4%> (-0.2%)` | :arrow_down: | | [without-mpi](https://app.codecov.io/gh/cp2k/dbcsr/pull/647/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cp2k) | `66.0% <4.4%> (-0.2%)` | :arrow_down: | | [without-openmp](https://app.codecov.io/gh/cp2k/dbcsr/pull/647/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cp2k) | `65.8% <4.4%> (-0.2%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cp2k#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

hfp commented 1 year ago

The OpenCL build fails with "undefined reference to c_calculate_norms" error, @hfp, could you help me fix it for the OpenCL backend?

Yes, I will help with this. I think it should not be a blocker for merge.


Btw, is there a way to perform atomic FP-add on (certain) AMD GPUs using OpenCL? It seems the CUDA based code path does that but everything I tried in OpenCL did not work for me like C11 atomics, legacy builtins, or even cheating by guessing a prototype function and calling it. I found some doc for inline assembly in OpenCL but I am a bit hesitant to adopt it (like reading all the arch specific doc for AMD GPUs). For NVidia btw I am using PTX inline assembly in OpenCL.

hfp commented 1 year ago

Regarding the failing check of file headers, I propose to put copyright lines in front of the LICENSE file like this, and to stick with generic

! Copyright (C) by the DBCSR developers group - All rights reserved                                !

Otherwise we are back to the business of duplicating author notes as part of the source code. I think authorship is recorded as part of repository's metadata and no one is "left behind". As an extension, I also propose to drop AUTHORS file.

A general overhaul is to have a file extension .md for LICENSE (and perhaps similar files).

( sorry the proposal is not exactly related and can be perhaps a separate PR )

gsitaram commented 1 year ago

The OpenCL build fails with "undefined reference to c_calculate_norms" error, @hfp, could you help me fix it for the OpenCL backend?

Yes, I will help with this. I think it should not be a blocker for merge.

Thank you so much.

Btw, is there a way to perform atomic FP-add on (certain) AMD GPUs using OpenCL? It seems the CUDA based code path does that but everything I tried in OpenCL did not work for me like C11 atomics, legacy builtins, or even cheating by guessing a prototype function and calling it. I found some doc for inline assembly in OpenCL but I am a bit hesitant to adopt it (like reading all the arch specific doc for AMD GPUs). For NVidia btw I am using PTX inline assembly in OpenCL

I will find out more details and inform you.

gsitaram commented 1 year ago

Regarding the failing check of file headers, I propose to put copyright lines in front of the LICENSE file like this, and to stick with generic

! Copyright (C) by the DBCSR developers group - All rights reserved                                !

Otherwise we are back to the business of duplicating author notes as part of the source code. I think authorship is recorded as part of repository's metadata and no one is "left behind". As an extension, I also propose to drop AUTHORS file.

A general overhaul is to have a file extension .md for LICENSE (and perhaps similar files).

( sorry the proposal is not exactly related and can be perhaps a separate PR )

I like this proposal of having a separate LICENSE file. If all DBCSR maintainers agree, I can make the required changes (keep only the DBCSR Developers Group copyright line in each file, and add a new license file with the entire text of the license and copyright lines above that.)

gsitaram commented 1 year ago

@hfp, it does not seem like there are atomic add operations in OpenCL for FP type data. You may have to use compare and swap instead to accomplish the same. I found this blog showing an example, but this is not recent. I see that in the newer OpenCL (3.0) spec, the atomic_* operations are deprecated, and we have to use a combination of memory order and fence operations.. but I am not very familiar with this set of functions. If I find any recent examples somewhere, I'll be sure to point those to you.

hfp commented 1 year ago

Please allow me to share my suggestions for this PR:

In my integration test (#649), I changed including some header files in calculate_norms.cpp from:

#if defined(__CUDA)
#  include "acc/cuda/acc_cuda.h"
#elif defined(__HIP)
#  include "acc/hip/acc_hip.h"
#endif
#include "libsmm_acc_init.h"

... to the following:

#if defined(__CUDA)
#  include "../cuda/acc_cuda.h"
#elif defined(__HIP)
#  include "../hip/acc_hip.h"
#endif
#include "libsmm_acc_init.h"
gsitaram commented 1 year ago

@hfp, I merged the master into this branch and changed the path to those header files as you suggested. As for the name of the file, hipcc takes .cpp extension, and nvcc can compile the .cpp file with the option -x cuda. So, I didn't change it. If the changes introduced to the build system are not acceptable, please feel free to change. After I got it to build, I didn't touch it. I'll discuss with Alfio and see how to proceed with the license blob. Thanks for your help in getting the opencl backend to accept these changes.

alazzaro commented 1 year ago

@gsitaram Thanks for this PR and sorry for the late reply (I would like to thank @hfp for his contribution too).

I'm putting here some of the topics/comments:

I will add some specific comments in the files.

gsitaram commented 1 year ago

@alazzaro , please check if the previous commit reflects our discussion about avoiding the g2g path for all data types other than real_8.

hfp commented 1 year ago

I think Gina was right with the previous location of c_calculate_norms (under src/acc/libsmm_acc). That's the "library" also hosting transpose and multiply (in fact it's also CUDA/HIP-only). Another sign of calculate_norms.cpp belonging to LIBSMM_ACC is the latter only contains infrastructure-code like mem, stream, error, etc., and never contained calculation routines. Further, calculate_norms.cpp depends on libsmm_acc_init.h (acc_get_gpu_warp_size), which is now awkward after moving it to cuda_hip.

alazzaro commented 7 months ago

thanks @gsitaram !