JuliaGPU / KernelAbstractions.jl

Heterogeneous programming in Julia
MIT License

Multi-GPU backend #540

Closed lucasgrjn closed 1 week ago

lucasgrjn commented 1 week ago

Hi!

I am trying to use KernelAbstractions.jl to build an FDTD (finite-difference time-domain) simulation package. From what I understand, everything works fine except for multi-GPU: I am having trouble seeing how multiple GPUs can be handled easily with KA.jl (maybe it is not possible and I have to go directly to CUDA.jl…?)

So, I was wondering if any of you have examples of how to implement computations using KA.jl on multiple GPUs! (Bonus points if they include halo communication, and even better if the exchanges between GPUs are asynchronous.)
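
For illustration, here is a minimal single-GPU kernel of the kind I have in mind (the field names and the update rule below are simplified placeholders, not my actual code):

```julia
# Minimal single-GPU KernelAbstractions sketch (illustrative placeholder, not real FDTD physics).
using KernelAbstractions
using CUDA  # any KA backend would do

@kernel function update_field!(E, @Const(H), dt)
    I = @index(Global, Cartesian)
    # Placeholder update standing in for the actual curl terms of the FDTD scheme.
    E[I] += dt * H[I]
end

backend = CUDABackend()
E = KernelAbstractions.zeros(backend, Float32, 256, 256)
H = KernelAbstractions.allocate(backend, Float32, 256, 256)
fill!(H, 1f0)

update_field!(backend)(E, H, 1f-3; ndrange = size(E))
KernelAbstractions.synchronize(backend)
```

The question is essentially how to split such an update across several GPUs and exchange the boundary layers between them.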

Any help or information is welcome, Thanks!

Lucas

luraess commented 1 week ago

In, e.g., https://github.com/PTsolvers/Chmy.jl, we implemented the multi-GPU part over KA using MPI.jl.

vchuravy commented 1 week ago

Let me turn your question around :)

What do you need for multi-GPU?

The thing that is likely missing the most is https://github.com/JuliaGPU/KernelAbstractions.jl/issues/395 (contributions always welcome.)

If you simply need MPI, this may be of help: There is https://github.com/JuliaGPU/KernelAbstractions.jl/blob/main/examples/mpi.jl
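
For reference, a rough sketch of one way MPI.jl and a KA kernel can be combined (this is only an illustration, not the linked example; it assumes one GPU per MPI rank and a CUDA-aware MPI build so device arrays can be passed to MPI directly):

```julia
# Illustrative only: one MPI rank per GPU, neighbor exchange of a boundary element.
using MPI, CUDA, KernelAbstractions

@kernel function fill_rank!(A, r)
    i = @index(Global)
    A[i] = r
end

MPI.Init()
comm   = MPI.COMM_WORLD
rank   = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)

# Map this rank to a GPU on the node (simplistic round-robin).
CUDA.device!(rank % length(CUDA.devices()))
backend = CUDABackend()

A = KernelAbstractions.zeros(backend, Float64, 1024)
fill_rank!(backend, 256)(A, Float64(rank); ndrange = length(A))
KernelAbstractions.synchronize(backend)

# Exchange one boundary element with the right/left neighbor.
# Passing CuArrays to MPI requires a CUDA-aware MPI; otherwise copy to host first.
dst = mod(rank + 1, nranks)
src = mod(rank - 1, nranks)
sendbuf = A[end:end]          # small device-array slice used as the send buffer
recvbuf = similar(sendbuf)
MPI.Sendrecv!(sendbuf, recvbuf, comm; dest = dst, source = src)

MPI.Finalize()
```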

lucasgrjn commented 1 week ago

In, e.g., https://github.com/PTsolvers/Chmy.jl, we implemented the multi-GPU part over KA using MPI.jl.

Awesome! I will dig into it in more detail.

lucasgrjn commented 1 week ago

Let me turn your question around :)

What do you need for multi-GPU?

Not long ago, Meta released Khronos, which appears to be a really great tool for physicists (it simulates light propagation with FDTD). However, this version only runs on a single GPU, which can quickly become a limitation.

I would like to extend it to multiple GPUs, but I want to avoid "reinventing the wheel". Hence, the functions I am interested in from a package are the ones that handle parallelism with:

  • Halo / Boundaries management
  • Efficient way to store the data (in case we would like to take a snapshot, but I don't think this one is particularly difficult)

vchuravy commented 1 week ago

Hence, the functions I am interested in from a package are the ones that handle parallelism with:

  • Halo / Boundaries management
  • Efficient way to store the data (in case we would like to take a snapshot, but I don't think this one is particularly difficult)

Yeah, for me, those concerns are out of scope for KernelAbstractions.jl.

For pure Halo support there is https://github.com/eth-cscs/ImplicitGlobalGrid.jl
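
A rough sketch of what a halo update with ImplicitGlobalGrid.jl might look like (illustrative only; the field and loop below are placeholders, and the local compute kernel is elided):

```julia
# Illustrative sketch: distributed halo updates via ImplicitGlobalGrid.jl.
# Assumes one MPI rank per GPU; the local update kernel (KA or CUDA) is omitted.
using ImplicitGlobalGrid, CUDA

nx, ny, nz = 128, 128, 128
me, dims = init_global_grid(nx, ny, nz)    # set up an implicit global grid over the MPI ranks

E = CUDA.zeros(Float64, nx, ny, nz)        # local field, including boundary (halo) cells

for it in 1:100
    # ... launch the local update kernel on E here ...
    update_halo!(E)                        # exchange boundary layers with neighboring ranks
end

finalize_global_grid()
```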

lucasgrjn commented 1 week ago

First of all, @luraess and @vchuravy, thanks a lot for your advice and for all these package pointers!

By elimination, I understand that KA alone does not cover this, so I need something else to handle the communications (via MPI.jl) if I want to use multiple GPUs.

Is my reasoning correct? (I am trying to understand, from your point of view, why one approach cannot be used and the other is preferable.)

Thanks in advance for your response!

luraess commented 1 week ago

Thanks for your feedback @lucasgrjn. I'd happily give my POV on your questions. However, this is no longer related to KA directly, nor to this particular issue. May I suggest we move the discussion to Discourse - Julia at scale while x-referencing it? Asking on Discord would moreover allow other people with potentially similar interests or questions to join as well.

lucasgrjn commented 1 week ago

Thanks for your feedback @lucasgrjn . I'd happily give my POV on your questions. However, this is no longer related to KA directly, nor to this particular issue. May I suggest we move the discussion to Discourse - Julia at scale

Hey @luraess, I think it is a really good idea, I will do it today! (Is there also a Discord, or is that a typo?)

luraess commented 1 week ago

I think it is a really good idea, I will do it today! (Is there also a Discord, or is that a typo?)

I was referring to Julialang Discourse (not Discord) https://discourse.julialang.org/c/domain/parallel/34

lucasgrjn commented 6 days ago

For anyone interested, the topic on Discourse can be found at the following URL: https://discourse.julialang.org/t/help-with-multi-gpu-fdtd-implementation-halo-exchange-and-efficient-monitors/121676.

Thanks for your help @luraess and @vchuravy!