JuliaORNL / JACC.jl

CPU/GPU parallel performance portable layer in Julia via functions as arguments
MIT License

Create small example of lightweight wrapper for JACC #67

kmp5VT closed this issue 4 months ago

kmp5VT commented 6 months ago

Right now it seems slightly cumbersome and restrictive to have a global flag that enables the different backends. Because all CPU and GPU code is compiled with LLVM, it should be possible to add a lightweight wrapper layer around the different data types so that multiple dispatch can determine which backend to use. Here is a small example of how it works. In principle this design is also safer because it does not overwrite the Array variable.
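A minimal sketch of what I have in mind (all names such as JACCArray, ThreadsBackend, and CUDABackendTag are placeholders, and the GPU path is only indicated in a comment):

```julia
# Rough sketch of the wrapper idea: a thin type tagged with a backend lets
# multiple dispatch pick the implementation from the data itself, without a
# global backend flag. All names here are placeholders.
abstract type Backend end
struct ThreadsBackend <: Backend end
struct CUDABackendTag <: Backend end   # placeholder; real code would key off CUDA.jl arrays

# The wrapper does not overwrite Base.Array; it only carries a backend tag.
struct JACCArray{T,N,B<:Backend} <: AbstractArray{T,N}
    data::AbstractArray{T,N}
end
JACCArray(a::Array{T,N}) where {T,N} = JACCArray{T,N,ThreadsBackend}(a)
Base.size(a::JACCArray) = size(a.data)
Base.getindex(a::JACCArray, i...) = getindex(a.data, i...)
Base.setindex!(a::JACCArray, v, i...) = setindex!(a.data, v, i...)

# Dispatch on the backend tag selects the implementation.
function parallel_for(n, f, x::JACCArray{T,N,ThreadsBackend}) where {T,N}
    Threads.@threads for i in 1:n
        f(i, x)
    end
end
# A GPU method dispatching on JACCArray{T,N,CUDABackendTag} would live in a
# GPU package or extension and launch a kernel instead.

a = JACCArray(zeros(Float64, 16))
parallel_for(16, (i, v) -> v[i] = i * i, a)
```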

williamfgc commented 6 months ago

Thanks @kmp5VT. I think this needs a breakdown and more discussion. The added value and implications need to be well understood.

Right now it seems slightly cumbersome and restrictive to have a global flag that enables the different backends. Because all CPU and GPU code is compiled with LLVM, it should be possible to add a lightweight wrapper layer around the different data types so that multiple dispatch can determine which backend to use.

Wouldn't it be best to know the backend ahead of time, rather than at every method, to save on boilerplate? We are following the canonical LocalPreferences.toml approach used by MPI.jl (hence a global rather than an environment variable), together with the Julia v1.9 weak dependencies feature, which also avoids pulling in heavy dependencies when they are not needed.
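For reference, the pattern is roughly the following. This is a minimal sketch, not JACC's actual source: in the real package the constant would come from LocalPreferences.toml via Preferences.jl's @load_preference, and GPU back ends sit behind weak dependencies.

```julia
# Minimal sketch of "decide the backend ahead of time": the choice is fixed
# when the module is loaded, so call sites carry no per-method dispatch
# boilerplate.
module MiniPortability  # hypothetical module name

# In a real package this constant would be read from LocalPreferences.toml via
# Preferences.jl's @load_preference at precompile time, not hard-coded.
const BACKEND = "threads"

if BACKEND == "threads"
    function parallel_for(n, f, x...)
        Threads.@threads for i in 1:n
            f(i, x...)
        end
    end
else
    # A GPU implementation would be defined here, behind a Julia >= 1.9 weak
    # dependency, so heavy GPU packages are never loaded where unsupported.
    error("backend $(BACKEND) is not available in this sketch")
end

end # module

# Every call site looks the same regardless of the configured backend.
a = zeros(10)
MiniPortability.parallel_for(10, (i, v) -> v[i] = 2i, a)
```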

kmp5VT commented 6 months ago

@williamfgc Thank you for explaining the rationale! Hopefully my initial comment did not come off too strongly; I think there is much merit in this library!

In my initial commit I believe I misunderstood the implementation of the library, and I apologize for that mistake! I reexamined the code and believe I now understand the larger issue. Correct me if I am wrong, but it looks like your x variable is a variadic list of any possible arguments that go into the function f, so as the developers you don't technically know which argument in x is the array being parallelized over.
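To spell out what I mean, here is roughly the pattern as I read it (the signature below is my paraphrase, not copied from JACC):

```julia
# Paraphrased pattern: x... is whatever the user's kernel f needs, so nothing
# tells the library which of those arguments (if any) is the array being
# parallelized over.
function parallel_for(n, f, x...)
    Threads.@threads for i in 1:n
        f(i, x...)
    end
end

# An axpy-style kernel: from the library's point of view, (2.0, x, y) is just
# an opaque tuple of arguments.
axpy(i, alpha, x, y) = (y[i] += alpha * x[i])
x = ones(100); y = zeros(100)
parallel_for(100, axpy, 2.0, x, y)   # which of (2.0, x, y) is "the" array?
```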

I think in this next push I have been able to design a slightly more robust system that still uses multiple dispatch to keep the code simple and to let users run GPU and CPU parallelization simultaneously.

I am still not 100% sure that this would fit your needs, and I have not checked the performance of the code. From what I understand, using multiple dispatch instead of the LocalPreferences.toml route should make the library more generic while preserving performance, since dispatch is determined at compile time rather than at runtime. I found this talk useful for reference: https://discourse.julialang.org/t/understanding-multiple-dispatch/76601/5
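As a toy illustration of the compile-time point (my own example, and it only holds when argument types are concrete or inferable):

```julia
using InteractiveUtils  # for @code_typed

# Toy check of the compile-time dispatch claim: with a concrete argument type
# the compiler selects the method statically, with no runtime backend branch.
backend_name(::Array) = "cpu"
# backend_name(::CuArray) = "cuda"   # a GPU package/extension would add this

v = zeros(4)
@code_typed backend_name(v)   # typed IR is simply `return "cpu"`
```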

I look forward to your feedback, thanks!

williamfgc commented 6 months ago

@kmp5VT my two cents is to start with what problem the proposed solution is solving. For now, optional "weak" dependencies are the canonical way since Julia v1.9 to separate back ends and code that are mutually exclusive (which is the most common target use case). Multiple dispatch does mean we would need to install a back end on a system that might not support it, which causes major trouble on HPC systems.