JuliaORNL / JACC.jl

CPU/GPU parallel performance portable layer in Julia via functions as arguments
MIT License
21 stars 13 forks source link

Should use `__precompile__(false)` (at least for extensions) #53

Open PhilipFackler opened 7 months ago

PhilipFackler commented 7 months ago

When using the cuda backend I get the following message:

WARNING: Method definition parallel_for(I, F, Any...) where {I<:Integer, F<:Function} in module JACC at /home/4pf/.julia/packages/JACC/Crxi0/src/JACC.jl:11 overwritten in module JACCCUDA at /home/4pf/.julia/packages/JACC/Crxi0/ext/JACCCUDA/JACCCUDA.jl:5.
ERROR: Method overwriting is not permitted during Module precompilation. Use `__precompile__(false)` to opt-out of precompilation.

It's an error when I don't add __precompile__(false) to my own module definition.

michel2323 commented 7 months ago

This will be fixed with https://github.com/JuliaORNL/JACC.jl/pull/52 .

williamfgc commented 6 months ago

@michel2323 @PhilipFackler is this resolved?

PhilipFackler commented 6 months ago

The solution merged with #52 is too restrictive. I'm using a previous revision to avoid it. So, I would say no.

michel2323 commented 6 months ago

@PhilipFackler In what sense too restrictive?

PhilipFackler commented 6 months ago

@michel2323 see #56

williamfgc commented 6 months ago

@PhilipFackler can you elaborate why this is an error or how it's trigerred? I'm trying to understand why it doesn't show on our GPU CI tests e.g. here

michel2323 commented 6 months ago

It doesn't get triggered because I resolved it by specifying types in https://github.com/JuliaORNL/JACC.jl/pull/52 . However, that leads to @PhilipFackler only being able to pass numbers or arrays, not sructs https://github.com/JuliaORNL/JACC.jl/issues/56 . The two issues are intertwined. When using structs, we cannot dispatch for the backend via the struct type, as it's not clear whether it is a CUDA or whatever based type.

Edit: https://github.com/JuliaORNL/JACC.jl/actions/runs/8238659570/job/22530105977#step:4:298 Old error.

williamfgc commented 6 months ago

I did more digging and overriding functions due to variadic arguments of any type x... is an error since Julia 1.10 (it was a warning in 1.9) because of the current nature of JACC.jl it makes sense to use __precompile__(false) and let users select back ends for the appropriate module compilation in ext.

williamfgc commented 6 months ago

I'm turning pre-compilation off (for now). Changing APIs will need a much larger discussion as we discussed offline.

michel2323 commented 6 months ago

Every package that uses JACC.jl will also not be able to precompile. So, if you implement a PDE solver using JACC.jl, that solver won't be precompiled, substantially slowing down a user's code startup. So even if JACC is light, it will break precompilation for every large package that uses JACC.

williamfgc commented 6 months ago

@michel2323 point taken. We should open a new issue trying to understand the trade-offs of code start-up. Since we are targeting HPC use cases, we should understand how this cost is amortized and the JIT implications. Precompilation and optional packages as weak_dependencies are somehow opposite concepts :). We just have a few simple use cases we want to try with the simple API and make incremental progress.

williamfgc commented 5 months ago

FYI, I tested the current tag version v0.3 without precompile __(false)__ with the GrayScott.jl on this branch, that has JACC as a strong dependency, precompilation just hangs on various systems as the pid process is still running. I don't know if it's related to Julia issue #51901. Works with the current main with __precompile__(false).

For added context, the GrayScott.jl pre-compilation takes roughly 18s on my local machine and similar numbers on our AMD GPU CI system. For an hours-long runs this is an amortized cost, but we'll keep en eye if this becomes a real problem. Hope this helps.

michel2323 commented 5 months ago

Especially if you add MPI to the mix, which I guess is important for ORNL, having no precompilation support will be daunting. On those machines, you get filesystem access races. During that process message you mention.

williamfgc commented 5 months ago

Yes, those are really good points., The JIT component does not go away completely with pre-compilation, we published some numbers on the paper running on Frontier. Also, accessing local files is not completely by-passed (e.g. LocalPreferences and local packages). I think something like PackageCompiler.jl might help for this scenario to compile an app?. This is something @PhilipFackler tried. Still, I'd try to test the overheads when we have more apps use cases :)

michel2323 commented 5 months ago

PackageCompiler won't work if precompilation doesn't work or is manually disabled.

williamfgc commented 5 months ago

Yes, I meant exploring the precompile + PackageCompiler.jl path. For the time being keeping __precompile__ (false) is what sanely works (not free lunch) :)