JuliaParallel / MPI.jl

MPI wrappers for Julia
https://juliaparallel.org/MPI.jl/
The Unlicense
382 stars 122 forks source link

Rename extensions to include MPI in name #802

Open luraess opened 1 year ago

luraess commented 1 year ago

Developing a package that also requires AMDGPU and CUDA extensions and MPI, we hit an issue about conflicting extension names that can't be resolved.

A solution is to name the extensions in a way to include the package name they are belonging to. For MPI, it means renaming extensions to MPIAMDGPUext and MPICUDAext.

If this sounds good I could draft a PR.

simonbyrne commented 1 year ago

Can you give more details on the issue (stack trace, reproducer)?

utkinis commented 1 year ago

Reproducer

Assume we have a package ExtensionsConflict.jl, that depends on MPI.jl, and also has an extension for CUDA.jl. The structure of the package is as follows:

iutkin$ tree
.
├── Manifest.toml
├── Project.toml
├── ext
│   └── CUDAExt
│       └── CUDAExt.jl
└── src
    └── ExtensionsConflict.jl

3 directories, 4 files

iutkin$ cat Project.toml
name = "ExtensionsConflict"
uuid = "97c6e3c7-976d-4ed6-9a3c-28cf76c81daa"
authors = ["Ivan Utkin <iutkin@ethz.ch>"]
version = "0.1.0"

[deps]
MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"

[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"

[extensions]
CUDAExt = "CUDA"

iutkin$ cat src/ExtensionsConflict.jl
module ExtensionsConflict

greet() = print("Hello World!")

end # module ExtensionsConflict

iutkin$ cat ext/CUDAExt/CUDAExt.jl
module CUDAExt

using ExtensionsConflict, CUDA

ExtensionsConflict.greet(dev::CuDevice) = "device $(deviceid(dev))"

end

Then, when trying to use the package, CUDA, and MPI and the same time leads to the name conflict:

(ExtensionsConflict) pkg> up
    Updating registry at `~/.julia/registries/General.toml`
  No Changes to `/scratch-1/iutkin/ExtensionsConflict.jl/Project.toml`
  No Changes to `/scratch-1/iutkin/ExtensionsConflict.jl/Manifest.toml`

julia> using ExtensionsConflict

julia> using CUDA

julia> using MPI
[ Info: Precompiling CUDAExt [11b7e2e0-d079-575b-885e-0ab22ef3252c]
ERROR: LoadError: ArgumentError: Package CUDAExt does not have ExtensionsConflict in its dependencies:
- You may have a partially installed environment. Try `Pkg.instantiate()`
  to ensure all packages in the environment are installed.
- Or, if you have CUDAExt checked out for development and have
  added ExtensionsConflict as a dependency but haven't updated your primary
  environment's manifest file, try `Pkg.resolve()`.
- Otherwise you may need to report an issue with CUDAExt
Stacktrace:
 [1] macro expansion
   @ ./loading.jl:1634 [inlined]
 [2] macro expansion
   @ ./lock.jl:267 [inlined]
 [3] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:1611
 [4] include
   @ ./Base.jl:457 [inlined]
 [5] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
   @ Base ./loading.jl:2049
 [6] top-level scope
   @ stdin:3
in expression starting at /scratch-1/iutkin/ExtensionsConflict.jl/ext/CUDAExt/CUDAExt.jl:1
in expression starting at stdin:3
┌ Error: Error during loading of extension CUDAExt of MPI, use `Base.retry_load_extensions()` to retry.
│   exception =
│    1-element ExceptionStack:
│    Failed to precompile CUDAExt [11b7e2e0-d079-575b-885e-0ab22ef3252c] to "/home/iutkin/.julia/compiled/v1.9/CUDAExt/jl_WoAYgY".
│    Stacktrace:
│      [1] error(s::String)
│        @ Base ./error.jl:35
│      [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
│        @ Base ./loading.jl:2294
│      [3] compilecache
│        @ ./loading.jl:2167 [inlined]
│      [4] _require(pkg::Base.PkgId, env::Nothing)
│        @ Base ./loading.jl:1805
│      [5] _require_prelocked(uuidkey::Base.PkgId, env::Nothing)
│        @ Base ./loading.jl:1660
│      [6] _require_prelocked(uuidkey::Base.PkgId)
│        @ Base ./loading.jl:1658
│      [7] run_extension_callbacks(extid::Base.ExtensionId)
│        @ Base ./loading.jl:1255
│      [8] run_extension_callbacks(pkgid::Base.PkgId)
│        @ Base ./loading.jl:1290
│      [9] run_package_callbacks(modkey::Base.PkgId)
│        @ Base ./loading.jl:1124
│     [10] _require_prelocked(uuidkey::Base.PkgId, env::String)
│        @ Base ./loading.jl:1667
│     [11] macro expansion
│        @ ./loading.jl:1648 [inlined]
│     [12] macro expansion
│        @ ./lock.jl:267 [inlined]
│     [13] require(into::Module, mod::Symbol)
│        @ Base ./loading.jl:1611
│     [14] top-level scope
│        @ ~/.julia/packages/CUDA/YIj5X/src/initialization.jl:208
│     [15] eval
│        @ ./boot.jl:370 [inlined]
│     [16] eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
│        @ REPL /scratch-1/soft/julia/julia-1.9.4/share/julia/stdlib/v1.9/REPL/src/REPL.jl:153
│     [17] repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
│        @ REPL /scratch-1/soft/julia/julia-1.9.4/share/julia/stdlib/v1.9/REPL/src/REPL.jl:249
│     [18] start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
│        @ REPL /scratch-1/soft/julia/julia-1.9.4/share/julia/stdlib/v1.9/REPL/src/REPL.jl:234
│     [19] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::Any)
│        @ REPL /scratch-1/soft/julia/julia-1.9.4/share/julia/stdlib/v1.9/REPL/src/REPL.jl:379
│     [20] run_repl(repl::REPL.AbstractREPL, consumer::Any)
│        @ REPL /scratch-1/soft/julia/julia-1.9.4/share/julia/stdlib/v1.9/REPL/src/REPL.jl:365
│     [21] (::Base.var"#1018#1020"{Bool, Bool, Bool})(REPL::Module)
│        @ Base ./client.jl:421
│     [22] #invokelatest#2
│        @ ./essentials.jl:819 [inlined]
│     [23] invokelatest
│        @ ./essentials.jl:816 [inlined]
│     [24] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
│        @ Base ./client.jl:405
│     [25] exec_options(opts::Base.JLOptions)
│        @ Base ./client.jl:322
│     [26] _start()
│        @ Base ./client.jl:522
└ @ Base loading.jl:1261

julia>

Motivation

@luraess suggests prefixing the extension name with the package name, to avoid such conflicts, i.e., in MPI.jl extensions will be named MPICUDAExt and AMDGPUCUDAExt. This is also consistent with the Pkg.jl docs: https://pkgdocs.julialang.org/v1/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions).

I'm not sure that this is the expected behaviour though, at least it's not documented anywhere, and it doesn't make sense for extensions of different packages to know each other, so probably this is rather the bug in Pkg.

luraess commented 1 year ago

Thanks @utkinis for providing the MWE and for adding some context. Although the doc is not explicit about this, note that in the example, the naming convention for the extension appends the package name to the package name the extension supports PlottingContourExt. Following this logic, we should have MPIAMDGPUExt. But as said in previous comment, IDK if this is meant to be so or if there is an issue with Pkg.

simonbyrne commented 11 months ago

@utkinis how are you loading CUDA, since it isn't a dependency of ExtensionsConflict? do you have it via another environment?

simonbyrne commented 11 months ago

BTW, I'm not opposed to the rename: it shouldn't break anything, I just wanted to figure out what exactly is going on.

luraess commented 11 months ago

@utkinis how are you loading CUDA, since it isn't a dependency of ExtensionsConflict? do you have it via another environment?

We tend to have the GPU backend dependency in the base env (on GPU servers, supercomputers) as this would then make it available to all other projects. But we also got the above mentioned issue with e.g. CUDA as deps of ExtensionsConflict.

luraess commented 11 months ago

BTW, I'm not opposed to the rename: it shouldn't break anything, I just wanted to figure out what exactly is going on.

We guess the issue could be addressed in Pkg if extensions would be resolved based on UUIDs instead of names (have no insights there - maybe there is a good reason for things to be as they are).

simonbyrne commented 11 months ago

But we also got the above mentioned issue with e.g. CUDA as deps of ExtensionsConflict.

This is a bit of a weird case: apparently if a package is both in deps and weakdeps, it is treated as a weakdeps: https://github.com/JuliaLang/Pkg.jl/issues/3727#issuecomment-1850847122

luraess commented 11 months ago

But we also got the above mentioned issue with e.g. CUDA as deps of ExtensionsConflict.

This is a bit of a weird case: apparently if a package is both in deps and weakdeps, it is treated as a weakdeps: JuliaLang/Pkg.jl#3727 (comment)

EDIT: But we also got the above mentioned issue within a project using both CUDA and ExtensionsConflict.

simonbyrne commented 11 months ago

But we also got the above mentioned issue within a project using both CUDA and ExtensionsConflict.

Do you have an example of that?

luraess commented 11 months ago

But we also got the above mentioned issue within a project using both CUDA and ExtensionsConflict.

Do you have an example of that?

@utkinis could you test this ☝️ in your MWE?

KristofferC commented 11 months ago

We guess the issue could be addressed in Pkg if extensions would be resolved based on UUIDs instead of names

That should already be the case. I'll try see what is going on.

KristofferC commented 11 months ago

I tried this:

pkg> activate --temp
  Activating new project at `/tmp/jl_SWjSGu`

(jl_SWjSGu) pkg> dev .
   Resolving package versions...
    Updating `/tmp/jl_SWjSGu/Project.toml`
  [97c6e3c7] + ExtensionsConflict v0.1.0 `../../home/kc/JuliaTests/ExtensionsConflict`
    Updating `/tmp/jl_SWjSGu/Manifest.toml`

(jl_SWjSGu) pkg> add CUDA, MPI

julia> using ExtensionsConflict

julia> using CUDA

julia> using MPI

julia> Base.get_extension(ExtensionsConflict, :CUDAExt)
CUDAExt

I don't see where the possible name collision could be.

luraess commented 11 months ago

Indeed, in this setting ☝️ no error occurs. Seems the issue occurs when within the ExtensionsConflict package, and using CUDA from the global environment (e.g. @v1.9).

simonbyrne commented 11 months ago

Indeed, in this setting ☝️ no error occurs. Seems the issue occurs when within the ExtensionsConflict package, and using CUDA from the global environment (e.g. @v1.9).

Yes, I was able to reproduce it in this case.