JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

ERROR: Your LLVM does not support the NVPTX back-end. in local project environment #249

Closed Roger-luo closed 4 years ago

Roger-luo commented 4 years ago

I'm getting the following error on julia-1.5-beta1.

In the default shared environment, everything is fine. However, if I start Julia with julia --project, I somehow get the following error:

julia> using CUDA

julia> CUDA.functional()
ERROR: Your LLVM does not support the NVPTX back-end.

This is very strange; both the official binaries
and an unmodified build should contain this back-end.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] llvm_compat(::VersionNumber) at /home/roger/.julia/packages/CUDA/42B9G/deps/compatibility.jl:181
 [3] llvm_compat at /home/roger/.julia/packages/CUDA/42B9G/deps/compatibility.jl:176 [inlined]
 [4] __init_compatibility__() at /home/roger/.julia/packages/CUDA/42B9G/deps/compatibility.jl:236
 [5] __runtime_init__() at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:121
 [6] (::CUDA.var"#581#582"{Bool})() at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:32
 [7] lock(::CUDA.var"#581#582"{Bool}, ::ReentrantLock) at ./lock.jl:161
 [8] _functional(::Bool) at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:26
 [9] functional(::Bool) at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:19
 [10] functional() at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:18
 [11] top-level scope at REPL[2]:1

To reproduce

The Minimal Working Example (MWE) for this bug:

pkg> generate test_project
pkg> activate test_project
julia> using CUDA

julia> CUDA.functional()
ERROR: Your LLVM does not support the NVPTX back-end.

This is very strange; both the official binaries
and an unmodified build should contain this back-end.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] llvm_compat(::VersionNumber) at /home/roger/.julia/packages/CUDA/42B9G/deps/compatibility.jl:181
 [3] llvm_compat at /home/roger/.julia/packages/CUDA/42B9G/deps/compatibility.jl:176 [inlined]
 [4] __init_compatibility__() at /home/roger/.julia/packages/CUDA/42B9G/deps/compatibility.jl:236
 [5] __runtime_init__() at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:121
 [6] (::CUDA.var"#581#582"{Bool})() at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:32
 [7] lock(::CUDA.var"#581#582"{Bool}, ::ReentrantLock) at ./lock.jl:161
 [8] _functional(::Bool) at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:26
 [9] functional(::Bool) at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:19
 [10] functional() at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:18
 [11] top-level scope at REPL[3]:1

Expected behavior

It should work just as it does in the global shared environment.

Version info

Details on Julia:

# please post the output of:
julia> versioninfo()
Julia Version 1.5.0-beta1.0
Commit 6443f6c95a (2020-05-28 17:42 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 9 3900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver1)

Details on CUDA:

# please post the output of:
CUDA.versioninfo()

CUDA.versioninfo() doesn't exist in v1.0.2, ~but I'm using CUDA v1.0.2 with CUDA toolkit 11~

but I managed to print this on the master branch in the global environment (it errors in the local project environment).

The problem is the same on the latest master branch of CUDA.jl.

julia> CUDA.versioninfo()
CUDA toolkit 10.2.89, artifact installation
CUDA driver 11.0.0

Libraries:
- CUBLAS: 10.2.2
- CURAND: 10.1.2
- CUFFT: 10.1.2
- CUSOLVER: 10.3.0
- CUSPARSE: 10.3.1
- CUTENSOR: 1.0.1
- CUDNN: 7.6.5
- CUPTI: 12.0.0

Packages:
- CUDA.jl: 1.0.2
- LLVM.jl: 1.5.2
- GPUCompiler.jl: 0.4.0
- GPUArrays.jl: 4.0.0
- Adapt.jl: 2.0.0

Toolchain:
- Julia: 1.5.0-beta1.0
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

1 device(s):
- GeForce RTX 2080 SUPER (7.078 GiB, sm_75)

vchuravy commented 4 years ago

How did you build/obtain Julia?

Roger-luo commented 4 years ago

I installed julia via the official linux binary release by using https://github.com/johnnychen94/jill.py

maleadt commented 4 years ago

Could you try these in both sessions:

julia> using LLVM

julia> haskey(targets(), "nvptx")
false

julia> InitializeAllTargets()

julia> haskey(targets(), "nvptx")
true

julia> LLVM.libllvm_targets
18-element Array{Symbol,1}:
 :AArch64
 :AMDGPU
 :ARC
 :ARM
 :AVR
 :BPF
 :Hexagon
 :Lanai
 :MSP430
 :Mips
 :NVPTX
 :PowerPC
 :RISCV
 :Sparc
 :SystemZ
 :WebAssembly
 :X86
 :XCore

julia> LLVM.libllvm
:libLLVM

julia> using Libdl

julia> Libdl.dlpath(LLVM.libllvm)
"/home/tim/.cache/julia/binaries/1.5.0-beta1/x64/bin/../lib/julia/libLLVM-9jl.so"

Roger-luo commented 4 years ago

This is what I got:

In global environment

julia> using LLVM

julia> haskey(targets(), "nvptx")
false

julia> InitializeAllTargets()

julia> haskey(targets(), "nvptx")
false

julia> LLVM.libllvm_targets
18-element Array{Symbol,1}:
 :AArch64
 :AMDGPU
 :ARC
 :ARM
 :AVR
 :BPF
 :Hexagon
 :Lanai
 :MSP430
 :Mips
 :NVPTX
 :PowerPC
 :RISCV
 :Sparc
 :SystemZ
 :WebAssembly
 :X86
 :XCore

julia> LLVM.libllvm
:libLLVM

julia> using Libdl

julia> Libdl.dlpath(LLVM.libllvm)
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libLLVM.so"

and in a local project environment

julia> using LLVM

julia> haskey(targets(), "nvptx")
false

julia> InitializeAllTargets()

julia> haskey(targets(), "nvptx")
false

julia> LLVM.libllvm_targets
18-element Array{Symbol,1}:
 :AArch64
 :AMDGPU
 :ARC
 :ARM
 :AVR
 :BPF
 :Hexagon
 :Lanai
 :MSP430
 :Mips
 :NVPTX
 :PowerPC
 :RISCV
 :Sparc
 :SystemZ
 :WebAssembly
 :X86
 :XCore

julia> LLVM.libllvm
:libLLVM

julia> using Libdl

julia> Libdl.dlpath(LLVM.libllvm)
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libLLVM.so"

They look the same...

Roger-luo commented 4 years ago

Hmm, somehow both of my environments have now stopped working and emit this error...

maleadt commented 4 years ago

Could you post the output of Libdl.dllist(), and try the following:

julia> using LLVM

julia> InitializeAllTargets()

julia> name.(collect(targets()))
8-element Array{String,1}:
 "wasm64"
 "wasm32"
 "amdgcn"
 "r600"
 "nvptx64"
 "nvptx"
 "x86-64"
 "x86"

maleadt commented 4 years ago

Also, "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libLLVM.so" points to libLLVM-9jl.so in the same directory, right?

Roger-luo commented 4 years ago

julia> using LLVM

julia> InitializeAllTargets()

julia> name.(collect(targets()))
String[]

julia> Libdl.dllist()
28-element Array{String,1}:
 "linux-vdso.so.1"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/libjulia.so.1"
 "/lib/x86_64-linux-gnu/libdl.so.2"
 "/lib/x86_64-linux-gnu/librt.so.1"
 "/lib/x86_64-linux-gnu/libpthread.so.0"
 "/lib/x86_64-linux-gnu/libc.so.6"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libLLVM-9jl.so"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libstdc++.so.6"
 "/lib/x86_64-linux-gnu/libm.so.6"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libgcc_s.so.1"
 "/lib64/ld-linux-x86-64.so.2"
 "/home/roger/packages/julias/julia-1.5/lib/julia/sys.so"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libpcre2-8.so"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libgmp.so"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libmpfr.so"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libopenblas64_.so"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libgfortran.so.4"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libquadmath.so.0"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libcholmod.so"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libamd.so.2"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libcolamd.so.2"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libsuitesparseconfig.so.5"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libccolamd.so.2"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libcamd.so.2"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libopenblas64_.so.0"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libsuitesparse_wrapper.so"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libsuitesparseconfig.so"
 "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libLLVM.so"

Yes, libLLVM.so seems to be the same as libLLVM-9jl.so in the same directory.

Roger-luo commented 4 years ago

I just downloaded a fresh v1.4.2 binary and see the same problem there (and the same output as above). This is strange: it was working fine yesterday, and I don't think I changed anything. Now CuArrays gives this error as well.

maleadt commented 4 years ago

It's probably caused by https://github.com/maleadt/LLVM.jl/pull/188, where we switched from ccalling with an absolute path to the library to ccall((fun, :libLLVM)), but I fail to see how that would break everything, since dlpath(libLLVM) points to the absolute path we used previously.

That said, it's a bit strange that libLLVM is listed twice in Libdl.dllist, once with and once without the version suffix. Maybe that's a red herring.

maleadt commented 4 years ago

Oh, InitializeAllTargets is part of the LLVM extras API, i.e. the part we access through libjulia! So the call to InitializeAllTargets initialized the targets in the LLVM library that was loaded by Julia, and not in the unversioned one, which was loaded separately here (libLLVM vs libLLVM-9jl). That also means https://github.com/maleadt/LLVM.jl/pull/188 is fundamentally invalid and will never be compatible with a non-Julia LLVM library. Still, I would have expected these library handles to alias here, since one is a symlink to the other.

maleadt commented 4 years ago

To summarize: dlopen(:libLLVM) on @Roger-luo's system resolves to the libLLVM.so symlink, and not to its libLLVM-9jl.so target. This causes ccall((fun, :libLLVM)) to end up in a different library than the calls to libjulia, which internally call the libLLVM that Julia was linked against (this should really be the same library, but apparently the linker can get confused here).

All this isn't the case on my system:

julia> using Libdl

julia> Libdl.dlpath(:libLLVM)
"/home/tim/.cache/julia/binaries/1.5.0-beta1/x64/bin/../lib/julia/libLLVM-9jl.so"

vs


julia> using Libdl

julia> Libdl.dlpath(LLVM.libllvm)
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libLLVM.so"

@vchuravy @staticfloat Any thoughts?

songxianxu commented 4 years ago

> I just downloaded a fresh v1.4.2 binary and see the same problem there (and the same output as above). This is strange: it was working fine yesterday, and I don't think I changed anything. Now CuArrays gives this error as well.

I am not sure if this is related. I was using both CuArrays.jl and CUDA.jl (not simultaneously) and I have the same issue today with Julia 1.4.2. It was fine on Monday. I realized that if I downgrade LLVM.jl to v1.5.2 via add LLVM@1.5.2, the problem goes away. Perhaps the new PR in LLVM.jl v1.6.0 has something to do with this? (I noticed that it was released 3 days ago.) That would explain why it was fine for me a few days ago.

Update: I just realized that this PR is already mentioned above (maleadt/LLVM.jl#188). I can reproduce the Libdl.dlpath(:libLLVM) result above by switching the version of LLVM.jl.

staticfloat commented 4 years ago

I'm a little unclear on what the issue is; are you saying that we're getting two identical copies of the same library loaded, depending on what path you access it from?

maleadt commented 4 years ago

Yes, and the Libdl.dllist() above seems to confirm that, listing both libLLVM.so and libLLVM-9jl.so, even though the former should be a symlink to the latter.

maleadt commented 4 years ago

@Roger-luo @songxianxu Which Linux distributions are you using?

Roger-luo commented 4 years ago

I'm using Ubuntu 20.04.

julia> versioninfo(verbose=true)
Julia Version 1.5.0-beta1.0
Commit 6443f6c95a (2020-05-28 17:42 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04 LTS
  uname: Linux 5.4.0-37-generic #41-Ubuntu SMP Wed Jun 3 18:57:02 UTC 2020 x86_64 x86_64
  CPU: AMD Ryzen 9 3900X 12-Core Processor: 
                 speed         user         nice          sys         idle          irq
       #1-24  2197 MHz      63922 s       4184 s      22469 s   40076548 s          0 s

  Memory: 31.34796905517578 GB (24781.34375 MB free)
  Uptime: 16742.0 sec
  Load Avg:  0.396484375  0.14453125  0.046875
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver1)
Environment:
  DEFAULTS_PATH = /usr/share/gconf/ubuntu.default.path
  HOME = /home/roger
  WINDOWPATH = 2
  MANDATORY_PATH = /usr/share/gconf/ubuntu.mandatory.path
  PATH = /usr/local/cuda-11.0/bin/:/home/roger/.local/bin:/home/roger/miniconda3/bin:/home/roger/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
  TERM = xterm-256color

Roger-luo commented 4 years ago

@maleadt you are right, I think this is indeed caused by that PR. It explains why it stopped working in my local environment first and then in the global shared environment afterwards: I had simply updated my dependencies, lol.

After downgrading LLVM.jl to the previous version (v1.5.2), CUDA.jl works now.

songxianxu commented 4 years ago

I was using both Manjaro and Ubuntu.

@Roger-luo Are you running all your tests on Julia installations set up by jill?

I just realized that with a fresh copy of 1.4.2 downloaded from the official site, I can no longer reproduce the problem in the same environments. It might be a problem with how jill unpacks the release.

Update: It seems that if you installed via jill, libLLVM.so is no longer a symbolic link.

[phyxxs@espresso julia]$ ls -li libLLVM.so 
9700755 lrwxrwxrwx 1 phyxxs phyxxs 14 May 24 03:13 libLLVM.so -> libLLVM-8jl.so
# Below is installed by jill
[phyxxs@espresso julia]$ ls -li ~/packages/julias/julia-1.4/lib/julia/libLLVM.so 
525875 -rwxr-xr-x 1 phyxxs phyxxs 56941776 May 24 03:13 /home/phyxxs/packages/julias/julia-1.4/lib/julia/libLLVM.so
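The manual `ls -li` check above can be generalized. The following is a minimal sketch, assuming a POSIX-like shell with `cmp` available; `find_copied_libs` is a hypothetical helper name, not part of jill or Julia. It reports `.so` files in a directory that are regular files yet byte-identical to another regular `.so` file there, which is the symptom shown above.

```shell
# find_copied_libs DIR (hypothetical helper): report *.so files in DIR that are
# regular files but byte-identical to another regular *.so file in DIR.
# Such a pair suggests a symlink that was turned into a copy during unpacking.
find_copied_libs() {
    dir=$1
    for f in "$dir"/*.so; do
        # Symlinks are the expected layout, so skip them.
        [ -L "$f" ] && continue
        # Skip anything that is not a regular file (including an unmatched glob).
        [ -f "$f" ] || continue
        for g in "$dir"/*.so; do
            [ "$f" = "$g" ] && continue
            [ -L "$g" ] && continue
            [ -f "$g" ] || continue
            # cmp -s is silent and succeeds only if the contents are identical.
            if cmp -s "$f" "$g"; then
                echo "$(basename "$f") duplicates $(basename "$g")"
            fi
        done
    done
}
```

For the jill layout shown above this could be run as `find_copied_libs ~/packages/julias/julia-1.4/lib/julia`. Each duplicated pair is reported in both directions, since the helper cannot tell which of the two files was meant to be the link.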
staticfloat commented 4 years ago

I can't reproduce this on my machine:

julia> using Libdl, LLVM

(@v1.6) pkg> st
Status `~/.julia/environments/v1.6/Project.toml`
  [929cbde3] LLVM v1.6.0

julia> Libdl.dlpath(:libLLVM)
"/home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM-9jl.so"

julia> Libdl.dlpath(LLVM.libllvm)
"/home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM-9jl.so"

julia> run(`/bin/bash -c "ls -la /home/sabae/local/dist/julia-master/bin/../lib/julia/*LLVM*"`)
-rwxr-xr-x 1 sabae sabae 62935480 Jun 25 19:42 /home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM-9jl.so
lrwxrwxrwx 1 sabae sabae       14 Jun 25 19:42 /home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM.so -> libLLVM-9jl.so

julia> dlpath(Libdl.dlopen("/home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM.so"))
"/home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM-9jl.so"

julia> filter(f -> occursin("LLVM", f), Libdl.dllist())
1-element Array{String,1}:
 "/home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM-9jl.so"

This is on an Ubuntu machine, and I get the same result both with the latest master and with the official 1.4.2 binaries (although 1.4.2 loads LLVM 8, not LLVM 9, of course).

staticfloat commented 4 years ago

> It seems that if you installed via jill, libLLVM.so is no longer a symbolic link.

That makes perfect sense. That would cause this problem.
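Why a copy behaves differently from a symlink can be sketched outside Julia. A plausible reading, consistent with the `Libdl.dllist()` output earlier in the thread, is that the loader treats two paths as the same library only when they resolve to the same underlying file; a symlink does, a byte-for-byte copy does not. The file names below are stand-ins created in a temporary directory, and `same_file` is a helper invented for this illustration. It relies on the `-ef` test, which bash, dash and ksh support but which is not strictly POSIX.

```shell
# Illustration only: these are stand-in files, not a real Julia installation.
workdir=$(mktemp -d)
printf 'fake library contents\n' > "$workdir/libLLVM-9jl.so"

# Official layout: the unversioned name is a symlink to the versioned file.
ln -s libLLVM-9jl.so "$workdir/libLLVM-link.so"
# Broken layout: the unversioned name is an independent copy.
cp "$workdir/libLLVM-9jl.so" "$workdir/libLLVM-copy.so"

# same_file A B (hypothetical helper): prints "same" if both paths resolve to
# one file (same device and inode), "different" otherwise.
same_file() {
    if [ "$1" -ef "$2" ]; then echo same; else echo different; fi
}

link_result=$(same_file "$workdir/libLLVM-link.so" "$workdir/libLLVM-9jl.so")
copy_result=$(same_file "$workdir/libLLVM-copy.so" "$workdir/libLLVM-9jl.so")
echo "symlink vs target: $link_result"
echo "copy vs target:    $copy_result"
rm -rf "$workdir"
```

With the copy in place, calls routed through the name libLLVM and calls routed through libjulia can end up in two separately loaded libraries, each with its own target registry, which matches the empty targets() list reported earlier.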

Roger-luo commented 4 years ago

Ah, I can confirm that libLLVM.so is no longer a symlink but a full copy of libLLVM-9jl.so. This is strange: all jill should do is download the official release and unpack it with tar. @johnnychen94 any thoughts?
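For reference, a plain `tar` round trip normally keeps symlinks intact, which is why the copy is surprising. The sketch below uses stand-in file names in temporary directories and default `tar` options only; it does not reproduce what jill actually did, which the thread does not spell out.

```shell
# Illustration only: stand-in files, packed and unpacked with default options.
src=$(mktemp -d)
dest=$(mktemp -d)
printf 'fake library contents\n' > "$src/libLLVM-9jl.so"
ln -s libLLVM-9jl.so "$src/libLLVM.so"

# Archive the versioned file and the symlink, then extract elsewhere.
tar -cf "$src/release.tar" -C "$src" libLLVM-9jl.so libLLVM.so
tar -xf "$src/release.tar" -C "$dest"

# Inspect what the unversioned name became after extraction.
if [ -L "$dest/libLLVM.so" ]; then
    unpacked_kind="symlink -> $(readlink "$dest/libLLVM.so")"
else
    unpacked_kind="regular file"
fi
echo "libLLVM.so after unpacking: $unpacked_kind"
rm -rf "$src" "$dest"
```

If an installer reports a regular file at this point, the link was lost somewhere other than in a default `tar` extraction, for example in a later copy step that follows links.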

johnnychen94 commented 4 years ago

> It seems that if you installed via jill, libLLVM.so is no longer a symbolic link.

Sorry for the trouble, I didn't know that these libs should be symlinks when I wrote jill.

The symlink issue should be fixed in https://github.com/johnnychen94/jill.py/commit/f4f7edc4c94838f6a525919fbaf14b2ce0d64df9 and jill v0.6.14 (will make a release once the CI passes)

Roger-luo commented 4 years ago

I'll close this issue here then. Thanks for the quick fix @johnnychen94

staticfloat commented 4 years ago

Quick resolution all around! Great work, everybody!