Closed Roger-luo closed 4 years ago
How did you build/obtain Julia?
On Wed, Jun 24, 2020, 15:50 Rogerluo notifications@github.com wrote:
I'm getting the following error on julia-1.5-beta1
In the default shared environment, this is fine. However, if I start Julia with julia --project, I somehow get the following error:
julia> using CUDA
julia> CUDA.functional()
ERROR: Your LLVM does not support the NVPTX back-end.
This is very strange; both the official binaries and an unmodified build should contain this back-end.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] llvm_compat(::VersionNumber) at /home/roger/.julia/packages/CUDA/42B9G/deps/compatibility.jl:181
 [3] llvm_compat at /home/roger/.julia/packages/CUDA/42B9G/deps/compatibility.jl:176 [inlined]
 [4] init_compatibility() at /home/roger/.julia/packages/CUDA/42B9G/deps/compatibility.jl:236
 [5] runtime_init() at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:121
 [6] (::CUDA.var"#581#582"{Bool})() at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:32
 [7] lock(::CUDA.var"#581#582"{Bool}, ::ReentrantLock) at ./lock.jl:161
 [8] _functional(::Bool) at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:26
 [9] functional(::Bool) at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:19
 [10] functional() at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:18
To reproduce
The Minimal Working Example (MWE) for this bug:
pkg> generate test_project
pkg> activate test_project
julia> using CUDA
julia> CUDA.functional()
ERROR: Your LLVM does not support the NVPTX back-end.
This is very strange; both the official binaries and an unmodified build should contain this back-end.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] llvm_compat(::VersionNumber) at /home/roger/.julia/packages/CUDA/42B9G/deps/compatibility.jl:181
 [3] llvm_compat at /home/roger/.julia/packages/CUDA/42B9G/deps/compatibility.jl:176 [inlined]
 [4] init_compatibility() at /home/roger/.julia/packages/CUDA/42B9G/deps/compatibility.jl:236
 [5] runtime_init() at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:121
 [6] (::CUDA.var"#581#582"{Bool})() at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:32
 [7] lock(::CUDA.var"#581#582"{Bool}, ::ReentrantLock) at ./lock.jl:161
 [8] _functional(::Bool) at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:26
 [9] functional(::Bool) at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:19
 [10] functional() at /home/roger/.julia/packages/CUDA/42B9G/src/initialization.jl:18
Expected behavior
It should work just as it does in the global shared environment.
Version info
Details on Julia:
please post the output of:
julia> versioninfo()
Julia Version 1.5.0-beta1.0
Commit 6443f6c95a (2020-05-28 17:42 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: AMD Ryzen 9 3900X 12-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, znver1)
Details on CUDA:
please post the output of:
CUDA.versioninfo()
versioninfo doesn't exist in v1.0.2 ..., but I'm using CUDA v1.0.2 with CUDA toolkit 11.
This problem remains the same on the latest master branch of CUDA.
View it on GitHub: https://github.com/JuliaGPU/CUDA.jl/issues/249
I installed Julia from the official Linux binary release using https://github.com/johnnychen94/jill.py
Could you try these in both sessions:
julia> using LLVM
julia> haskey(targets(), "nvptx")
false
julia> InitializeAllTargets()
julia> haskey(targets(), "nvptx")
true
julia> LLVM.libllvm_targets
18-element Array{Symbol,1}:
:AArch64
:AMDGPU
:ARC
:ARM
:AVR
:BPF
:Hexagon
:Lanai
:MSP430
:Mips
:NVPTX
:PowerPC
:RISCV
:Sparc
:SystemZ
:WebAssembly
:X86
:XCore
julia> LLVM.libllvm
:libLLVM
julia> using Libdl
julia> Libdl.dlpath(LLVM.libllvm)
"/home/tim/.cache/julia/binaries/1.5.0-beta1/x64/bin/../lib/julia/libLLVM-9jl.so"
This is what I got:
In the global environment:
julia> using LLVM
julia> haskey(targets(), "nvptx")
false
julia> InitializeAllTargets()
julia> haskey(targets(), "nvptx")
false
julia> LLVM.libllvm_targets
18-element Array{Symbol,1}:
:AArch64
:AMDGPU
:ARC
:ARM
:AVR
:BPF
:Hexagon
:Lanai
:MSP430
:Mips
:NVPTX
:PowerPC
:RISCV
:Sparc
:SystemZ
:WebAssembly
:X86
:XCore
julia> LLVM.libllvm
:libLLVM
julia> using Libdl
julia> Libdl.dlpath(LLVM.libllvm)
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libLLVM.so"
and in a local project environment
julia> using LLVM
julia> haskey(targets(), "nvptx")
false
julia> InitializeAllTargets()
julia> haskey(targets(), "nvptx")
false
julia> LLVM.libllvm_targets
18-element Array{Symbol,1}:
:AArch64
:AMDGPU
:ARC
:ARM
:AVR
:BPF
:Hexagon
:Lanai
:MSP430
:Mips
:NVPTX
:PowerPC
:RISCV
:Sparc
:SystemZ
:WebAssembly
:X86
:XCore
julia> LLVM.libllvm
:libLLVM
julia> using Libdl
julia> Libdl.dlpath(LLVM.libllvm)
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libLLVM.so"
They look the same...
Umm, somehow both of my environments have now stopped working and emit this error...
Could you post the output of Libdl.dllist(), and try the following:
julia> using LLVM
julia> InitializeAllTargets()
julia> name.(collect(targets()))
8-element Array{String,1}:
"wasm64"
"wasm32"
"amdgcn"
"r600"
"nvptx64"
"nvptx"
"x86-64"
"x86"
Also, "/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libLLVM.so" points to libLLVM-9jl.so in the same directory, right?
julia> using LLVM
julia> InitializeAllTargets()
julia> name.(collect(targets()))
String[]
julia> Libdl.dllist()
28-element Array{String,1}:
"linux-vdso.so.1"
"/home/roger/packages/julias/julia-1.5/bin/../lib/libjulia.so.1"
"/lib/x86_64-linux-gnu/libdl.so.2"
"/lib/x86_64-linux-gnu/librt.so.1"
"/lib/x86_64-linux-gnu/libpthread.so.0"
"/lib/x86_64-linux-gnu/libc.so.6"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libLLVM-9jl.so"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libstdc++.so.6"
"/lib/x86_64-linux-gnu/libm.so.6"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libgcc_s.so.1"
"/lib64/ld-linux-x86-64.so.2"
"/home/roger/packages/julias/julia-1.5/lib/julia/sys.so"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libpcre2-8.so"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libgmp.so"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libmpfr.so"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libopenblas64_.so"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libgfortran.so.4"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libquadmath.so.0"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libcholmod.so"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libamd.so.2"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libcolamd.so.2"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libsuitesparseconfig.so.5"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libccolamd.so.2"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libcamd.so.2"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libopenblas64_.so.0"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libsuitesparse_wrapper.so"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libsuitesparseconfig.so"
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libLLVM.so"
Yes, libLLVM.so seems to be the same as libLLVM-9jl.so in the same directory.
I just downloaded a fresh v1.4.2 binary and somehow see the same problem there (and the same output as above). This is strange: it was working fine yesterday, and I don't think I changed anything... Now CuArrays gives this error as well.
It's probably caused by https://github.com/maleadt/LLVM.jl/pull/188, where we switched from ccalling with an absolute path to the library to ccall((fun, :libLLVM)), but I'm failing to see how that would cause everything to break, since dlpath(libLLVM) points to the absolute path we used previously.
That said, it's a bit strange that libLLVM is listed twice in Libdl.dllist, once with and once without the version suffix. Maybe that's a red herring.
OH, InitializeAllTargets is part of the LLVM extras API, i.e. that which we access through libjulia! So the call to InitializeAllTargets initialized the targets in the LLVM library that was loaded by Julia, and not the unversioned one which here was loaded separately (libLLVM vs libLLVM-9jl). That also means https://github.com/maleadt/LLVM.jl/pull/188 is fundamentally invalid and will never be compatible with a non-Julia LLVM library. Still, I would have expected these library handles to alias here, since one is a symlink to the other.
So, summarizing: dlopen(:libLLVM) on @Roger-luo's system resolves to the libLLVM.so symlink, and not its libLLVM-9jl.so target. This causes ccall((fun, :libLLVM)) to end up in a different library than the calls to libjulia, which internally call the libLLVM that Julia was linked to (which should really just be the same library, but apparently the linker can get confused here).
All this isn't the case on my system:
julia> using Libdl
julia> Libdl.dlpath(:libLLVM)
"/home/tim/.cache/julia/binaries/1.5.0-beta1/x64/bin/../lib/julia/libLLVM-9jl.so"
vs
julia> using Libdl
julia> Libdl.dlpath(LLVM.libllvm)
"/home/roger/packages/julias/julia-1.5/bin/../lib/julia/libLLVM.so"
@vchuravy @staticfloat Any thoughts?
I just downloaded a fresh v1.4.2 binary and somehow see the same problem there (and the same output as above). This is strange: it was working fine yesterday, and I don't think I changed anything... Now CuArrays gives this error as well.
I am not sure if this is related. I was using both CuArrays.jl and CUDA.jl (not simultaneously) and I have the same issue today with Julia 1.4.2. It was fine on Monday. I found that if I downgrade LLVM.jl to v1.5.2 via add LLVM@1.5.2, things are resolved. Perhaps the new PR in LLVM v1.6.0 has something to do with this? (I noticed that it was released 3 days ago.) That would explain why it was fine for me a few days ago.
Updates:
I just realized that this new PR is already mentioned above (maleadt/LLVM.jl#188). I can reproduce the above case for Libdl.dlpath(:libLLVM) by switching the version of LLVM.jl.
I'm a little unclear on what the issue is; are you saying that we're getting two identical copies of the same library loaded, depending on what path you access it from?
Yes, and the Libdl.dllist() above seems to confirm that, listing both libLLVM.so and libLLVM-9jl.so, even though the former should be a symlink to the latter.
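One way to test the "two copies" theory outside of Julia is to ask the filesystem whether both names refer to the same underlying file. A minimal sketch, assuming a shell whose test builtin supports -ef (bash and dash do, although it is not strictly POSIX). The helper name same_file and the demo file names are made up for illustration, and the demo runs in a scratch directory rather than in a real Julia install:

```shell
#!/bin/sh
# same_file A B: succeeds if A and B resolve to the same device and inode
# (symlinks are followed), and fails for two separate files, even when
# their contents are byte-identical.
same_file() {
    [ "$1" -ef "$2" ]
}

# Demo in a scratch directory, with stand-ins for the real libraries.
demo=$(mktemp -d)
printf 'fake library\n' > "$demo/libLLVM-9jl.so"
ln -s libLLVM-9jl.so "$demo/link.so"        # a symlink to the versioned file
cp "$demo/libLLVM-9jl.so" "$demo/copy.so"   # an independent copy

same_file "$demo/link.so" "$demo/libLLVM-9jl.so" && echo "link.so: same file"
same_file "$demo/copy.so" "$demo/libLLVM-9jl.so" || echo "copy.so: different file"
rm -rf "$demo"
```

If libLLVM.so really is a symlink to libLLVM-9jl.so, the check on the two real paths should succeed; a failure would suggest two distinct files on disk.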
@Roger-luo @songxianxu Which Linux distributions are you using?
I'm using Ubuntu 20.04.
julia> versioninfo(verbose=true)
Julia Version 1.5.0-beta1.0
Commit 6443f6c95a (2020-05-28 17:42 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
Ubuntu 20.04 LTS
uname: Linux 5.4.0-37-generic #41-Ubuntu SMP Wed Jun 3 18:57:02 UTC 2020 x86_64 x86_64
CPU: AMD Ryzen 9 3900X 12-Core Processor:
speed user nice sys idle irq
#1-24 2197 MHz 63922 s 4184 s 22469 s 40076548 s 0 s
Memory: 31.34796905517578 GB (24781.34375 MB free)
Uptime: 16742.0 sec
Load Avg: 0.396484375 0.14453125 0.046875
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, znver1)
Environment:
DEFAULTS_PATH = /usr/share/gconf/ubuntu.default.path
HOME = /home/roger
WINDOWPATH = 2
MANDATORY_PATH = /usr/share/gconf/ubuntu.mandatory.path
PATH = /usr/local/cuda-11.0/bin/:/home/roger/.local/bin:/home/roger/miniconda3/bin:/home/roger/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
TERM = xterm-256color
@maleadt you are right, I think this is indeed caused by that PR. It explains why things stopped working in my local environment first and in the global shared environment afterwards: I had simply updated my dependencies lol.
After downgrading LLVM.jl to the previous version (v1.5.2), CUDA.jl works now.
I was using both Manjaro and Ubuntu.
@Roger-luo Are you doing all the tests with distributions installed by jill?
I just realized that with a fresh copy of 1.4.2 downloaded from the official site, I can no longer reproduce this problem in the same environments. It might be a problem with how jill unpacks the release.
Updates:
It seems that if you installed via jill, libLLVM.so is no longer a symbolic link.
[phyxxs@espresso julia]$ ls -li libLLVM.so
9700755 lrwxrwxrwx 1 phyxxs phyxxs 14 May 24 03:13 libLLVM.so -> libLLVM-8jl.so
# Below is installed by jill
[phyxxs@espresso julia]$ ls -li ~/packages/julias/julia-1.4/lib/julia/libLLVM.so
525875 -rwxr-xr-x 1 phyxxs phyxxs 56941776 May 24 03:13 /home/phyxxs/packages/julias/julia-1.4/lib/julia/libLLVM.so
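The same check as the two ls -li listings above can be scripted. This is only a sketch: link_status is a hypothetical helper name, and the demo uses fake files in a scratch directory instead of a real lib/julia directory.

```shell
#!/bin/sh
# link_status PATH: print how PATH exists on disk.
# The symlink test comes first because -f follows symlinks.
link_status() {
    if [ -L "$1" ]; then
        echo "symlink -> $(readlink "$1")"
    elif [ -f "$1" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}

demo=$(mktemp -d)
printf 'fake library\n' > "$demo/libLLVM-8jl.so"
ln -s libLLVM-8jl.so "$demo/libLLVM.so"
link_status "$demo/libLLVM.so"       # prints: symlink -> libLLVM-8jl.so
link_status "$demo/libLLVM-8jl.so"   # prints: regular file
rm -rf "$demo"
```

Run against the libLLVM.so of an install, "regular file" would correspond to the second listing above (the jill case).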
I can't reproduce this on my machine:
julia> using Libdl, LLVM
(@v1.6) pkg> st
Status `~/.julia/environments/v1.6/Project.toml`
[929cbde3] LLVM v1.6.0
julia> Libdl.dlpath(:libLLVM)
"/home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM-9jl.so"
julia> Libdl.dlpath(LLVM.libllvm)
"/home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM-9jl.so"
julia> run(`/bin/bash -c "ls -la /home/sabae/local/dist/julia-master/bin/../lib/julia/*LLVM*"`)
-rwxr-xr-x 1 sabae sabae 62935480 Jun 25 19:42 /home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM-9jl.so
lrwxrwxrwx 1 sabae sabae 14 Jun 25 19:42 /home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM.so -> libLLVM-9jl.so
julia> dlpath(Libdl.dlopen("/home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM.so"))
"/home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM-9jl.so"
julia> filter(f -> occursin("LLVM", f), Libdl.dllist())
1-element Array{String,1}:
"/home/sabae/local/dist/julia-master/bin/../lib/julia/libLLVM-9jl.so"
This is on an Ubuntu machine, and happens both with the latest master and with the official 1.4.2 binaries. (Although with 1.4.2, it loads LLVM 8, not LLVM 9, of course).
It seems that if you installed via jill, libLLVM.so is no longer a symbolic link.
That makes perfect sense. That would cause this problem.
Ah, I can confirm that libLLVM.so is no longer a symlink but an identical copy of libLLVM-9jl.so. This is strange: all jill should be doing is downloading the official release and unpacking it with tar. @johnnychen94 any thoughts?
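For what it's worth, tar on its own keeps symlinks intact when extracting, so the link has to get lost in some other step. The thread doesn't say which step that is in jill; the sketch below only assumes, for illustration, that a link-following copy (cp -L) happens after extraction. make_release and the file names are made-up stand-ins for the real release tarball.

```shell
#!/bin/sh
# make_release DIR: build a tiny stand-in for the release tarball in DIR,
# containing one real file and a relative symlink to it.
make_release() {
    mkdir -p "$1/src"
    printf 'fake library\n' > "$1/src/libLLVM-9jl.so"
    ln -s libLLVM-9jl.so "$1/src/libLLVM.so"
    tar -C "$1" -cf "$1/release.tar" src
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
make_release "$work"

# 1. Plain extraction keeps the symlink.
mkdir "$work/unpacked"
tar -C "$work/unpacked" -xf "$work/release.tar"
[ -L "$work/unpacked/src/libLLVM.so" ] && echo "tar -x: still a symlink"

# 2. A copy that follows links (cp -L) turns it into a regular file
#    with the same contents as its former target.
cp -RL "$work/unpacked/src" "$work/copied"
[ -L "$work/copied/libLLVM.so" ] || echo "cp -RL: now a regular file"
```

Any tool that reads through the link while copying or moving the unpacked tree would have the same effect, which matches the "same copy" observation above.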
It seems that if you installed via jill, libLLVM.so is no longer a symbolic link.
Sorry for the trouble, I didn't know that these libs should be symlinks when I wrote jill.
The symlink issue should be fixed in https://github.com/johnnychen94/jill.py/commit/f4f7edc4c94838f6a525919fbaf14b2ce0d64df9 and jill v0.6.14 (will make a release once the CI passes)
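For an install that was already unpacked with the duplicate file in place, the link could probably also be restored by hand. The sketch below is not what the jill commit does (the thread doesn't show its contents); relink_copy is a hypothetical helper that only replaces the file when it is byte-identical to the versioned library in the same directory.

```shell
#!/bin/sh
# relink_copy LINK TARGET: if LINK is a regular file with the same bytes as
# TARGET (both in the same directory), replace LINK with a relative symlink
# to TARGET. Anything else is left untouched.
relink_copy() {
    link=$1
    target=$2
    [ -L "$link" ] && return 0                       # already a symlink
    [ -f "$link" ] && [ -f "$target" ] || return 1   # both must be regular files
    cmp -s "$link" "$target" || return 1             # contents differ: do not touch
    ln -sf "$(basename "$target")" "$link"
}

# Demo on fake files in a scratch directory.
demo=$(mktemp -d)
printf 'fake library\n' > "$demo/libLLVM-9jl.so"
cp "$demo/libLLVM-9jl.so" "$demo/libLLVM.so"     # the broken state
relink_copy "$demo/libLLVM.so" "$demo/libLLVM-9jl.so"
readlink "$demo/libLLVM.so"                      # prints: libLLVM-9jl.so
rm -rf "$demo"
```

Reinstalling with a fixed jill release is likely the simpler route; the sketch is mainly useful for seeing what the intended on-disk layout looks like.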
I'll close this issue here then. Thanks for the quick fix @johnnychen94
Quick resolution all around! Great work, everybody!
Details on CUDA:
versioninfo doesn't exist in v1.0.2 ..., ~but I'm using CUDA v1.0.2 with cuda toolkit 11~ but I managed to print this on the master branch in the global environment (it errors in a local project environment).
This problem remains the same on the latest master branch of CUDA.