Open Quuxplusone opened 8 years ago
Bugzilla Link | PR30587 |
Status | NEW |
Importance | P normal |
Reported by | Heinz Wiesinger (pprkut@slackware.com) |
Reported on | 2016-10-01 10:59:33 -0700 |
Last modified on | 2018-09-14 06:56:19 -0700 |
Version | trunk |
Hardware | PC Linux |
CC | anastasia.stulova@arm.com, evangelos@foutrelis.com, giuseppe.bilotta@gmail.com, josh@joshmatthews.net, llvm-bugs@lists.llvm.org, pekka.jaaskelainen@tuni.fi, virtuousfox@gmail.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
I am guessing you don't actually invoke clang from the command line here? Or if yes, it might be useful to see the command line that triggers this error.
I am wondering whether you should address this request with the vendors first.
Indeed, there is no actual clang command line invocation here. Afaics all opencl libs I tested link a static clang library and that is where the problem comes from. Not sure which one it is though.
At the moment there is not enough information to understand the problem. It is also not clear whether it is a problem with the compiler or with how it is used. Could you provide more details about what you do? It might be easier, though, to address it with the vendors/toolchains you use instead.
Trying my best here to provide as much information as possible, but I'm not
really sure what I'm looking for, so if you need anything specific, please let
me know!
ocl-icd acts as a sort of wrapper around different opencl implementations, so you
can have multiple implementations installed and the one best matching your
system will be picked at runtime. This is an important concept for linux
distros who want to ship for a broad range of hardware.
Now, I'll try to explain what happens using the example of beignet, the opencl
implementation for Intel GPUs. beignet ships a library called libgbe.so. The
link command of this library looks like this (sorry, long):
===================================
/usr/libexec/icecc/bin/c++ -fPIC -O2 -fPIC -funroll-loops -fstrict-aliasing
-msse2 -msse3 -mssse3 -msse4.1 -fPIC -Wall -mfpmath=sse -Wcast-align -Wl,-E
-std=c++0x -Wno-invalid-offsetof -fno-rtti -I/usr/include -D_GNU_SOURCE
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
-DGBE_DEBUG_MEMORY=0 -DGBE_COMPILER_AVAILABLE=1 -fvisibility=hidden -O2
-DNDEBUG -DGBE_DEBUG=0 -Wl,-Bsymbolic -Wl,--no-undefined -L/usr/lib64 -shared
-Wl,-soname,libgbe.so -o libgbe.so CMakeFiles/gbe.dir/sys/intrusive_list.cpp.o
CMakeFiles/gbe.dir/sys/assert.cpp.o CMakeFiles/gbe.dir/sys/alloc.cpp.o
CMakeFiles/gbe.dir/sys/mutex.cpp.o CMakeFiles/gbe.dir/sys/platform.cpp.o
CMakeFiles/gbe.dir/sys/cvar.cpp.o CMakeFiles/gbe.dir/ir/context.cpp.o
CMakeFiles/gbe.dir/ir/profile.cpp.o CMakeFiles/gbe.dir/ir/type.cpp.o
CMakeFiles/gbe.dir/ir/unit.cpp.o CMakeFiles/gbe.dir/ir/constant.cpp.o
CMakeFiles/gbe.dir/ir/sampler.cpp.o CMakeFiles/gbe.dir/ir/image.cpp.o
CMakeFiles/gbe.dir/ir/half.cpp.o CMakeFiles/gbe.dir/ir/instruction.cpp.o
CMakeFiles/gbe.dir/ir/liveness.cpp.o CMakeFiles/gbe.dir/ir/register.cpp.o
CMakeFiles/gbe.dir/ir/function.cpp.o CMakeFiles/gbe.dir/ir/value.cpp.o
CMakeFiles/gbe.dir/ir/lowering.cpp.o CMakeFiles/gbe.dir/ir/profiling.cpp.o
CMakeFiles/gbe.dir/ir/printf.cpp.o CMakeFiles/gbe.dir/ir/immediate.cpp.o
CMakeFiles/gbe.dir/ir/structurizer.cpp.o CMakeFiles/gbe.dir/ir/reloc.cpp.o
CMakeFiles/gbe.dir/backend/context.cpp.o
CMakeFiles/gbe.dir/backend/program.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_sampler_fix.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_bitcode_link.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_gen_backend.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_passes.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_scalarize.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_intrinsic_lowering.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_barrier_nodup.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_printf_parser.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_profiling.cpp.o
CMakeFiles/gbe.dir/llvm/ExpandConstantExpr.cpp.o
CMakeFiles/gbe.dir/llvm/ExpandUtils.cpp.o
CMakeFiles/gbe.dir/llvm/PromoteIntegers.cpp.o
CMakeFiles/gbe.dir/llvm/ExpandLargeIntegers.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_device_enqueue.cpp.o
CMakeFiles/gbe.dir/llvm/StripAttributes.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_to_gen.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_loadstore_optimization.cpp.o
CMakeFiles/gbe.dir/llvm/llvm_unroll.cpp.o
CMakeFiles/gbe.dir/backend/gen/gen_mesa_disasm.c.o
CMakeFiles/gbe.dir/backend/gen_insn_selection.cpp.o
CMakeFiles/gbe.dir/backend/gen_insn_selection_optimize.cpp.o
CMakeFiles/gbe.dir/backend/gen_insn_scheduling.cpp.o
CMakeFiles/gbe.dir/backend/gen_insn_selection_output.cpp.o
CMakeFiles/gbe.dir/backend/gen_reg_allocation.cpp.o
CMakeFiles/gbe.dir/backend/gen_context.cpp.o
CMakeFiles/gbe.dir/backend/gen75_context.cpp.o
CMakeFiles/gbe.dir/backend/gen8_context.cpp.o
CMakeFiles/gbe.dir/backend/gen9_context.cpp.o
CMakeFiles/gbe.dir/backend/gen_program.cpp.o
CMakeFiles/gbe.dir/backend/gen_insn_compact.cpp.o
CMakeFiles/gbe.dir/backend/gen_encoder.cpp.o
CMakeFiles/gbe.dir/backend/gen7_encoder.cpp.o
CMakeFiles/gbe.dir/backend/gen75_encoder.cpp.o
CMakeFiles/gbe.dir/backend/gen8_encoder.cpp.o
CMakeFiles/gbe.dir/backend/gen9_encoder.cpp.o -ldrm_intel -ldrm -ldrm
-Wl,-Bstatic -lclangFrontend -lclangSerialization -lclangDriver -lclangCodeGen
-lclangSema -lclangStaticAnalyzerFrontend -lclangStaticAnalyzerCheckers
-lclangStaticAnalyzerCore -lclangAnalysis -lclangEdit -lclangAST -lclangParse
-lclangSema -lclangLex -lclangBasic -Wl,-Bdynamic -lLLVM-3.9 -lrt -ldl -ltinfo
-lpthread -lz -lm -lpthread -ldl -Wl,-Bstatic -lclangStaticAnalyzerFrontend
-lclangStaticAnalyzerCheckers -lclangStaticAnalyzerCore -lclangAnalysis
-lclangEdit -lclangAST -lclangParse -lclangLex -lclangBasic -Wl,-Bdynamic
-lLLVM-3.9 -lrt -ldl -ltinfo -lpthread -lz -lm -lpthread -ldl
===================================
The important part is the list of clang libraries linked, which are all static
libs. Checking those libs with "strings" for "enable-value-profiling" reveals
that the option is defined in libclangCodeGen.a.
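For anyone who wants to repeat that check programmatically, here is a minimal sketch of the same idea as `strings ... | grep`: read a file as raw bytes and look for the option name. The function name `fileContainsString` is made up for this illustration, and the archive path in the usage note is only an assumed location, not something taken from this report.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not part of any LLVM or beignet tooling): returns true
// if the raw bytes of the file at `path` contain `needle` anywhere.
// A file that cannot be opened is reported as "not found".
inline bool fileContainsString(const std::string &path,
                               const std::string &needle) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;
    }
    // Slurp the whole file; static archives are small enough for this to be
    // acceptable in a one-off diagnostic.
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return bytes.find(needle) != std::string::npos;
}
```

A call such as `fileContainsString("/usr/lib64/libclangCodeGen.a", "enable-value-profiling")` would then answer the same question as the `strings` check, assuming the archive lives at that path on the system being inspected.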
Now, if you load two opencl implementations (ocl-icd loads all available ones)
which both linked libclangCodeGen.a, you get the error as I described in my
initial post.
I really don't see how the vendors could fix this, other than not linking
libclangCodeGen.a. But if you have suggestions I'd be happy to open bugs with
them.
Since the failing option comes from the CodeGen lib I classified the component accordingly.
As far as I can see, that option was added more than a year ago. It uses the standard mechanism for internal options.
I am still not convinced that the issue is not caused by incorrect use of the Clang libraries. I believe that multiple instances of the same libraries register the option in the same address space, and therefore the error is reported. I don't think this is something that should happen, though. In particular, when you talk about loading multiple OpenCL implementations, does this imply that multiple instances of the same Clang libraries are linked together? It might be that it worked before just by chance and the issue was exposed only after that CodeGen option was added. But I would wait for the final assessment from whoever knows more about that flag or about using multiple Clang/LLVM libs together.
The command line option error is the classic problem caused by LLVM's global-object-based command line switch registration. If you somehow link the same global object (the one that registers the command line option) twice into the same process, for example by dynamically loading the LLVM library two or more times, it registers the same command line object again (when the global object initializer is called), resulting in this error.
I think here it happens via the ICD loader: it loads library a), which gets LLVM's command line switch object registered first, then it loads library b), which also uses the same LLVM and has the same command line global object linked in statically.
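The mechanism described in this comment can be illustrated with a small self-contained model. This is only a sketch: `OptionRegistry`, `Option` and `loadTwoPlugins` are hypothetical names invented here and are not LLVM's actual classes; the model reports a duplicate instead of aborting the way LLVM does.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a process-wide option table (not LLVM's code).
class OptionRegistry {
public:
    // Returns false when the name is already present. In the real error
    // this is the point where "registered more than once" is reported.
    bool add(const std::string &name) {
        bool inserted = names_.insert(name).second;
        if (!inserted) {
            duplicates_.push_back(name);
        }
        return inserted;
    }
    const std::vector<std::string> &duplicates() const { return duplicates_; }

private:
    std::set<std::string> names_;
    std::vector<std::string> duplicates_;
};

// Models a global option object: registration is a side effect of
// construction, just as it would be for an object with a static initializer.
struct Option {
    Option(OptionRegistry &registry, const std::string &name)
        : registered(registry.add(name)) {}
    bool registered;
};

// Simulates one process loading two plugins that each carry their own copy
// of the same option object, both registering into one shared table.
inline std::vector<std::string> loadTwoPlugins() {
    OptionRegistry processWide;
    Option fromPluginA(processWide, "enable-value-profiling");
    Option fromPluginB(processWide, "enable-value-profiling");
    return processWide.duplicates();
}
```

In this model the first copy registers cleanly and the second is flagged as a duplicate, which mirrors the a) then b) loading order described above.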
I have typically preferred linking to the LLVM lib dynamically due to this issue, among others. Then you get the switch object linked in only once per process, because the dynamic loader detects that the LLVM lib has already been loaded by a) when b) requests it.
This is not a bug in LLVM as such, unless one considers the global-object-based command line switch registration itself to be one. This error would probably just go away if the command line handler silently ignored multiple identical command line switch registrations.
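As a rough sketch of what "silently ignoring identical registrations" could mean, the model below accepts a second registration when it matches the first and only flags a genuine conflict. All names (`OptionInfo`, `TolerantRegistry`, `AddResult`) are hypothetical and chosen for this illustration; this is not a patch against LLVM's command line library, and what counts as "identical" (here: same name and same description) is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical description of an option; real options carry more state.
struct OptionInfo {
    std::string name;
    std::string description;
};

enum class AddResult { Added, IgnoredIdentical, Conflict };

// Hypothetical registry that tolerates repeated identical registrations.
class TolerantRegistry {
public:
    AddResult add(const OptionInfo &opt) {
        auto it = options_.find(opt.name);
        if (it == options_.end()) {
            options_.emplace(opt.name, opt);
            return AddResult::Added;
        }
        // Same name seen before: accept it quietly only if it describes
        // the same option, otherwise report a real conflict.
        return it->second.description == opt.description
                   ? AddResult::IgnoredIdentical
                   : AddResult::Conflict;
    }
    std::size_t size() const { return options_.size(); }

private:
    std::map<std::string, OptionInfo> options_;
};
```

The trade-off of such a design is that two copies of the option would then exist in the process while only one is known to the table, so which copy a parsed flag ends up affecting would still need to be defined.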
I'm not sure this is only an issue of mixed dynamic and static linking. I am seeing this issue even with OpenCL ICDs that link dynamically to the same LLVM version. I'm on Debian unstable and I'm using the distribution LLVM 4.0 development packages. I compile both pocl and beignet specifying LLVM 4.0, and verify that they are using the same library with ldd (which shows libLLVM-4.0.so.1 for both libgbe and pocl). Trying to run clinfo fails with the error
: CommandLine Error: Option 'enable-value-profiling' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
This is a pretty severe issue for any distribution shipping multiple OpenCL ICDs, as most FLOSS OpenCL platforms depend on LLVM and Clang for their OpenCL support. For example, I have Mesa (distro), Beignet and POCL (plus a few non-free ones).
Up until LLVM 3.7, and possibly 3.8, as long as all platforms linked (dynamically) to the same LLVM version, they could co-exist. This is not the case anymore. Hence, this is a regression that effectively prevents dynamic linking for ICDs (or any similar plugin system).
Debian maintainers plan on statically linking LLVM to avoid this issue, but at this point there's no guarantee that this will be sufficient, and honestly, having three statically linked copies of LLVM just to work around this issue is a bit excessive.
This seems to be related to https://bugs.llvm.org/show_bug.cgi?id=22952 —possible blocker?
An additional piece of information: the situation now seems to be the reverse of what it used to be up to LLVM 3.8.
It used to be that for multiple OpenCL ICDs depending on LLVM to work, they had to dynamically link to the same version of the library, or a number of context-related functions would fail.
Now, if they all depend on the same LLVM version, the mentioned multiple-registration error aborts any program trying to use OpenCL, but if each ICD links to a different version, it works.
I currently have Debian sid's Mesa ICD, which uses version 5. If I build POCL using version 4 (for example) and Beignet with version 3.8 (for example), it works. If they are all built with version 5, the infamous error about multiple registered options appears.
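The observed pattern can be captured in a toy model. It rests on an assumption that is not confirmed anywhere in this thread: each ICD brings its own copy of the option object, while the option table is shared per LLVM version loaded into the process. The names `Icd` and `collidingIcds` are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of an installed ICD and the LLVM it links.
struct Icd {
    std::string name;
    std::string llvmVersion;
};

// Returns the ICDs whose registration would collide, in load order.
// ASSUMPTION: one option table per LLVM version, one option copy per ICD.
// In a real process the first collision would already abort the program.
inline std::vector<std::string> collidingIcds(const std::vector<Icd> &icds) {
    std::map<std::string, std::set<std::string>> tablePerVersion;
    std::vector<std::string> collisions;
    for (const Icd &icd : icds) {
        std::set<std::string> &table = tablePerVersion[icd.llvmVersion];
        if (!table.insert("enable-value-profiling").second) {
            collisions.push_back(icd.name);
        }
    }
    return collisions;
}
```

With Mesa, POCL and Beignet all on version 5 the model flags the second and third ICD; with versions 5, 4 and 3.8 it flags none, matching the behaviour reported above.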
I will try to re-classify this bug hoping it will get the attention of the right people.
It feels like something fundamental in LLVM's design. Perhaps it would make sense to start a thread on llvm-dev. Please CC me if you do!
(In reply to Giuseppe Bilotta from comment #8)
> This seems to be related to https://bugs.llvm.org/show_bug.cgi?id=22952
> —possible blocker?
In #22952 an option to somehow use LLVM_DYLIB_COMPONENTS is mentioned. Is there
a known way to work around this issue with it?
For all GNU/Linux distributions to not be able to provide universal hardware
support (by aforementioned Mesa/Clover for AMD GPUs, Beignet for Intel GPUs and
POCL for CPUs) on x86 because of this is quite ridiculous. That said,
shouldn't staff from AMD and Intel be heavily involved in fixing this?
People at
https://cgit.freedesktop.org/mesa/mesa/log/src/gallium/state_trackers/clover/llvm
https://cgit.freedesktop.org/beignet/log/backend/src/llvm
https://github.com/pocl/pocl/commits/master/lib/llvmopencl
Mainly
Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle <nicolai.haehnle@amd.com>
Jan Vesely <jan.vesely@rutgers.edu>
Yang Rong <rong.r.yang@intel.com>
and Pekka Jääskeläinen here.