ROCm / ROCm-CompilerSupport

The compiler support repository provides various Lightning Compiler related services.
47 stars 31 forks source link

comgr-objdump.cpp causes crash if loaded alongside mesa #52

Closed xytovl closed 1 year ago

xytovl commented 1 year ago

When using both libamd_comgr.so and mesa with libvulkan_radeon, application (blender-3.3 in my case) crashes with

mesa: CommandLine Error: Option 'h' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
Abandon (core dumped)

the following patch fixes this for me:

diff -ru ROCm-CompilerSupport-rocm-5.3.3.orig/lib/comgr/src/comgr-objdump.cpp ROCm-CompilerSupport-rocm-5.3.3/lib/comgr/src/comgr-objdump.cpp
--- ROCm-CompilerSupport-rocm-5.3.3.orig/lib/comgr/src/comgr-objdump.cpp    2022-07-28 17:06:14.000000000 +0200
+++ ROCm-CompilerSupport-rocm-5.3.3/lib/comgr/src/comgr-objdump.cpp 2022-12-20 20:03:53.357398486 +0100
@@ -175,9 +175,6 @@
 static cl::alias SectionHeadersShort("headers",
                                      cl::desc("Alias for --section-headers"),
                                      cl::aliasopt(SectionHeaders));
-static cl::alias SectionHeadersShorter("h",
-                                       cl::desc("Alias for --section-headers"),
-                                       cl::aliasopt(SectionHeaders));

 cl::list<std::string>
     FilterSections("section",

If I understand correctly, none of the command line options and aliases defined here are used, but this one alone fixes the issue.

dbenedb commented 1 year ago

Here is Blender 3.4 bug report with more details about the same issue:

https://developer.blender.org/T102018

cgmb commented 1 year ago

I still don't understand why either mesa or comgr register these command line options. Does anyone know what their purpose is?

xytovl commented 1 year ago

comgr-objdump seems to be a derivate from an old version of https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-objdump/llvm-objdump.cpp, the original file is intended as a command-line utility and it makes sense to define command-line options.

The conflicting definition seems to be in https://github.com/llvm/llvm-project/blob/239a174d92dd2bd99ecb1308dc8937040895b04d/llvm/lib/Support/CommandLine.cpp#L2618 Where "help" and "h" are defined, I think this line should also not end up in llvm as a library but this is probably a larger change as it could probably break "help" for llvm command line tools.

kzhuravl commented 1 year ago

cc @lamb-j

lamb-j commented 1 year ago

Ideally we could have two separate threads/components both manage their own options without clashing, but this is a known shortcoming of the way LLVM options are currently implemented. We have ambitions to work on this in the long term, but it requires big upstream LLVM changes.

In the short term, I think it may be safe to remove some alias options from comgr-objdump that have a higher probability of clashing with other libraries (like "-h" here). @kzhuravl Maybe we could remove all of the short/shorter options (-d, -p, -h, -s, etc.) and document that comgr-objdump requires full-length option names?

FYI these options can be set by a user via amd_comgr_action_info_set_option_list(...) before calling something like amd_comgr_do_action(AMD_COMGR_ACTION_DISASSEMBLE_RELOCATABLE_TO_SOURCE, ...)

RadoslavL commented 1 year ago

How did you install the patch? I tried to do the same thing, but got these error messages:

 * Applying user patches from /etc/portage/patches ...
 * Applying llvm-fix.patch ...
can't find file to patch at input line 4
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff -ru ROCm-CompilerSupport-rocm-5.3.3.orig/lib/comgr/src/comgr-objdump.cpp ROCm-CompilerSupport-rocm-5.3.3/lib/comgr/src/comgr-objdump.cpp
|--- ROCm-CompilerSupport-rocm-5.3.3.orig/lib/comgr/src/comgr-objdump.cpp   2022-07-28 17:06:14.000000000 +0200
|+++ ROCm-CompilerSupport-rocm-5.3.3/lib/comgr/src/comgr-objdump.cpp    2022-12-20 20:03:53.357398486 +0100
--------------------------
No file to patch.  Skipping patch.
1 out of 1 hunk ignored                                                                                                                                                                                      [ !! ]
 * ERROR: dev-libs/rocm-comgr-5.3.3::gentoo failed (prepare phase):
 *   patch -p1  failed with /etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch
 * 
 * Call stack:
 *               ebuild.sh, line  136:  Called src_prepare
 *             environment, line 2332:  Called cmake_src_prepare
 *             environment, line 1067:  Called default_src_prepare
 *      phase-functions.sh, line  948:  Called __eapi8_src_prepare
 *             environment, line  323:  Called eapply_user
 *             environment, line 1260:  Called eapply '/etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch'
 *             environment, line 1223:  Called _eapply_patch '/etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch'
 *             environment, line 1161:  Called __helpers_die 'patch -p1  failed with /etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch'
 *   isolated-functions.sh, line  112:  Called die
 * The specific snippet of code:
 *          die "$@"
 * 
 * If you need support, post the output of `emerge --info '=dev-libs/rocm-comgr-5.3.3::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=dev-libs/rocm-comgr-5.3.3::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/temp/environment'.
 * Working directory: '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/work/ROCm-CompilerSupport-rocm-5.3.3/lib/comgr'
 * S: '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/work/ROCm-CompilerSupport-rocm-5.3.3/lib/comgr'

>>> Failed to emerge dev-libs/rocm-comgr-5.3.3, Log file:

>>>  '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/temp/build.log'

 * Messages for package dev-libs/rocm-comgr-5.3.3:

 * ERROR: dev-libs/rocm-comgr-5.3.3::gentoo failed (prepare phase):
 *   patch -p1  failed with /etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch
 * 
 * Call stack:
 *               ebuild.sh, line  136:  Called src_prepare
 *             environment, line 2332:  Called cmake_src_prepare
 *             environment, line 1067:  Called default_src_prepare
 *      phase-functions.sh, line  948:  Called __eapi8_src_prepare
 *             environment, line  323:  Called eapply_user
 *             environment, line 1260:  Called eapply '/etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch'
 *             environment, line 1223:  Called _eapply_patch '/etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch'
 *             environment, line 1161:  Called __helpers_die 'patch -p1  failed with /etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch'
 *   isolated-functions.sh, line  112:  Called die
 * The specific snippet of code:
 *          die "$@"
 * 
 * If you need support, post the output of `emerge --info '=dev-libs/rocm-comgr-5.3.3::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=dev-libs/rocm-comgr-5.3.3::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/temp/environment'.
 * Working directory: '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/work/ROCm-CompilerSupport-rocm-5.3.3/lib/comgr'
 * S: '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/work/ROCm-CompilerSupport-rocm-5.3.3/lib/comgr'
xytovl commented 1 year ago

How did you install the patch? I tried to do the same thing, but got these error messages:

 * Applying user patches from /etc/portage/patches ...
 * Applying llvm-fix.patch ...
can't find file to patch at input line 4
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff -ru ROCm-CompilerSupport-rocm-5.3.3.orig/lib/comgr/src/comgr-objdump.cpp ROCm-CompilerSupport-rocm-5.3.3/lib/comgr/src/comgr-objdump.cpp
|--- ROCm-CompilerSupport-rocm-5.3.3.orig/lib/comgr/src/comgr-objdump.cpp 2022-07-28 17:06:14.000000000 +0200
|+++ ROCm-CompilerSupport-rocm-5.3.3/lib/comgr/src/comgr-objdump.cpp  2022-12-20 20:03:53.357398486 +0100
--------------------------
No file to patch.  Skipping patch.
1 out of 1 hunk ignored                                                                                                                                                                                      [ !! ]
 * ERROR: dev-libs/rocm-comgr-5.3.3::gentoo failed (prepare phase):
 *   patch -p1  failed with /etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch
 * 
 * Call stack:
 *               ebuild.sh, line  136:  Called src_prepare
 *             environment, line 2332:  Called cmake_src_prepare
 *             environment, line 1067:  Called default_src_prepare
 *      phase-functions.sh, line  948:  Called __eapi8_src_prepare
 *             environment, line  323:  Called eapply_user
 *             environment, line 1260:  Called eapply '/etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch'
 *             environment, line 1223:  Called _eapply_patch '/etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch'
 *             environment, line 1161:  Called __helpers_die 'patch -p1  failed with /etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch'
 *   isolated-functions.sh, line  112:  Called die
 * The specific snippet of code:
 *        die "$@"
 * 
 * If you need support, post the output of `emerge --info '=dev-libs/rocm-comgr-5.3.3::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=dev-libs/rocm-comgr-5.3.3::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/temp/environment'.
 * Working directory: '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/work/ROCm-CompilerSupport-rocm-5.3.3/lib/comgr'
 * S: '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/work/ROCm-CompilerSupport-rocm-5.3.3/lib/comgr'

>>> Failed to emerge dev-libs/rocm-comgr-5.3.3, Log file:

>>>  '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/temp/build.log'

 * Messages for package dev-libs/rocm-comgr-5.3.3:

 * ERROR: dev-libs/rocm-comgr-5.3.3::gentoo failed (prepare phase):
 *   patch -p1  failed with /etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch
 * 
 * Call stack:
 *               ebuild.sh, line  136:  Called src_prepare
 *             environment, line 2332:  Called cmake_src_prepare
 *             environment, line 1067:  Called default_src_prepare
 *      phase-functions.sh, line  948:  Called __eapi8_src_prepare
 *             environment, line  323:  Called eapply_user
 *             environment, line 1260:  Called eapply '/etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch'
 *             environment, line 1223:  Called _eapply_patch '/etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch'
 *             environment, line 1161:  Called __helpers_die 'patch -p1  failed with /etc/portage/patches/dev-libs/rocm-comgr/llvm-fix.patch'
 *   isolated-functions.sh, line  112:  Called die
 * The specific snippet of code:
 *        die "$@"
 * 
 * If you need support, post the output of `emerge --info '=dev-libs/rocm-comgr-5.3.3::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=dev-libs/rocm-comgr-5.3.3::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/temp/environment'.
 * Working directory: '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/work/ROCm-CompilerSupport-rocm-5.3.3/lib/comgr'
 * S: '/var/tmp/portage/dev-libs/rocm-comgr-5.3.3/work/ROCm-CompilerSupport-rocm-5.3.3/lib/comgr'

It looks like on Gentoo the archive does not have the same layout. The patch should be

diff -ru ROCm-CompilerSupport-rocm-5.3.3.orig/src/comgr-objdump.cpp ROCm-CompilerSupport-rocm-5.3.3/src/comgr-objdump.cpp
--- ROCm-CompilerSupport-rocm-5.3.3.orig/src/comgr-objdump.cpp  2022-07-28 17:06:14.000000000 +0200
+++ ROCm-CompilerSupport-rocm-5.3.3/src/comgr-objdump.cpp   2022-12-20 20:03:53.357398486 +0100
@@ -175,9 +175,6 @@
 static cl::alias SectionHeadersShort("headers",
                                      cl::desc("Alias for --section-headers"),
                                      cl::aliasopt(SectionHeaders));
-static cl::alias SectionHeadersShorter("h",
-                                       cl::desc("Alias for --section-headers"),
-                                       cl::aliasopt(SectionHeaders));

 cl::list<std::string>
     FilterSections("section",

The difference is only in the file path (lib/comgr is not present for Gentoo).

RadoslavL commented 1 year ago

@xytovl Thanks for the help!

labre commented 1 year ago

but it requires big upstream LLVM changes.

Which changes might that be and are they addressed by the following reviews? https://reviews.llvm.org/D129129 https://reviews.llvm.org/D129134

I happen to have approached this issue from the LLVM side with https://reviews.llvm.org/D141019 eventually leading me back to RocM after I threw gdb at it. My “solution” was just a quick hack however to test, whether that error can be ignored in this case.

lamb-j commented 1 year ago

@xytovl I just committed https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/commit/2d05f9e480cbc591a6b888dfd49d9f7ef1bef25f

Thanks for tracking this down!

@labre Nicolai's patches and the removal of ManagedStatic definitely addresses some of the issues. I haven't sat down with them yet to see what still needs to happen. Besides examples like this where we have Comgr and Mesa running at the same time, we also would like the option of having multiple instances of Comgr (and thus multiple LLVMs) running without hitting issues with the option parser.

I'll let you know if I figure anything out on this front, and I'd love to be kept in the loop on your progress!

labre commented 1 year ago

I'll let you know if I figure anything out on this front, and I'd love to be kept in the loop on your progress!

That would be taking false credit for others work. I’m just a Gentoo user, who simultaneously tried to track down this in LLVM instead of RocM. I just tested a quick hack in LLVM and then opened a RFC on this to increase awareness for the CommandLine parser issues in practice.

However I eventually was hinted towards this bug by gdb and the ongoing progress in LLVM by another discourse user, so basically I was a tad too late for investigating this one. ;)

Since @nhaehnle is the author of the Revisions, he is the better contact for staying in the loop. Thanks for merging the fix. :)

LordOfDragons commented 1 year ago

I'll let you know if I figure anything out on this front, and I'd love to be kept in the loop on your progress!

That would be taking false credit for others work. I’m just a Gentoo user, who simultaneously tried to track down this in LLVM instead of RocM. I just tested a quick hack in LLVM and then opened a RFC on this to increase awareness for the CommandLine parser issues in practice.

However I eventually was hinted towards this bug by gdb and the ongoing progress in LLVM by another discourse user, so basically I was a tad too late for investigating this one. ;)

Since @nhaehnle is the author of the Revisions, he is the better contact for staying in the loop. Thanks for merging the fix. :)

You said you come from Gentoo. Have you been able to test this somehow? I'm hitting a similar problem with Blender crashing due to ROCm getting in conflict with MESA and the ROCm version 5.3.3 seems not to be new enough.

labre commented 1 year ago

You said you come from Gentoo. Have you been able to test this somehow? I'm hitting a similar problem with Blender crashing due to ROCm getting in conflict with MESA and the ROCm version 5.3.3 seems not to be new enough.

Well, I used the line deletions from the commit as a user patch and forgot about it. I updated it to be the output of git format-patch and adapted the path like outlined above:

The difference is only in the file path (lib/comgr is not present for Gentoo).

You can use the user patch attached with this post and put it into /etc/portage/patches/dev-libs/rocm-comgr as follows and as root, but please compare it with the commit diff. Do not ever trust a random patch offered to you on the Internet and double check the link targets of this post:

# Github did not like the patch extension, albeit claiming to do so
mv 0001-Remove-h-option-from-comgr-objdump.{md,patch}
mkdir -p /etc/portage/patches/dev-libs/rocm-comgr
cp 0001-Remove-h-option-from-comgr-objdump.patch /etc/portage/patches/dev-libs/rocm-comgr
# Remove rocm-opencl-runtime and rebuild runtime dependencies
emerge -C rocm-opencl-runtime
emerge --depclean
emerge rocm-opencl-runtime

0001-Remove-h-option-from-comgr-objdump.md

Let me know, whether this works for you. If you have any questions feel free to ask. Otherwise you’re welcome. :) Oh, and since you seem to not know the user patches feature, I suppose you’re relatively new to the Gentoo world. In that case, welcome to the community! :)

LordOfDragons commented 1 year ago

Actually I use Gentoo since nearly 20 years but I never got in contact with user patches so far.

wolfiestyle commented 1 year ago

This is the modified version of the patch i use myself on Gentoo with the path changes done. Drop in /etc/portage/patches/dev-libs/rocm-comgr-5.3.3-r1/ (you will need to mkdir -p this path) and create a .patch file with this content

diff -ru comgr.orig/src/comgr-objdump.cpp comgr/src/comgr-objdump.cpp
--- comgr.orig/src/comgr-objdump.cpp    2022-07-28 17:06:14.000000000 +0200
+++ comgr/src/comgr-objdump.cpp 2022-12-20 20:03:53.357398486 +0100
@@ -175,9 +175,6 @@
 static cl::alias SectionHeadersShort("headers",
                                      cl::desc("Alias for --section-headers"),
                                      cl::aliasopt(SectionHeaders));
-static cl::alias SectionHeadersShorter("h",
-                                       cl::desc("Alias for --section-headers"),
-                                       cl::aliasopt(SectionHeaders));

 cl::list<std::string>
     FilterSections("section",