ROCm / ROCR-Runtime

ROCm Platform Runtime: ROCr a HPC market enhanced HSA based runtime
https://rocm.docs.amd.com/projects/ROCR-Runtime/en/latest/
Other
205 stars 97 forks source link

[Issue]: ROCR-Runtime fails to build with musl>=1.2.3 #181

Open AngryLoki opened 3 months ago

AngryLoki commented 3 months ago

Problem Description

Musl libc (and specifically releases after 1.2.3) has pedantic interpretation of C/C++ standard comparisons with NULL, resulting in series of errors:

pthread_attr_setaffinity_np is not supported in musl, but according to comment in #143, it was already addressed in 6.1.0.

pthread_rwlockattr_setkind_np is not supported in musl, and it was not mentioned in #143.

Could you check this patch and apply whatever is needed for your internal branch? Thank you! https://github.com/AngryLoki/gentoo/blob/5eaf4513b4af1c0ae9ba8f30874d7168010c057d/dev-libs/rocr-runtime/files/rocr-runtime-5.7.1-musl.patch

Operating System

Gentoo

ROCm Version

ROCm 6.0.0, ROCm 5.7.1

ROCm Component

ROCR-Runtime

shwetagkhatri commented 3 months ago

This issue has been addressed by https://github.com/ROCm/rocminfo/pull/55 and will be available in ROCM 6.1 release.

AngryLoki commented 3 months ago

Hi @shwetagkhatri, looks like https://github.com/ROCm/rocminfo/pull/55 is a wrong link, could you recheck it?

shwetagkhatri commented 3 months ago

@AngryLoki - Sorry, my bad! The patches you have suggested make sense for Musl libc. I am not sure about this change - @@ -122,12 +125,6 @@ class os_thread { and @@ -157,6 +154,14 @@ class os_thread { what is the purpose of this change? What error do you see w.r.t this?

AngryLoki commented 3 months ago

@shwetagkhatri , if you applied https://github.com/ROCm/ROCR-Runtime/pull/143 previously (which I can't see, because source code of 6.1 branch is not published yet), ignore part about pthread.

The error is: pthread_attr_setaffinity_np: symbol not found. This function does not exist in musl, instead it is only possible to set pthread affinity after thread was created with pthread_setaffinity_np. I guess #ifdef __GLIBC__ ... #ifndef __GLIBC__ approach from https://github.com/ROCm/ROCR-Runtime/pull/143/files is better in theory, but I just patched it independently of https://github.com/ROCm/ROCR-Runtime/pull/143

shwetagkhatri commented 2 months ago

Sorry for the delay. @AngryLoki . This patch and https://github.com/ROCm/ROCR-Runtime/pull/143 have been integrated and will be part for ROCm 6.2 release. Thank you.

AngryLoki commented 2 months ago

Hi @shwetagkhatri ,

I've found another issue with musl.

The fix repeats common pattern, used for musl, e. g. https://github.com/void-linux/void-packages/blob/5ccf1c66a1df2d644e1a0db0a68fca321469c57e/srcpkgs/MangoHud/patches/0001-elfhacks-d_un.d_ptr-is-relative-on-non-glibc-systems.patch#L90 . Quoting:

d_un.d_ptr is relative on non glibc systems
elf(5) documents it this way, glibc diverts from this documentation

ROCR-Runtime crashes on load in https://github.com/ROCm/ROCR-Runtime/blob/rocm-6.1.0/src/core/util/lnx/os_linux.cpp#L299 . It can be fixed like this:

--- src.orig/core/util/lnx/os_linux.cpp
+++ src/core/util/lnx/os_linux.cpp
@@ -65,6 +65,16 @@
 #include <cpuid.h>
 #endif

+/*
+ * d_un.d_ptr is relative on non glibc systems
+ * elf(5) documents it this way, glibc diverts from this documentation
+ */
+#ifdef __GLIBC__
+#define ABS_ADDR(base, ptr) (ptr)
+#else
+#define ABS_ADDR(base, ptr) ((base) + (ptr))
+#endif
+
 namespace rocr {
 namespace os {

@@ -299,7 +309,7 @@ static int callback(struct dl_phdr_info* info, size_t size, void* data) {
         for (int j = 0;; j++) {
           if (dyn_section[j].d_tag == DT_NULL) break;

-          if (dyn_section[j].d_tag == DT_STRTAB) strings = (char*)(dyn_section[j].d_un.d_ptr);
+          if (dyn_section[j].d_tag == DT_STRTAB) strings = (char*)ABS_ADDR(info->dlpi_addr, dyn_section[j].d_un.d_ptr);

           if (dyn_section[j].d_tag == DT_STRSZ) limit = dyn_section[j].d_un.d_val;
         }

This patch is applicable to v6.1.0. I don't know if it was fixed in 6.2.0, please check.