Open upsj opened 9 months ago
We discussed this issue and the associated PR (#1622) in the community meeting today. @DrDaveD asked me to create a new issue, but I thought it better to continue on this existing issue instead.
As of now, we believe that the original list of libraries in rocmliblist.conf is flawed. But we are still trying to arrive at a more comprehensive list that covers the majority of use cases. @upsj has determined a different list of libs that works for their use case and has proposed #1622 to update it. I think this is an improvement over the current state, but I'm concerned it is still not a comprehensive and general list that will work across most use cases. I've proposed the following list instead, based on version numbers in the library names and their correspondence with the compiled kernel module.
libamd_comgr.so
libamdhip64.so
libhiprtc-builtins.so
libhiprtc.so
libhsa-runtime64.so
librocm-core.so
We are hoping that someone with an AMD GPU can test this list with some workloads. Failing that, I will propose a new PR that simply comments out the existing list and adds this one in its place. Hopefully, this will not be too disruptive since users can comment/uncomment libraries if they find that their GPU-enabled workflows are failing.
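The comment/uncomment workflow described above can be illustrated with a minimal sketch. It assumes the list file uses `#` for comments and holds one library name per line; the function name `active_libraries` and the two commented-out entries are hypothetical stand-ins and do not reproduce the actual contents of rocmliblist.conf.

```python
def active_libraries(conf_text: str) -> list[str]:
    """Return the entries of a liblist-style file that are not commented out.

    Assumption: '#' starts a comment and each remaining line names one library.
    """
    active = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line:
            active.append(line)
    return active


# Hypothetical file contents: two old entries commented out (illustrative only),
# followed by the six libraries proposed in this comment.
PROPOSED = """\
# librocblas.so
# librocfft.so
libamd_comgr.so
libamdhip64.so
libhiprtc-builtins.so
libhiprtc.so
libhsa-runtime64.so
librocm-core.so
"""

if __name__ == "__main__":
    for name in active_libraries(PROPOSED):
        print(name)
```

A user whose workflow needs one of the commented-out libraries would remove the leading `#` from that line, and the entry would then show up in the active list again.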
I didn't remember there was an existing issue because it wasn't marked with the 1.3.0 milestone. That's fixed now.
Version of Apptainer
What version of Apptainer (or Singularity) are you using?
main branch
Expected behavior
Containers should provide libraries like rocBLAS and rocFFT themselves; the host libraries should not be forwarded in this case.
Actual behavior
The host libraries get loaded into .singularity.d/libs.
When running a binary that tries to use rocBLAS, this leads to the following error:
To my knowledge, both rocBLAS and rocFFT rely on additional files, probably for JIT compilation. It should be safest to let the container provide these files to avoid incompatibilities.
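One way to confirm which copy of a library a binary picks up is to run `ldd` on it inside the container and look for paths under the injected library directory. The sketch below only parses such output. The directory `/.singularity.d/libs` is assumed to be the in-container mount point, the helper name `host_provided` is hypothetical, and the sample text is made up for illustration rather than taken from a real run.

```python
import re

# Assumption: host libraries are bound into this directory inside the container.
HOST_LIB_DIR = "/.singularity.d/libs"

# Matches ldd lines of the form "<soname> => <resolved path> (<address>)".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")


def host_provided(ldd_output: str, lib_dir: str = HOST_LIB_DIR) -> dict[str, str]:
    """Map each soname to its resolved path when that path lies under lib_dir."""
    found = {}
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match and match.group(2).startswith(lib_dir + "/"):
            found[match.group(1)] = match.group(2)
    return found


# Illustrative sample only; sonames, versions, and addresses are invented.
SAMPLE = (
    "\tlibrocblas.so.3 => /.singularity.d/libs/librocblas.so.3 (0x00007f0000000000)\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000200000)\n"
)

if __name__ == "__main__":
    for soname, path in host_provided(SAMPLE).items():
        print(f"{soname} is resolved from the host: {path}")
```

If rocBLAS appears in the result, the binary is using the host copy rather than the one shipped in the container image.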
Steps to reproduce this behavior
Run
apptainer shell --rocm
on a container that contains and uses librocblas.so. The host-provided library will be used instead. Trying to run a binary that uses rocBLAS will fail with the above error.
What OS/distro are you running
Ubuntu 22.04.2 LTS
How did you install Apptainer
from source