Closed traylenator closed 1 month ago
Hi @traylenator. Internal ticket has been created to investigate your issue. Thanks!
@traylenator rocm-hip-sdk is meant for AMD platforms (see our documentation), which is why installing it is likely to conflict with the NVIDIA HIP packages. A potential workaround I can think of is to install these in separate Docker containers. Out of curiosity, would you mind letting us know the use case that requires both to be installed?
Hope this helps.
Thanks!
@tcgu-amd thanks for the comments.
We are running mostly NVIDIA cards today but wanted to try out HIP on those machines. We are hoping to avoid vendor lock-in and (cross-)compile for both AMD and NVIDIA hardware platforms from the same build machine at the same time, so that any future hardware migration would hopefully be easier.
@traylenator Ah I see. That makes sense. Unfortunately I don't think it is possible to achieve this on bare metal at the moment. But, as I mentioned, I think it would be possible through installing ROCm in two separate Docker containers, given that the drivers and hardware are configured properly on the host system. There will be some redundancy for sure, but unfortunately that is unavoidable at the moment because many of the libraries in our runtime are compiled for either NVIDIA or AMD.
Thanks!
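The two-container setup suggested above might look roughly like the following. This is a hypothetical sketch: the image names, tags, and device flags are assumptions, not taken from this thread; check the ROCm and NVIDIA container documentation for the exact invocations.

```shell
# Container 1: AMD backend (requires the amdgpu driver on the host).
docker run -it --device=/dev/kfd --device=/dev/dri \
    rocm/dev-ubuntu-22.04 bash

# Container 2: NVIDIA backend (requires the NVIDIA driver and the
# NVIDIA Container Toolkit on the host).
docker run -it --gpus all nvidia/cuda:12.4.1-devel-ubuntu22.04 bash
```

Each container then gets its own HIP toolchain, so the conflicting package sets never meet on one root filesystem.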
What's the difference between hipcc and hipcc-nvidia?
Certainly in early tests, binaries compiled with hipcc run on NVIDIA?
Is there some point this stops working?
@traylenator I believe the key difference is that hipcc depends on rocm-llvm, whereas hipcc-nvidia doesn't. The source code of the two versions of hipcc is virtually the same; however, as hipcc is just a Perl wrapper, there might still be discrepancies due to the different backends.
@traylenator, to follow up, it might be possible to resolve the hipcc conflict by uninstalling hipcc-nvidia and then installing rocm-hip-sdk, since the hipcc installed by rocm-hip-sdk should work for both NVIDIA and AMD runtimes. That being said, it is still strongly recommended to use a containerized environment to avoid further conflicts. Thanks!
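The suggested swap could be sketched as below. The package names come from this thread; the HIP_PLATFORM step is an assumption based on how hipcc normally selects its backend, and the whole sequence is untested here, so further conflicts may still surface.

```shell
# Remove the NVIDIA-only wrapper, then install the full SDK (run as root).
dnf remove hipcc-nvidia
dnf install rocm-hip-sdk

# The hipcc from rocm-hip-sdk can target either backend; to compile for
# NVIDIA, point it at the nvcc path:
export HIP_PLATFORM=nvidia
hipcc --version
```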
Yes, for sure you can install hipcc rather than hipcc-nvidia and all is "good" — as I say, the resulting binaries even run on NVIDIA.
This can probably be closed.
Thanks for all the responses, much appreciated.
@traylenator That's cool to hear! Thanks again for reaching out!
Problem Description
Starting from RHEL 9 and the yum repository https://repo.radeon.com/rocm/el9/6.2.2/main
As per the instructions for installing on an NVIDIA node, both hip-devel and hip-runtime-nvidia can be installed. This pulls in hipcc-nvidia, which is probably correct. If you then try to install rocm-hip-sdk, this fails due to a package conflict. In particular, rocm-hip-sdk requires rocm-hip-runtime-devel, which in turn requires hipcc, resulting in the conflict.
Operating System
Red Hat Enterprise Linux 9.4
CPU
Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
GPU
Tesla T4
ROCm Version
ROCm 6.2.2
ROCm Component
HIPCC
Steps to Reproduce
Configure https://repo.radeon.com/rocm/el9/6.2.2/main
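The step above, together with the install sequence from the problem description, can be sketched as follows. The .repo file contents are an assumption modeled on the standard ROCm install instructions, not copied from this report.

```shell
# Assumed repo definition for ROCm 6.2.2 on EL9 (check the official docs).
cat <<'EOF' > /etc/yum.repos.d/rocm.repo
[ROCm-6.2.2]
name=ROCm 6.2.2
baseurl=https://repo.radeon.com/rocm/el9/6.2.2/main
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF

dnf install hip-devel hip-runtime-nvidia   # pulls in hipcc-nvidia
dnf install rocm-hip-sdk                   # fails: rocm-hip-runtime-devel
                                           # requires hipcc, which conflicts
                                           # with the installed hipcc-nvidia
```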
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response