ROCm / HIP

HIP: C++ Heterogeneous-Compute Interface for Portability
https://rocmdocs.amd.com/projects/HIP/
MIT License

HIP installation on Nvidia platform #3521

Open ABHINAVONGOLU opened 3 months ago

ABHINAVONGOLU commented 3 months ago

apt-get install hip-runtime-nvidia hip-dev

When I run this command in my terminal, I get the following error message. Can someone help me fix this issue?

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package hip-runtime-nvidia
E: Unable to locate package hip-dev

logic-finder commented 3 months ago

Hi, those error messages look exactly like the ones I saw a few days ago. It might help to refer to issue #3519: please see the "additional information" section and the comment by harkgill-amd right below it. Thank you.

ABHINAVONGOLU commented 3 months ago

$ sudo amdgpu-install --usecase=hip,hiplibsdk
Get:1 file:/var/cuda-repo-ubuntu2204-12-5-local InRelease [1,572 B]
Get:1 file:/var/cuda-repo-ubuntu2204-12-5-local InRelease [1,572 B]
Hit:2 https://developer.download.nvidia.com/hpc-sdk/ubuntu/amd64 InRelease
Hit:3 https://dl.google.com/linux/chrome/deb stable InRelease
Hit:4 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:5 https://packages.microsoft.com/repos/code stable InRelease
Hit:6 http://in.archive.ubuntu.com/ubuntu jammy InRelease
Hit:7 http://in.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:8 https://ppa.launchpadcontent.net/touchegg/stable/ubuntu jammy InRelease
Hit:9 http://in.archive.ubuntu.com/ubuntu jammy-backports InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package amdgpu-dkms is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or is only available from another source

E: Unable to locate package rocm-hip-runtime
E: Unable to locate package rocm-hip-sdk
E: Package 'amdgpu-dkms' has no installation candidate

I am still getting this error.

harkgill-amd commented 3 months ago

Hi @ABHINAVONGOLU, as @logic-finder mentioned, we currently have an ongoing investigation to fix the installation issues for HIP on NVIDIA platforms. Thanks!

ABHINAVONGOLU commented 3 months ago

@harkgill-amd, please post an update here when the fix is complete.

ABHINAVONGOLU commented 3 months ago

@logic-finder, @harkgill-amd, can someone confirm whether it is working now?

harkgill-amd commented 3 months ago

Hi @ABHINAVONGOLU, this issue is still being investigated. I will provide updates as soon as I receive them.

avickars commented 1 month ago

This is still broken with HIP 6.2.

harkgill-amd commented 1 month ago

@avickars, are you attempting to install on Ubuntu 24.04 or on 22.04? What error are you running into?

avickars commented 1 month ago

I am trying to install it on 24.04, and I am getting the same error as originally reported in this GitHub issue.

harkgill-amd commented 1 month ago

Could you try adding the Radeon repo with

wget https://repo.radeon.com/amdgpu-install/6.2/ubuntu/noble/amdgpu-install_6.2.60200-1_all.deb
sudo apt install ./amdgpu-install_6.2.60200-1_all.deb
sudo apt update

and then running apt-get install hip-runtime-nvidia hip-dev?

The repo should provide the missing packages that cause the error. The documentation still needs to be updated to reflect this, and I will start the process of getting it updated.

avickars commented 1 month ago

Thanks, but after running apt-get install hip-runtime-nvidia it doesn't work correctly. Below is the output of my hipconfig:

HIP version: 6.2.41133-dd7f95766

==hipconfig
HIP_PATH           :/opt/rocm-6.2.0
ROCM_PATH          :/opt/rocm-6.2.0
HIP_COMPILER       :clang
HIP_PLATFORM       :amd
HIP_RUNTIME        :rocclr
CPP_CONFIG         : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-6.2.0/include -I/include

==hip-clang
HIP_CLANG_PATH     :/opt/rocm-6.2.0/lib/llvm/bin
AMD clang version 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.2.0 24292 26466ce804ac523b398608f17388eb6d605a3f09)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.2.0/lib/llvm/bin
Configuration file: /opt/rocm-6.2.0/lib/llvm/bin/clang++.cfg
AMD LLVM version 18.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver4

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :
 -O3
hip-clang-ldflags :
--driver-mode=g++ -O3 --hip-link

== Environment Variables
PATH =/opt/rocm/bin:/usr/local/cuda/bin:/home/aidan/miniconda3/bin:/home/aidan/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
LD_LIBRARY_PATH=/opt/rocm/lib:

== Linux Kernel
Hostname      :
the-dark-knight
Linux the-dark-knight 6.8.0-40-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Fri Jul  5 10:34:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:        24.04
Codename:       noble

It thinks I have an AMD GPU. By contrast, when I compiled HIP 6.1 from source, I got:

HIP version  : 6.1.40093-bd86f1708

== hipconfig
HIP_PATH     : /opt/rocm-nvidia-6.1
ROCM_PATH    : /opt/rocm
HIP_COMPILER : nvcc
HIP_PLATFORM : amd
HIP_RUNTIME  : cuda
Use of uninitialized value $CPP_CONFIG in print at .//hipconfig.pl line 154.
CPP_CONFIG   : 

Unexpected HIP_COMPILER: nvcc

=== Environment Variables
PATH=/opt/rocm/bin:/usr/local/cuda/bin:/home/aidan/miniconda3/bin:/home/aidan/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
LD_LIBRARY_PATH=/opt/rocm/lib:

== Linux Kernel
Hostname     : the-dark-knight
Linux the-dark-knight 6.8.0-40-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Fri Jul  5 10:34:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:    24.04
Codename:   noble

harkgill-amd commented 1 month ago

Adding the environment variable HIP_PLATFORM='nvidia' should resolve this discrepancy. Please give it a try and let me know.
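An export only lasts for the current shell session, so one option is to persist the variable in a shell startup file. The sketch below assumes a Bash-style startup file such as ~/.bashrc; the helper name persist_hip_platform is hypothetical and not part of HIP. It appends the export line only if that exact line is not already present.

```shell
#!/bin/sh
# Hypothetical helper (not part of HIP): persist HIP_PLATFORM=nvidia in a
# shell startup file. Defaults to ~/.bashrc; safe to run more than once.
# The body runs in a subshell so its variables do not leak into the caller.
persist_hip_platform() (
  rcfile="${1:-$HOME/.bashrc}"
  line='export HIP_PLATFORM=nvidia'
  # Create the file if needed so grep has something to read.
  touch "$rcfile"
  # -q quiet, -x whole-line match, -F fixed string (no regex).
  if ! grep -qxF "$line" "$rcfile"; then
    # If the file does not end with a newline, add one first so the
    # export does not get glued onto the last existing line.
    if [ -s "$rcfile" ] && [ -n "$(tail -c 1 "$rcfile")" ]; then
      printf '\n' >> "$rcfile"
    fi
    printf '%s\n' "$line" >> "$rcfile"
  fi
)
```

For example, run persist_hip_platform ~/.bashrc, open a new terminal, and check that hipconfig --full lists HIP_PLATFORM=nvidia under == Environment Variables.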

eljrte commented 1 month ago

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 hip-runtime-nvidia : Depends: cuda (>= 7.5) but it is not installable
 rocprofiler-register : Depends: libc6 (>= 2.38) but 2.35-0ubuntu3.8 is to be installed
                        Depends: libstdc++6 (>= 13.1) but 12.3.0-1ubuntu1~22.04 is to be installed
E: Unable to correct problems, you have held broken packages.

I ran into the error above. My environment: Ubuntu 22.04, RTX 4080, nvcc 12.1, gcc/g++ 11.4. I would really appreciate any tips.

By the way, I have already followed the replies above in this issue.

avickars commented 3 weeks ago

@harkgill-amd, I tried it today and it didn't work; the results were the same as before. To be concise, I executed the following:

export HIP_PLATFORM='nvidia'
wget https://repo.radeon.com/amdgpu-install/6.2/ubuntu/noble/amdgpu-install_6.2.60200-1_all.deb
sudo apt install ./amdgpu-install_6.2.60200-1_all.deb
sudo apt update
sudo apt-get install hip-runtime-nvidia hip-dev

Again, my hipconfig looks like:

HIP version: 6.2.41133-dd7f95766

==hipconfig
HIP_PATH           :/opt/rocm-6.2.0
ROCM_PATH          :/opt/rocm-6.2.0
HIP_COMPILER       :clang
HIP_PLATFORM       :amd
HIP_RUNTIME        :rocclr
CPP_CONFIG         : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-6.2.0/include -I/include

==hip-clang
HIP_CLANG_PATH     :/opt/rocm-6.2.0/lib/llvm/bin
AMD clang version 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.2.0 24292 26466ce804ac523b398608f17388eb6d605a3f09)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.2.0/lib/llvm/bin
Configuration file: /opt/rocm-6.2.0/lib/llvm/bin/clang++.cfg
AMD LLVM version 18.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver4

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :
 -O3
hip-clang-ldflags :
--driver-mode=g++ -O3 --hip-link

== Environment Variables
PATH =/opt/rocm/bin:/usr/local/cuda/bin:/home/aidan/Projects/pipeline/env/bin:/home/aidan/miniconda3/condabin:/home/aidan/.vscode-server/cli/servers/Stable-fee1edb8d6d72a0ddff41e5f71a671c23ed924b9/server/bin/remote-cli:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
LD_LIBRARY_PATH=/opt/rocm/lib/:

== Linux Kernel
Hostname      :
the-dark-knight
Linux the-dark-knight 6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug  2 20:41:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:        24.04
Codename:       noble

Are you able to reproduce this?

harkgill-amd commented 3 weeks ago

@avickars, it looks like the environment variable is being overridden by the HIP installation. Can you try setting it again after running sudo apt-get install hip-runtime-nvidia hip-dev? For reference, here is what I see on my end:

  1. Output of hipconfig after installation (we have identified why the compiler is being set to amd and will be addressing this issue):
==hipconfig
HIP_PATH           :/opt/rocm-6.2.0
ROCM_PATH          :/opt/rocm-6.2.0
HIP_COMPILER       :clang
HIP_PLATFORM       :amd
HIP_RUNTIME        :rocclr
CPP_CONFIG         : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-6.2.0/include -I/include

==hip-clang
HIP_CLANG_PATH     :/opt/rocm-6.2.0/lib/llvm/bin
AMD clang version 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.2.0 24292 26466ce804ac523b398608f17388eb6d605a3f09)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.2.0/lib/llvm/bin
Configuration file: /opt/rocm-6.2.0/lib/llvm/bin/clang++.cfg
AMD LLVM version 18.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: alderlake

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :
 -O3
hip-clang-ldflags :
--driver-mode=g++ -O3 --hip-link

== Environment Variables
PATH =/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

== Linux Kernel
Hostname      :
rocm
Linux rocm 6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug  2 20:41:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:        24.04
Codename:       noble
  2. Set the environment variable: export HIP_PLATFORM='nvidia'
  3. hipconfig is updated to use the nvcc compiler, and the HIP_PLATFORM variable is listed under == Environment Variables:
    
    ==hipconfig
    HIP_PATH           :/opt/rocm-6.2.0
    ROCM_PATH          :/opt/rocm-6.2.0
    HIP_COMPILER       :nvcc
    HIP_PLATFORM       :nvidia
    HIP_RUNTIME        :cuda
    CPP_CONFIG         : -D__HIP_PLATFORM_NVCC__= -D__HIP_PLATFORM_NVIDIA__= -I/opt/rocm-6.2.0/include -I/usr/local/cuda/include

    == nvcc
    CUDA_PATH          :/usr/local/cuda
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2024 NVIDIA Corporation
    Built on Wed_Aug_14_10:10:22_PDT_2024
    Cuda compilation tools, release 12.6, V12.6.68
    Build cuda_12.6.r12.6/compiler.34714021_0

    == Environment Variables
    PATH =/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
    HIP_PLATFORM=nvidia

    == Linux Kernel
    Hostname      :
    rocm
    Linux rocm 6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 2 20:41:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 24.04 LTS
    Release:        24.04
    Codename:       noble
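To check this in a script rather than by eye, one option is to extract the HIP_PLATFORM value from the ==hipconfig section. This is a minimal sketch that assumes the "HIP_PLATFORM       :value" layout shown in the outputs in this thread; both function names are hypothetical. Only the colon form matches, so the HIP_PLATFORM=nvidia line under == Environment Variables is ignored.

```shell
#!/bin/sh
# Hypothetical helper: read "hipconfig --full" output on stdin and print the
# value of the "HIP_PLATFORM <spaces>:<value>" line (e.g. amd or nvidia).
hip_platform_from_output() {
  awk -F':' '/^[ \t]*HIP_PLATFORM[ \t]*:/ {
    value = $2
    gsub(/[ \t]/, "", value)
    print value
    exit
  }'
}

# Hypothetical check: exit non-zero unless the reported platform is nvidia.
# Runs in a subshell so "platform" does not leak into the caller.
require_nvidia_platform() (
  platform=$(hip_platform_from_output)
  if [ "$platform" != "nvidia" ]; then
    echo "HIP_PLATFORM is '$platform', expected 'nvidia'" >&2
    exit 1
  fi
)
```

Usage, assuming the install path from this thread: /opt/rocm/bin/hipconfig --full | require_nvidia_platform && echo "HIP is targeting NVIDIA".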

harkgill-amd commented 3 weeks ago

@eljrte, the error you are seeing is likely caused by installing the 24.04 amdgpu-install .deb rather than the 22.04 version. There is also an unmet CUDA dependency, so let's try a clean install with the following steps:

  1. Remove previous amdgpu-install

    sudo amdgpu-install --uninstall
    sudo apt purge amdgpu-install
    sudo apt autoremove
  2. Install 22.04 (jammy) amdgpu-install

    wget https://repo.radeon.com/amdgpu-install/6.2/ubuntu/jammy/amdgpu-install_6.2.60200-1_all.deb
    sudo apt install ./amdgpu-install_6.2.60200-1_all.deb
    sudo apt update
  3. Install the CUDA Toolkit for Ubuntu 22.04 following the commands here.

  4. Install hip-runtime-nvidia and hip-dev packages with sudo apt-get install hip-runtime-nvidia hip-dev.

  5. Set HIP_PLATFORM environment variable with export HIP_PLATFORM='nvidia'.

  6. Run /opt/rocm/bin/hipconfig --full and confirm that HIP is installed and set to use the nvcc compiler.
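Step 2 depends on matching the amdgpu-install .deb to the Ubuntu release, which is where the 24.04/22.04 mix-up came from. A minimal sketch that derives the URL from the system's codename, assuming /etc/os-release sets VERSION_CODENAME (as current Ubuntu releases do) and covering only the two 6.2 URLs quoted in this thread; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the amdgpu-install 6.2 .deb URL for an Ubuntu
# codename. Only the two URLs quoted in this thread are covered:
# noble (24.04) and jammy (22.04). Anything else is rejected.
amdgpu_install_url() {
  case "$1" in
    noble|jammy)
      echo "https://repo.radeon.com/amdgpu-install/6.2/ubuntu/$1/amdgpu-install_6.2.60200-1_all.deb"
      ;;
    *)
      echo "unsupported Ubuntu codename: $1" >&2
      return 1
      ;;
  esac
}

# Read the running system's codename from /etc/os-release.
# Sourced in a subshell so its variables do not leak into the caller.
current_codename() {
  ( . /etc/os-release && echo "$VERSION_CODENAME" )
}
```

For example, url=$(amdgpu_install_url "$(current_codename)") && wget "$url" on a 22.04 machine should fetch the jammy package used in step 2, and fails loudly instead of silently picking the wrong release.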

eddy16112 commented 3 days ago

Is it possible to install hip-runtime-nvidia without installing CUDA and the CUDA driver?