ROCm / HIP

HIP: C++ Heterogeneous-Compute Interface for Portability
https://rocmdocs.amd.com/projects/HIP/
MIT License

'hipErrorInvalidDevice'(101) in square example #2700

Closed waeschd closed 2 years ago

waeschd commented 2 years ago

Hi, I tried to install HIP on Ubuntu 20.04 on an AMD platform (Vega 56).

  1. I followed the instructions here: https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/How_to_Install_ROCm.html. I decided to use the Installer Script Method, so I downloaded the .deb and installed it according to the instructions. Then I executed sudo amdgpu-install --usecase=hiplibsdk,rocm

  2. Afterwards I tested the square example.

    ~/Downloads/HIP/samples/0_Intro/square$   make
    /opt/rocm//hip/bin/hipify-perl square.cu > square.cpp
    /opt/rocm//hip/bin/hipcc  square.cpp -o square.out
    ~/Downloads/HIP/samples/0_Intro/square$  ./square.out 
    error: 'hipErrorInvalidDevice'(101) at square.cpp:61

I also ran hipconfig --full:

~/Downloads/HIP/samples/0_Intro/square$  hipconfig --full
HIP version  : 5.1.20532-f592a741

== hipconfig
HIP_PATH     : /opt/rocm-5.1.3/hip
ROCM_PATH    : /opt/rocm-5.1.3
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.1.3/hip/include -I/opt/rocm-5.1.3/llvm/bin/../lib/clang/14.0.0 -I/opt/rocm-5.1.3/hsa/include

== hip-clang
HSA_PATH         : /opt/rocm-5.1.3/hsa
HIP_CLANG_PATH   : /opt/rocm-5.1.3/llvm/bin
AMD clang version 14.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.1.3 22114 5cba46feb6af367b1cafaa183ec42dbfb8207b14)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.1.3/llvm/bin
AMD LLVM version 14.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver1

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :  -std=c++11 -isystem "/opt/rocm-5.1.3/llvm/lib/clang/14.0.0/include/.." -isystem /opt/rocm-5.1.3/hsa/include -isystem "/opt/rocm-5.1.3/hip/include" -O3
hip-clang-ldflags  :  -L"/opt/rocm-5.1.3/hip/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt

=== Environment Variables
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

== Linux Kernel
Hostname     : eternity
Linux eternity 5.13.0-44-generic #49~20.04.1-Ubuntu SMP Wed May 18 18:44:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:    20.04
Codename:   focal

Someone else had a similar issue before (#2265), but there it was solved by adding "export HIP_PLATFORM=nvidia". Here, however, I'm on an AMD platform.

Best Regards

cloudhan commented 2 years ago

Check whether you can run the binary with sudo. If you can, you are hitting a permission issue.

sudo usermod -a -G video render $USER

should fix your problem.

If not, run

strace <your_executable> <your_args...> 2>&1 | grep -i permission

to see which file descriptor is causing the problem.
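
A quicker first check than strace, offered as a rough sketch rather than an official diagnostic: the script below reports whether the current user can read and write the GPU device nodes. It assumes ROCm reaches the GPU through /dev/kfd and /dev/dri/renderD*, as is typical on Ubuntu; check_node is a hypothetical helper name, not part of ROCm.

```shell
#!/bin/sh
# Assumption: ROCm opens /dev/kfd and /dev/dri/renderD*; adjust if your system differs.

# check_node PATH: print whether PATH is missing, accessible, or not accessible
# (read + write) for the user running this script.
check_node() {
    node="$1"
    if [ ! -e "$node" ]; then
        echo "missing: $node"
    elif [ -r "$node" ] && [ -w "$node" ]; then
        echo "ok: $node"
    else
        echo "no access: $node"
    fi
}

# If no render node exists, the glob stays literal and is reported as missing.
for node in /dev/kfd /dev/dri/renderD*; do
    check_node "$node"
done
```

A "no access" line for a node that exists points to the same permission problem that running with sudo would reveal; ls -l on that node shows which group owns it.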

waeschd commented 2 years ago

Thank you very much, missing permissions were indeed the problem :D Adding myself to the video and render groups and rebooting the machine fixed it.

One side note, though: your command above threw an error. You probably meant: sudo usermod -a -G video,render $USER

Running this command and rebooting afterwards did the trick.
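
For anyone repeating this, a small sketch for confirming that the group change has reached the current login session. It relies on id -nG (no user argument), which lists the groups of the running session; those only change after logging in again, which is presumably why the reboot was needed. has_group is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# has_group LIST GROUP: succeed if the space-separated LIST contains GROUP
# as a whole word (so "videos" does not match "video").
has_group() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -qx -- "$2"
}

# Groups of the *current session*; an empty list is used if id fails.
session_groups="$(id -nG 2>/dev/null)" || session_groups=""

for group in video render; do
    if has_group "$session_groups" "$group"; then
        echo "session has group: $group"
    else
        echo "session lacks group: $group (log out and back in after usermod)"
    fi
done
```

If a group is reported as lacking right after usermod, comparing with id -nG "$USER" (which reads the group database instead of the session) shows whether only a fresh login is missing.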

Just out of curiosity: I remember seeing the instruction to add yourself to the video and render groups in an older HIP installer guide, but if I remember correctly I didn't see it in the newest installer guide.

Also, I installed HIP on an NVIDIA platform, and there I didn't have to add myself to the video and render groups to make it work.

Nevertheless, I'm grateful for the help.