Closed by devurandom 6 months ago.
I encountered a similar error on ROCm 4.5.2. The first time, the system froze, apparently because it ran out of RAM (32 GB). Since then, every attempt to run it just returns hipErrorOutOfMemory.
Maybe I need to try downgrading?
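Since the first run apparently exhausted host RAM and froze the machine, one way to retry more safely might be to run the reproducer under a small watchdog that kills it when available host memory drops too low. The following is a minimal sketch, not something taken from this thread: the function names, the 2 GiB threshold, and the polling interval are all assumptions for illustration. It reads the standard MemAvailable field from /proc/meminfo on Linux and watches host RAM only, not GPU memory, so it may not react quickly enough if memory is consumed in a single burst.

```python
import subprocess
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def low_on_memory(info, min_available_kb):
    """Return True when MemAvailable is below the threshold.

    A missing MemAvailable field is treated as 0, i.e. as "low".
    """
    return info.get("MemAvailable", 0) < min_available_kb


def run_with_watchdog(cmd, min_available_kb=2 * 1024 * 1024, poll_seconds=0.2):
    """Run cmd and kill it if available host RAM falls below the threshold.

    Returns the process exit code, or None if the watchdog killed it.
    The default threshold (2 GiB) is an arbitrary, assumed value.
    """
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        if low_on_memory(info, min_available_kb):
            proc.kill()
            proc.wait()
            return None
        time.sleep(poll_seconds)
    return proc.returncode
```

For example, calling run_with_watchdog(["./square.out"]) would run the reproducer from the original report and stop it before the host runs completely out of memory, provided the polling loop gets a chance to run.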
hipconfig:
HIP version : 4.4.21432-f9dccde4
== hipconfig
HIP_PATH : /opt/rocm-4.5.2/hip
ROCM_PATH : /opt/rocm-4.5.2
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME : rocclr
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-4.5.2/hip/include -I/opt/rocm-4.5.2/llvm/bin/../lib/clang/13.0.0 -I/opt/rocm-4.5.2/hsa/include
== hip-clang
HSA_PATH : /opt/rocm-4.5.2/hsa
HIP_CLANG_PATH : /opt/rocm-4.5.2/llvm/bin
AMD clang version 13.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-4.5.2 21432 9bbd96fd1936641cd47defd8022edafd063019d5)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-4.5.2/llvm/bin
AMD LLVM version 13.0.0git
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: znver2
Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags : -std=c++11 -isystem "/opt/rocm-4.5.2/llvm/lib/clang/13.0.0/include/.." -isystem /opt/rocm-4.5.2/hsa/include -isystem "/opt/rocm-4.5.2/hip/include" -O3
hip-clang-ldflags : --driver-mode=g++ -L"/opt/rocm-4.5.2/hip/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt
=== Environment Variables
PATH=/home/user1/.vscode-server/bin/fe719cd3e5825bf14e14182fddeb88ee8daf044f/bin:/home/user1/.vscode-server/bin/fe719cd3e5825bf14e14182fddeb88ee8daf044f/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
== Linux Kernel
Hostname : roxane
Linux roxane 5.10.0-1052-oem #54-Ubuntu SMP Tue Nov 23 09:06:13 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal
@devurandom, sorry for the lack of response. Please try the latest ROCm 6.0.2 (HIP 6.0.32831) to see whether your issue still exists. If it is resolved, please close the ticket. Thanks.
Sorry, this has been too long and I no longer have access to that system.
System information
Versions:
Problem
Afterwards my graphical system freezes and I need to REISUB.
This is reproducible every time I run ./square.out.
Regression
I never got HIP to work on this system. Still working on it. :)
Logs
Excerpts from the system journal of my last boot:
Afterwards the system was running for a while without me interacting with it. When I came back, I couldn't access my X11 session anymore (system not reacting to keyboard input, like NumLock, switching to VT not possible), so I had to REISUB:
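For anyone collecting comparable excerpts, kernel messages from the boot before a forced reboot can be pulled with journalctl and narrowed to GPU-related lines. This is a sketch under assumptions, not the command used for the excerpts in this report: the keyword list is a guess at what is relevant, the function names are made up for illustration, and it assumes a persistent systemd journal so that the previous boot (-b -1) is still available.

```python
import subprocess

# Assumed keywords; adjust to whatever the driver stack actually logs.
KEYWORDS = ("amdgpu", "kfd", "out of memory", "oom-kill")


def filter_gpu_lines(lines, keywords=KEYWORDS):
    """Keep lines that mention any keyword, compared case-insensitively."""
    lowered = [k.lower() for k in keywords]
    return [ln for ln in lines if any(k in ln.lower() for k in lowered)]


def previous_boot_kernel_log():
    """Return kernel messages (-k) from the previous boot (-b -1) as a list of lines."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Passing the output of previous_boot_kernel_log() through filter_gpu_lines() gives a shorter log that is easier to paste into a report. Reading the journal may require root or membership in the systemd-journal group.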
Other information
I also see exceptions and segfaults in Clover and ROCm's OpenCL implementation when executing clinfo and rocminfo:
I also see the system hanging in a very similar manner to this one when trying to use OpenCL from the JVM (running the Neanderthal examples), but since that is a lot more high level, I do not have a useful MWE for that. When trying this, I also regularly encountered OpenCL "out of memory" errors.