icaspell opened this issue 1 year ago
Can you show the output of hipconfig and rocminfo?
hipconfig
HIP version : 4.4.21432-f9dccde4
== hipconfig
HIP_PATH : /opt/rocm-4.5.2/hip
ROCM_PATH : /opt/rocm-4.5.2
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME : rocclr
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-4.5.2/hip/include -I/opt/rocm-4.5.2/llvm/bin/../lib/clang/13.0.0 -I/opt/rocm-4.5.2/hsa/include
== hip-clang
HSA_PATH : /opt/rocm-4.5.2/hsa
HIP_CLANG_PATH : /opt/rocm-4.5.2/llvm/bin
AMD clang version 13.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-4.5.2 21432 9bbd96fd1936641cd47defd8022edafd063019d5)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-4.5.2/llvm/bin
AMD LLVM version 13.0.0git
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: znver1
Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags : -std=c++11 -isystem "/opt/rocm-4.5.2/llvm/lib/clang/13.0.0/include/.." -isystem /opt/rocm-4.5.2/hsa/include -isystem "/opt/rocm-4.5.2/hip/include" -O3
hip-clang-ldflags : --driver-mode=g++ -L"/opt/rocm-4.5.2/hip/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt
=== Environment Variables
PATH=/home/icaspell/miniconda3/bin:/home/icaspell/miniconda3/condabin:/home/icaspell/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
== Linux Kernel
Hostname : icaspell-B450M-S2H-V2
Linux icaspell-B450M-S2H-V2 5.15.0-52-generic #58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 5 PRO 4650G with Radeon Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 PRO 4650G with Radeon Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4308
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 16246008(0xf7e4f8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16246008(0xf7e4f8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16246008(0xf7e4f8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1032
Uuid: GPU-XX
Marketing Name:
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 2048(0x800) KB
L3: 32768(0x8000) KB
Chip ID: 29679(0x73ef)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2765
BDFID: 768
Internal Node ID: 1
Compute Unit: 32
SIMDs per CU: 2
Shader Engines: 4
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1032
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Thanks!
Ok, I see the problem.
The conda package is built with ROCm 5.3.0; it should work with other 5.x versions (the shared libraries are linked against /opt/rocm/lib/libamdhip64.so.5).
You have ROCm 4.5.2, which is 11 months old. Is it possible to update your system to a more recent version?
I updated to 5.2 and I still get the same error.
hipconfig
HIP version : 5.2.21152-4b155a06
== hipconfig
HIP_PATH : /opt/rocm-5.2.1
ROCM_PATH : /opt/rocm-5.2.1
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME : rocclr
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.2.1/include -I/opt/rocm-5.2.1/llvm/bin/../lib/clang/14.0.0 -I/opt/rocm-5.2.1/hsa/include
== hip-clang
HSA_PATH : /opt/rocm-5.2.1/hsa
HIP_CLANG_PATH : /opt/rocm-5.2.1/llvm/bin
AMD clang version 14.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.2.1 22204 50d6d5d5b608d2abd6af44314abc6ad20036af3b)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.2.1/llvm/bin
AMD LLVM version 14.0.0git
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: znver1
Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags : -std=c++11 -isystem "/opt/rocm-5.2.1/llvm/lib/clang/14.0.0/include/.." -isystem /opt/rocm-5.2.1/hsa/include -isystem "/opt/rocm-5.2.1/include" -O3
hip-clang-ldflags : -L"/opt/rocm-5.2.1/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt
=== Environment Variables
PATH=/home/icaspell/miniconda3/envs/openmm-hip/bin:/home/icaspell/miniconda3/condabin:/home/icaspell/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
== Linux Kernel
Hostname : icaspell-B450M-S2H-V2
Linux icaspell-B450M-S2H-V2 5.15.0-52-generic #58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 5 PRO 4650G with Radeon Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 PRO 4650G with Radeon Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4308
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 16246004(0xf7e4f4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16246004(0xf7e4f4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16246004(0xf7e4f4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1032
Uuid: GPU-XX
Marketing Name:
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 2048(0x800) KB
L3: 32768(0x8000) KB
Chip ID: 29679(0x73ef)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2765
BDFID: 768
Internal Node ID: 1
Compute Unit: 32
SIMDs per CU: 2
Shader Engines: 4
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1032
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
python -c "import openmm as mm; print('---Loaded---', *mm.pluginLoadedLibNames, '---Failed---', *mm.Platform.getPluginLoadFailures(), sep='\n')"
---Loaded---
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCPU.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMPME.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDReference.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeReference.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaReference.so
---Failed---
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaCUDA.so: libcufft.so.10: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHipCompiler.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCudaCompiler.so: libnvrtc.so.11.2: cannot open shared object file: No such file or directory
It looks better, actually. You only need to install hipfft: https://github.com/StreamHPC/openmm-hip#installing-with-conda
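The failure list already names the exact missing shared objects. As a small sketch (a hypothetical helper, not part of OpenMM), the messages returned by `mm.Platform.getPluginLoadFailures()` can be summarized like this:

```python
# Sketch: summarize which shared objects are missing from OpenMM's
# plugin-load failure messages (hypothetical helper, not part of OpenMM).
import re

# Example messages in the same format as getPluginLoadFailures() output.
failures = [
    "Error loading library /home/user/lib/plugins/libOpenMMHIP.so: "
    "librocfft.so.0: cannot open shared object file: No such file or directory",
    "Error loading library /home/user/lib/plugins/libOpenMMCUDA.so: "
    "libcuda.so.1: cannot open shared object file: No such file or directory",
]

def missing_libs(messages):
    """Extract the names of the missing .so files from failure messages."""
    libs = set()
    for msg in messages:
        m = re.search(r": (\S+\.so[.\d]*): cannot open shared object file", msg)
        if m:
            libs.add(m.group(1))
    return sorted(libs)

print(missing_libs(failures))  # ['libcuda.so.1', 'librocfft.so.0']
```

Here the HIP plugin failures all reduce to one missing library, librocfft.so.0; the CUDA ones are expected on a machine without an NVIDIA driver.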
I already did. Should I recreate the environment?
sudo apt install hipfft
[sudo] password for icaspell:
Reading package lists... Done
Building dependency tree
Reading state information... Done
hipfft is already the newest version (1.0.8.50201-79).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
No difference
python -c "import openmm as mm; print('---Loaded---', *mm.pluginLoadedLibNames, '---Failed---', *mm.Platform.getPluginLoadFailures(), sep='\n')"
---Loaded---
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCPU.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMPME.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDReference.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeReference.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaReference.so
---Failed---
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaCUDA.so: libcufft.so.10: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHipCompiler.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCudaCompiler.so: libnvrtc.so.11.2: cannot open shared object file: No such file or directory
python -m openmm.testInstallation
OpenMM Version: 8.0
Git Revision: cf824381f13a88402b0f676fb7e910c8693f9a9a
There are 3 Platforms available:
1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 OpenCL - Successfully computed forces
Median difference in forces between platforms:
Reference vs. CPU: 6.31765e-06
Reference vs. OpenCL: 6.74414e-06
CPU vs. OpenCL: 7.08274e-07
Strange. It looks like hipfft does not depend on rocfft anymore, so it's not automatically installed.
Can you check this?
sudo apt install rocfft
If it helps I'll update the instructions in README.
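After installing, one way to confirm the library is actually visible to the dynamic loader (the same lookup the plugins perform) is a small ctypes check — a sketch, not an official diagnostic:

```python
# Sketch: check whether a shared library can be resolved by the dynamic
# loader, e.g. the librocfft.so.0 that the OpenMM HIP plugins need.
import ctypes

def can_load(name):
    """Return True if the dynamic loader can resolve `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False

print("librocfft.so.0:", can_load("librocfft.so.0"))
```

If this prints False after `sudo apt install rocfft`, the library may be installed outside the default search path and need an `LD_LIBRARY_PATH` entry.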
It detects HIP now, but it outputs the following error:
OpenMM Version: 8.0
Git Revision: cf824381f13a88402b0f676fb7e910c8693f9a9a
There are 4 Platforms available:
1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)
python -c "import openmm as mm; print('---Loaded---', *mm.pluginLoadedLibNames, '---Failed---', *mm.Platform.getPluginLoadFailures(), sep='\n')"
---Loaded---
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCPU.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHIP.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMPME.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDHIP.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeHIP.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaHIP.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHipCompiler.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDReference.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeReference.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaReference.so
---Failed---
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaCUDA.so: libcufft.so.10: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCudaCompiler.so: libnvrtc.so.11.2: cannot open shared object file: No such file or directory
I managed to make it work by setting the following environment variable: export HSA_OVERRIDE_GFX_VERSION=10.3.0
Apparently the RX 6650 XT is not officially supported by ROCm; that's why it was outputting this error.
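For scripts, the same workaround can be applied programmatically — it just has to happen before OpenMM (and thus the HIP runtime) is first imported. A minimal sketch:

```python
# Sketch: apply the HSA_OVERRIDE_GFX_VERSION workaround from within a script.
# The variable must be set before OpenMM (and the HIP runtime it loads) is
# first imported; 10.3.0 makes the runtime treat gfx1032 as the supported
# gfx1030 ISA, for which code objects are actually bundled.
import os

os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

# import openmm  # only import after the override is in place
```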
OpenMM Version: 8.0
Git Revision: cf824381f13a88402b0f676fb7e910c8693f9a9a
There are 4 Platforms available:
1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 HIP - Successfully computed forces
4 OpenCL - Successfully computed forces
Median difference in forces between platforms:
Reference vs. CPU: 6.32123e-06
Reference vs. HIP: 6.75557e-06
CPU vs. HIP: 8.49803e-07
Reference vs. OpenCL: 6.74414e-06
CPU vs. OpenCL: 7.01652e-07
HIP vs. OpenCL: 5.06541e-07
All differences are within tolerance.
I will run some tests and post the results compared to my previous OpenCL results. Thanks!
It seems that gfx1032 is not officially supported by ROCm, and rocFFT does not build kernels for this architecture: https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/CMakeLists.txt#L150
Since hipFFT/rocFFT is not used as the default FFT backend, and everything else should work on gfx1032 without issues, we could make hipFFT support optional in CMake and build the conda package without it. We need to think about this approach.
By the way, could you upload a log of AMD_LOG_LEVEL=4 python -m openmm.testInstallation WITHOUT your workaround with HSA_OVERRIDE_GFX_VERSION? I want to see when exactly it crashes.
Sure
AMD_LOG_LEVEL=4 python -m openmm.testInstallation
:3:rocdevice.cpp :416 : 4635510823 us: 41087: [tid:0x7f5165fac740] Initializing HSA stack.
:3:comgrctx.cpp :33 : 4635539990 us: 41087: [tid:0x7f5165fac740] Loading COMGR library.
:3:rocdevice.cpp :207 : 4635544107 us: 41087: [tid:0x7f5165fac740] Numa selects cpu agent[0]=0x55de380e1fb0(fine=0x55de380e1750,coarse=0x55de380e6f50) for gpu agent=0x55de380e7410
:3:rocdevice.cpp :1611: 4635544517 us: 41087: [tid:0x7f5165fac740] HMM support: 1, xnack: 0, direct host access: 0
:4:rocdevice.cpp :1918: 4635544804 us: 41087: [tid:0x7f5165fac740] Allocate hsa host memory 0x7f504ac00000, size 0x101000
:4:rocdevice.cpp :1918: 4635545188 us: 41087: [tid:0x7f5165fac740] Allocate hsa host memory 0x7f504aa00000, size 0x101000
:4:runtime.cpp :83 : 4635545515 us: 41087: [tid:0x7f5165fac740] init
OpenMM Version: 8.0
Git Revision: cf824381f13a88402b0f676fb7e910c8693f9a9a
There are 4 Platforms available:
1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
:3:rocdevice.cpp :416 : 4636803898 us: 41087: [tid:0x7f5165fac740] Initializing HSA stack.
:3:comgrctx.cpp :33 : 4636803973 us: 41087: [tid:0x7f5165fac740] Loading COMGR library.
:3:rocdevice.cpp :207 : 4636804034 us: 41087: [tid:0x7f5165fac740] Numa selects cpu agent[0]=0x55de380e1fb0(fine=0x55de380e1750,coarse=0x55de380e6f50) for gpu agent=0x55de380e7410
:3:rocdevice.cpp :1611: 4636804339 us: 41087: [tid:0x7f5165fac740] HMM support: 1, xnack: 0, direct host access: 0
:4:rocdevice.cpp :1918: 4636804409 us: 41087: [tid:0x7f5165fac740] Allocate hsa host memory 0x7f504ad04000, size 0x28
:4:rocdevice.cpp :1918: 4636806345 us: 41087: [tid:0x7f5165fac740] Allocate hsa host memory 0x7f5036500000, size 0x101000
:4:rocdevice.cpp :1918: 4636806774 us: 41087: [tid:0x7f5165fac740] Allocate hsa host memory 0x7f5036300000, size 0x101000
:4:rocdevice.cpp :2054: 4636806925 us: 41087: [tid:0x7f5165fac740] Allocate hsa device memory 0x7f5034400000, size 0x100000
:4:runtime.cpp :83 : 4636806936 us: 41087: [tid:0x7f5165fac740] init
:3:hip_context.cpp :50 : 4636806943 us: 41087: [tid:0x7f5165fac740] Direct Dispatch: 1
:1:hip_code_object.cpp :460 : 4636806981 us: 41087: [tid:0x7f5165fac740] hipErrorNoBinaryForGpu: Unable to find code object for all current devices!
:1:hip_code_object.cpp :461 : 4636806988 us: 41087: [tid:0x7f5165fac740] Devices:
:1:hip_code_object.cpp :464 : 4636806994 us: 41087: [tid:0x7f5165fac740] amdgcn-amd-amdhsa--gfx1032 - [Not Found]
:1:hip_code_object.cpp :468 : 4636806999 us: 41087: [tid:0x7f5165fac740] Bundled Code Objects:
:1:hip_code_object.cpp :485 : 4636807006 us: 41087: [tid:0x7f5165fac740] host-x86_64-unknown-linux - [Unsupported]
:1:hip_code_object.cpp :483 : 4636807015 us: 41087: [tid:0x7f5165fac740] hipv4-amdgcn-amd-amdhsa--gfx1030 - [code object v4 is amdgcn-amd-amdhsa--gfx1030]
:1:hip_code_object.cpp :483 : 4636807022 us: 41087: [tid:0x7f5165fac740] hipv4-amdgcn-amd-amdhsa--gfx803 - [code object v4 is amdgcn-amd-amdhsa--gfx803]
:1:hip_code_object.cpp :483 : 4636807029 us: 41087: [tid:0x7f5165fac740] hipv4-amdgcn-amd-amdhsa--gfx900:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx900:xnack-]
:1:hip_code_object.cpp :483 : 4636807036 us: 41087: [tid:0x7f5165fac740] hipv4-amdgcn-amd-amdhsa--gfx906:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx906:xnack-]
:1:hip_code_object.cpp :483 : 4636807042 us: 41087: [tid:0x7f5165fac740] hipv4-amdgcn-amd-amdhsa--gfx908:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx908:xnack-]
:1:hip_code_object.cpp :483 : 4636807052 us: 41087: [tid:0x7f5165fac740] hipv4-amdgcn-amd-amdhsa--gfx90a:xnack+ - [code object v4 is amdgcn-amd-amdhsa--gfx90a:xnack+]
:1:hip_code_object.cpp :483 : 4636807059 us: 41087: [tid:0x7f5165fac740] hipv4-amdgcn-amd-amdhsa--gfx90a:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx90a:xnack-]
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)
It manages to pass all the tests except the stochastic one.
./test_openmm_hip.sh
#1: TestHipAmoebaExtrapolatedPolarization
Done
#2: TestHipAmoebaGeneralizedKirkwoodForce
Done
#3: TestHipAmoebaMultipoleForce
Done
#4: TestHipAmoebaTorsionTorsionForce
Done
#5: TestHipAmoebaVdwForce
Done
#6: TestHipAndersenThermostat
Done
#7: TestHipBrownianIntegrator
Done
#8: TestHipCheckpoints
Done
#9: TestHipCMAPTorsionForce
Done
#10: TestHipCMMotionRemover
Done
#11: TestHipCompiler
Done
#12: TestHipCompoundIntegrator
Done
#13: TestHipCustomAngleForce
Done
#14: TestHipCustomBondForce
Done
#15: TestHipCustomCentroidBondForce
Done
#16: TestHipCustomCompoundBondForce
Done
#17: TestHipCustomCVForce
Done
#18: TestHipCustomExternalForce
Done
#19: TestHipCustomGBForce
Done
#20: TestHipCustomHbondForce
Done
#21: TestHipCustomIntegrator
exception: Assertion failure at TestCustomIntegrator.h:1162. Expected 300, found 303.017 (This test is stochastic and may occasionally fail)
Done
#22: TestHipCustomManyParticleForce
Done
#23: TestHipCustomNonbondedForce
Done
#24: TestHipCustomTorsionForce
Done
#25: TestHipDispersionPME
Done
#26: TestHipDrudeForce
Done
#27: TestHipDrudeLangevinIntegrator
Done
#28: TestHipDrudeNoseHoover
Done
#29: TestHipDrudeSCFIntegrator
Done
#30: TestHipEwald
Done
#31: TestHipFFTImplFFT3D
Done
#32: TestHipFFTImplHipFFT
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 50 ysize: 50 zsize: 50
realToComplex: 1 xsize: 60 ysize: 60 zsize: 60
realToComplex: 0 xsize: 64 ysize: 64 zsize: 64
realToComplex: 1 xsize: 100 ysize: 100 zsize: 100
realToComplex: 1 xsize: 243 ysize: 120 zsize: 120
realToComplex: 1 xsize: 216 ysize: 216 zsize: 216
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
exception: Error executing hipFFT: 6
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 50 ysize: 50 zsize: 50
realToComplex: 1 xsize: 60 ysize: 60 zsize: 60
realToComplex: 0 xsize: 64 ysize: 64 zsize: 64
realToComplex: 1 xsize: 100 ysize: 100 zsize: 100
realToComplex: 1 xsize: 243 ysize: 120 zsize: 120
realToComplex: 1 xsize: 216 ysize: 216 zsize: 216
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
exception: Error executing hipFFT: 6
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 50 ysize: 50 zsize: 50
realToComplex: 1 xsize: 60 ysize: 60 zsize: 60
realToComplex: 0 xsize: 64 ysize: 64 zsize: 64
realToComplex: 1 xsize: 100 ysize: 100 zsize: 100
realToComplex: 1 xsize: 243 ysize: 120 zsize: 120
realToComplex: 1 xsize: 216 ysize: 216 zsize: 216
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
exception: Error executing hipFFT: 6
#33: TestHipFFTImplVkFFT
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 50 ysize: 50 zsize: 50
realToComplex: 1 xsize: 60 ysize: 60 zsize: 60
realToComplex: 0 xsize: 64 ysize: 64 zsize: 64
realToComplex: 1 xsize: 100 ysize: 100 zsize: 100
realToComplex: 1 xsize: 243 ysize: 120 zsize: 120
realToComplex: 1 xsize: 216 ysize: 216 zsize: 216
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
Done
#34: TestHipGayBerneForce
Done
#35: TestHipGBSAOBCForce
Done
#36: TestHipHarmonicAngleForce
Done
#37: TestHipHarmonicBondForce
Done
#38: TestHipHippoNonbondedForce
Done
#39: TestHipLangevinIntegrator
Done
#40: TestHipLangevinMiddleIntegrator
Done
#41: TestHipLocalEnergyMinimizer
Done
#42: TestHipMonteCarloAnisotropicBarostat
Done
#43: TestHipMonteCarloBarostat
Done
#44: TestHipMonteCarloFlexibleBarostat
Done
#45: TestHipMultipleForces
Done
#46: TestHipNonbondedForce
Done
#47: TestHipNoseHooverIntegrator
Done
#48: TestHipPeriodicTorsionForce
Done
#49: TestHipRandom
Done
#50: TestHipRBTorsionForce
Done
#51: TestHipRMSDForce
Done
#52: TestHipRpmd
Done
#53: TestHipSettle
Done
#54: TestHipSort
Done
#55: TestHipVariableLangevinIntegrator
Done
#56: TestHipVariableVerletIntegrator
Done
#57: TestHipVerletIntegrator
Done
#58: TestHipVirtualSites
Done
#59: TestHipWcaDispersionForce
Done
------------
Failed tests
------------
#32 TestHipFFTImplHipFFT
Here are my previous OpenCL benchmarks:
Platform: OpenCL
Precision: single
Test: gbsa
Ensemble: NVT
Step Size: 4 fs
Integrated 68989 steps in 52.7213 seconds
452.238 ns/day
Test: rf
Ensemble: NVT
Step Size: 4 fs
Integrated 22505 steps in 56.7139 seconds
137.14 ns/day
Test: pme (cutoff=0.9)
Ensemble: NVT
Step Size: 4 fs
Integrated 22461 steps in 57.937 seconds
133.982 ns/day
Test: apoa1rf
Ensemble: NVT
Step Size: 4 fs
Integrated 8627 steps in 61.5705 seconds
48.424 ns/day
Test: apoa1pme
Ensemble: NVT
Step Size: 4 fs
Integrated 8061 steps in 61.3665 seconds
45.3975 ns/day
Test: apoa1ljpme
Ensemble: NVT
Step Size: 4 fs
Integrated 8470 steps in 62.3557 seconds
46.9441 ns/day
Test: amoebagk (epsilon=1e-05)
Ensemble: NVT
Step Size: 2 fs
Integrated 373 steps in 54.3781 seconds
1.1853 ns/day
Test: amoebapme (epsilon=1e-05)
Ensemble: NVT
Step Size: 2 fs
Integrated 1355 steps in 53.4768 seconds
4.37842 ns/day
HIP benchmarks
python benchmark.py --platform HIP
Platform: HIP
Test: gbsa
Ensemble: NVT
Step Size: 4 fs
Integrated 202477 steps in 59.2487 seconds
1181.06 ns/day
Test: rf
Ensemble: NVT
Step Size: 4 fs
Integrated 186565 steps in 59.9133 seconds
1076.17 ns/day
Test: pme (cutoff=0.9)
Ensemble: NVT
Step Size: 4 fs
Integrated 137783 steps in 60.3333 seconds
789.246 ns/day
Test: apoa1rf
Ensemble: NVT
Step Size: 4 fs
Integrated 53920 steps in 61.4199 seconds
303.399 ns/day
Test: apoa1pme
Ensemble: NVT
Step Size: 4 fs
Integrated 36012 steps in 60.8835 seconds
204.419 ns/day
Test: apoa1ljpme
Ensemble: NVT
Step Size: 4 fs
Integrated 29134 steps in 60.7964 seconds
165.614 ns/day
Test: amoebagk (epsilon=1e-05)
Ensemble: NVT
Step Size: 2 fs
Integrated 7800 steps in 58.3635 seconds
23.0939 ns/day
Test: amoebapme (epsilon=1e-05)
Ensemble: NVT
Step Size: 2 fs
Integrated 3067 steps in 58.4128 seconds
9.07297 ns/day
The jump in performance is huge. Are these numbers for real, or is there something wrong? Anyway, thank you so much for this project; this is literally the only software that can run molecular dynamics on AMD Navi 2 GPUs right now.
Thanks!
The jump in performance is huge. Are these numbers for real, or is there something wrong?
The results look consistent with the numbers from #1 and what we saw on our devices. But please report any problems with precision, stability, and performance; the tests can't check all possible cases, so "real-world" experience is highly appreciated.
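As a quick arithmetic check, the reported rates do follow directly from the raw step counts and wall-clock times (numbers taken from the HIP gbsa run above):

```python
# Sanity check of the benchmark arithmetic: ns/day follows from the step
# count, the step size, and the wall-clock time. Numbers from the HIP gbsa run.
steps = 202477
step_size_fs = 4.0
wall_seconds = 59.2487

simulated_ns = steps * step_size_fs * 1e-6   # 1 fs = 1e-6 ns
ns_per_day = simulated_ns * 86400.0 / wall_seconds
print(round(ns_per_day, 2))  # 1181.06, matching the reported rate
```

This only confirms the benchmark's bookkeeping is self-consistent, not the simulation correctness — that is what the force-comparison tolerances in testInstallation cover.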
Could you also run amber20-dhfr, amber20-cellulose, and amber20-stmv? (use --test amber20-dhfr)
python benchmark.py --platform OpenCL --test amber20-cellulose
Platform: OpenCL
Precision: single
Test: amber20-cellulose
Ensemble: NVT
Step Size: 4 fs
Integrated 1621 steps in 60.6103 seconds
9.24294 ns/day
python benchmark.py --platform=HIP --test amber20-cellulose --ensemble=NVT --precision=single
Platform: HIP
Test: amber20-cellulose
Ensemble: NVT
Step Size: 4 fs
Integrated 7930 steps in 60.8478 seconds
45.0404 ns/day
python benchmark.py --platform=OpenCL --test amber20-stmv --ensemble=NVT --precision=single
Platform: OpenCL
Precision: single
Test: amber20-stmv
Ensemble: NVT
Step Size: 4 fs
Integrated 464 steps in 58.7317 seconds
2.73035 ns/day
python benchmark.py --platform=HIP --test amber20-stmv --ensemble=NVT --precision=single
Platform: HIP
Test: amber20-stmv
Ensemble: NVT
Step Size: 4 fs
Integrated 2385 steps in 59.2904 seconds
13.902 ns/day
I had to install scipy to get dhfr running.
python benchmark.py --platform=OpenCL --test amber20-dhfr --ensemble=NVT --precision=single
Platform: OpenCL
Precision: single
Test: amber20-dhfr
Ensemble: NVT
Step Size: 4 fs
Integrated 24362 steps in 58.1829 seconds
144.708 ns/day
python benchmark.py --platform=HIP --test amber20-dhfr --ensemble=NVT --precision=single
Platform: HIP
Test: amber20-dhfr
Ensemble: NVT
Step Size: 4 fs
Integrated 139960 steps in 59.9526 seconds
806.807 ns/day
I also ran a short molecular dynamics simulation for a protein of about 55k atoms in a ligand system prepared with CHARMM-GUI default OpenMM parameters. I used the CHARMM36m force field for the protein and OPLS for the ligand. I edited the CHARMM-GUI default script to run HIP instead of OpenCL, and I am now getting about 48 ns/day in the production run, compared to my previous 20 ns/day on OpenCL.
I tested the following Linux kernels: 5.15.0-52-generic and 6.0.3-060003-generic on Ubuntu 20.04.5. My GPU is an RX 6650 XT.
I ran this, and this is the output. Also, here is some data that might help.
Originally posted by @icaspell in https://github.com/StreamHPC/openmm-hip/issues/4#issuecomment-1288566897