StreamHPC / openmm-hip-old

I managed to install without cuda however it doesn't load hip and only loads opencl #5

Open icaspell opened 1 year ago

icaspell commented 1 year ago
Thank you very much. I managed to install without CUDA; however, it doesn't load HIP and only loads OpenCL.

I tested the following Linux kernels on Ubuntu 20.04.5: 5.15.0-52-generic and 6.0.3-060003-generic. My GPU is an RX 6650 XT.

I ran this, and this is the output:

---Loaded---
 /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCPU.so
 /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMPME.so
 /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMOpenCL.so
 /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDOpenCL.so
 /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeOpenCL.so
 /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaOpenCL.so
 /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDReference.so
 /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeReference.so
 /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaReference.so
 ---Failed---
 Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHIP.so: libamdhip64.so.5: cannot open shared object file: No such file or directory
 Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
 Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDHIP.so: libamdhip64.so.5: cannot open shared object file: No such file or directory
 Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
 Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeHIP.so: libamdhip64.so.5: cannot open shared object file: No such file or directory
 Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaHIP.so: libamdhip64.so.5: cannot open shared object file: No such file or directory
 Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
 Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaCUDA.so: libcufft.so.10: cannot open shared object file: No such file or directory
 Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHipCompiler.so: libamdhip64.so.5: cannot open shared object file: No such file or directory
 Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCudaCompiler.so: libnvrtc.so.11.2: cannot open shared object file: No such file or directory
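
A log like this is easier to read when the failures are grouped by the shared library that could not be opened. The following is a minimal sketch in plain Python, not part of OpenMM: the function name `group_missing_libraries` and the regular expression are assumptions made for illustration, and the sample lines are copied from the output above.

```python
import re
from collections import defaultdict

# Matches the loader message format seen in the "---Failed---" list.
FAILURE_RE = re.compile(
    r"Error loading library (?P<plugin>\S+): "
    r"(?P<missing>\S+): cannot open shared object file"
)


def group_missing_libraries(failures):
    """Map each missing shared library to the plugin files that need it."""
    grouped = defaultdict(list)
    for line in failures:
        match = FAILURE_RE.search(line)
        if match:
            # Keep only the file name of the plugin, not its full path.
            plugin_name = match.group("plugin").rsplit("/", 1)[-1]
            grouped[match.group("missing")].append(plugin_name)
    return dict(grouped)


# Three lines copied from the failure list.
sample = [
    "Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHIP.so: libamdhip64.so.5: cannot open shared object file: No such file or directory",
    "Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory",
    "Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDHIP.so: libamdhip64.so.5: cannot open shared object file: No such file or directory",
]

for missing, plugins in group_missing_libraries(sample).items():
    print(f"{missing}: needed by {', '.join(plugins)}")
```

Grouped this way, every HIP plugin in the list fails on the same file, libamdhip64.so.5, while the CUDA plugins fail on CUDA libraries, which is presumably expected on an install without CUDA.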

Also, here is some data that might help:

 clinfo
Number of platforms                               2
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 22.2.2 - kisak-mesa PPA
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (3452.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     NAVI23 (navi23, LLVM 14.0.6, DRM 3.42, 5.15.0-52-generic)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 22.2.2 - kisak-mesa PPA
  Driver Version                                  22.2.2 - kisak-mesa PPA
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               32
  Max clock frequency                             2765MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
=== CL_PROGRAM_BUILD_LOG ===
fatal error: cannot open file '/usr/lib/clc/gfx1032-amdgcn-mesa-mesa3d.bc': No such file or directory
  Preferred work group size multiple              <getWGsizes:1200: create kernel : error -46>
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              8589934592 (8GiB)
  Error Correction support                        No
  Max memory allocation                           2147483648 (2GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max number of constant args                     16
  Max constant buffer size                        67108864 (64MiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_extended_versioning

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     gfx1032
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 2.0 
  Driver Version                                  3452.0 (HSA1.1,LC)
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Board Name (AMD)                         (n/a)
  Device Topology (AMD)                           PCI-E, 03:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               16
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                32
  SIMD instruction width (AMD)                    1
  Max clock frequency                             2765MHz
  Graphics IP (AMD)                               10.3
  Device Partition                                (core)
    Max number of sub-devices                     16
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              32
  Wavefront width (AMD)                           32
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              8573157376 (7.984GiB)
  Global free memory (AMD)                        8372224 (7.984GiB)
  Global memory channels (AMD)                    4
  Global memory banks per channel (AMD)           4
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           7287183768 (6.787GiB)
  Unified memory for Host and Device              No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    7287183768 (6.787GiB)
  Preferred total size of global vars             8573157376 (7.984GiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             29679
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 8192 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             16384x16384x8192 pixels
    Max number of read image args                 128
    Max number of write image args                8
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    16
  Max pipe packet size                            2992216472 (2.787GiB)
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        7287183768 (6.787GiB)
  Preferred constant buffer size (AMD)            16384 (16KiB)
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        No
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                262144 (256KiB)
    Max size                                      8388608 (8MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Number of P2P devices (AMD)                     0
  P2P devices (AMD)                               <printDeviceInfo:147: get number of CL_DEVICE_P2P_DEVICES_AMD : error -30>
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        0ns (Thu Jan  1 02:00:00 1970)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  No
    Number of async queues (AMD)                  8
    Max real-time compute queues (AMD)            8
    Max real-time compute units (AMD)             16
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Clover
    Device Name                                   NAVI23 (navi23, LLVM 14.0.6, DRM 3.42, 5.15.0-52-generic)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   NAVI23 (navi23, LLVM 14.0.6, DRM 3.42, 5.15.0-52-generic)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   NAVI23 (navi23, LLVM 14.0.6, DRM 3.42, 5.15.0-52-generic)
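
The clinfo output above lists two OpenCL platforms, and the "NULL platform behavior" section shows Clover being picked by default, even though Clover reports a kernel build error for gfx1032. The sketch below reads the platform order from clinfo text and builds a property dictionary that selects a platform by index. The helper names `platform_names` and `opencl_properties` are hypothetical. It also assumes that clinfo lists platforms in the same order OpenMM sees them, and that the property is passed to OpenMM's OpenCL platform as `OpenCLPlatformIndex`.

```python
def platform_names(clinfo_text):
    """Return the platform names from the summary at the top of clinfo output."""
    count = None
    names = []
    for line in clinfo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Number of platforms"):
            count = int(stripped.split()[-1])
        elif stripped.startswith("Platform Name") and count is not None:
            names.append(stripped[len("Platform Name"):].strip())
            # Platform names repeat in the device sections, so stop
            # once the announced number of platforms has been read.
            if len(names) == count:
                break
    return names


def opencl_properties(clinfo_text, wanted_platform):
    """Build a property dict selecting `wanted_platform` by its zero-based index."""
    index = platform_names(clinfo_text).index(wanted_platform)
    return {"OpenCLPlatformIndex": str(index)}


# Abbreviated excerpt of the clinfo output.
sample = """\
Number of platforms                               2
  Platform Name                                   Clover
  Platform Vendor                                 Mesa

  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.

  Platform Name                                   Clover
Number of devices                                 1
"""

print(opencl_properties(sample, "AMD Accelerated Parallel Processing"))
```

This only concerns the OpenCL fallback; it does not address why the HIP plugins fail to load.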

Originally posted by @icaspell in https://github.com/StreamHPC/openmm-hip/issues/4#issuecomment-1288566897

ex-rzr commented 1 year ago

Can you show the output of hipconfig and rocminfo?

icaspell commented 1 year ago

hipconfig
HIP version  : 4.4.21432-f9dccde4

== hipconfig
HIP_PATH     : /opt/rocm-4.5.2/hip
ROCM_PATH    : /opt/rocm-4.5.2
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-4.5.2/hip/include -I/opt/rocm-4.5.2/llvm/bin/../lib/clang/13.0.0 -I/opt/rocm-4.5.2/hsa/include

== hip-clang
HSA_PATH         : /opt/rocm-4.5.2/hsa
HIP_CLANG_PATH   : /opt/rocm-4.5.2/llvm/bin
AMD clang version 13.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-4.5.2 21432 9bbd96fd1936641cd47defd8022edafd063019d5)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-4.5.2/llvm/bin
AMD LLVM version 13.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver1

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :  -std=c++11 -isystem "/opt/rocm-4.5.2/llvm/lib/clang/13.0.0/include/.." -isystem /opt/rocm-4.5.2/hsa/include -isystem "/opt/rocm-4.5.2/hip/include" -O3
hip-clang-ldflags  : --driver-mode=g++ -L"/opt/rocm-4.5.2/hip/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt

=== Environment Variables
PATH=/home/icaspell/miniconda3/bin:/home/icaspell/miniconda3/condabin:/home/icaspell/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

== Linux Kernel
Hostname     : icaspell-B450M-S2H-V2
Linux icaspell-B450M-S2H-V2 5.15.0-52-generic #58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.5 LTS
Release:    20.04
Codename:   focal

rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 5 PRO 4650G with Radeon Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 5 PRO 4650G with Radeon Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4308                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            12                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    16246008(0xf7e4f8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16246008(0xf7e4f8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16246008(0xf7e4f8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1032                            
  Uuid:                    GPU-XX                             
  Marketing Name:                                             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      2048(0x800) KB                     
    L3:                      32768(0x8000) KB                   
  Chip ID:                 29679(0x73ef)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2765                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            32                                 
  SIMDs per CU:            2                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1032         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

ex-rzr commented 1 year ago

Thanks! OK, I see the problem. The conda package is built with ROCm 5.3.0. It should work with other 5.x versions (as the shared libs are linked to /opt/rocm/lib/libamdhip64.so.5), but you have ROCm 4.5.2, which is 11 months old. Is it possible to update your system to a more recent version?
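
One way to check this on a given machine is to ask the dynamic loader directly whether it can open libamdhip64.so.5, and to list which HIP runtime files exist under the ROCm install prefixes. This is only a sketch using the Python standard library: the helper names `can_load` and `find_candidates` are made up for illustration, and the glob pattern `/opt/rocm*/lib/libamdhip64.so*` is an assumption based on the paths mentioned in this thread (older releases may keep the library elsewhere, for example under a `hip/lib` subdirectory).

```python
import ctypes
import glob


def can_load(soname):
    """Return True if the dynamic loader can resolve and open `soname`."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def find_candidates(pattern="/opt/rocm*/lib/libamdhip64.so*"):
    """List HIP runtime files under ROCm prefixes (the pattern is an assumption)."""
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    soname = "libamdhip64.so.5"
    print(f"{soname} loadable: {can_load(soname)}")
    for path in find_candidates():
        print("found:", path)
```

If the file shows up in the listing but `can_load` still reports False, the library is installed but not on the loader's search path, which is a different problem from an outdated ROCm version.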

icaspell commented 1 year ago

I updated to 5.2 and I still get the same error.

hipconfig
HIP version  : 5.2.21152-4b155a06

== hipconfig
HIP_PATH     : /opt/rocm-5.2.1
ROCM_PATH    : /opt/rocm-5.2.1
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.2.1/include -I/opt/rocm-5.2.1/llvm/bin/../lib/clang/14.0.0 -I/opt/rocm-5.2.1/hsa/include

== hip-clang
HSA_PATH         : /opt/rocm-5.2.1/hsa
HIP_CLANG_PATH   : /opt/rocm-5.2.1/llvm/bin
AMD clang version 14.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.2.1 22204 50d6d5d5b608d2abd6af44314abc6ad20036af3b)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.2.1/llvm/bin
AMD LLVM version 14.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver1

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :  -std=c++11 -isystem "/opt/rocm-5.2.1/llvm/lib/clang/14.0.0/include/.." -isystem /opt/rocm-5.2.1/hsa/include -isystem "/opt/rocm-5.2.1/include" -O3
hip-clang-ldflags  :  -L"/opt/rocm-5.2.1/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt

=== Environment Variables
PATH=/home/icaspell/miniconda3/envs/openmm-hip/bin:/home/icaspell/miniconda3/condabin:/home/icaspell/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

== Linux Kernel
Hostname     : icaspell-B450M-S2H-V2
Linux icaspell-B450M-S2H-V2 5.15.0-52-generic #58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.5 LTS
Release:    20.04
Codename:   focal

rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 5 PRO 4650G with Radeon Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 5 PRO 4650G with Radeon Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4308                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            12                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    16246004(0xf7e4f4) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16246004(0xf7e4f4) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16246004(0xf7e4f4) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1032                            
  Uuid:                    GPU-XX                             
  Marketing Name:                                             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      2048(0x800) KB                     
    L3:                      32768(0x8000) KB                   
  Chip ID:                 29679(0x73ef)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2765                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            32                                 
  SIMDs per CU:            2                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1032         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             

python -c "import openmm as mm; print('---Loaded---', *mm.pluginLoadedLibNames, '---Failed---', *mm.Platform.getPluginLoadFailures(), sep='\n')"
---Loaded---
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCPU.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMPME.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDReference.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeReference.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaReference.so
---Failed---
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaCUDA.so: libcufft.so.10: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHipCompiler.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCudaCompiler.so: libnvrtc.so.11.2: cannot open shared object file: No such file or directory
ex-rzr commented 1 year ago

It looks better, actually. You only need to install hipfft: https://github.com/StreamHPC/openmm-hip#installing-with-conda

icaspell commented 1 year ago

I already did. Should I recreate the environment?

sudo apt install hipfft
[sudo] password for icaspell: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
hipfft is already the newest version (1.0.8.50201-79).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

No difference

python -c "import openmm as mm; print('---Loaded---', *mm.pluginLoadedLibNames, '---Failed---', *mm.Platform.getPluginLoadFailures(), sep='\n')"
---Loaded---
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCPU.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMPME.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDReference.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeReference.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaReference.so
---Failed---
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaHIP.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaCUDA.so: libcufft.so.10: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHipCompiler.so: librocfft.so.0: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCudaCompiler.so: libnvrtc.so.11.2: cannot open shared object file: No such file or directory
python -m openmm.testInstallation

OpenMM Version: 8.0
Git Revision: cf824381f13a88402b0f676fb7e910c8693f9a9a

There are 3 Platforms available:

1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 OpenCL - Successfully computed forces

Median difference in forces between platforms:

Reference vs. CPU: 6.31765e-06
Reference vs. OpenCL: 6.74414e-06
CPU vs. OpenCL: 7.08274e-07
ex-rzr commented 1 year ago

Strange. It looks like hipfft does not depend on rocfft anymore so it's not automatically installed.

Can you check this?

sudo apt install rocfft

If it helps I'll update the instructions in README.
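When many plugins fail at once, grouping the failure messages by the missing shared library makes it easier to see which system package is absent. The following is a minimal sketch, not part of OpenMM: the helper name `missing_libraries` and the regular expression are assumptions, based only on the message format shown in the output above.

```python
import re
from collections import defaultdict

# Message format assumed from the output in this thread:
# "Error loading library <plugin path>: <missing lib>: cannot open shared object file: ..."
FAILURE_RE = re.compile(
    r"Error loading library (?P<plugin>\S+): (?P<missing>\S+): cannot open shared object file"
)


def missing_libraries(failures):
    """Group plugin file names by the shared library that could not be opened.

    `failures` is any iterable of message strings, for example the result of
    mm.Platform.getPluginLoadFailures() used earlier in this thread.
    """
    grouped = defaultdict(list)
    for message in failures:
        match = FAILURE_RE.search(message)
        if match:
            plugin_name = match.group("plugin").rsplit("/", 1)[-1]
            grouped[match.group("missing")].append(plugin_name)
    return dict(grouped)


if __name__ == "__main__":
    # Two messages copied from the output above (paths shortened for illustration).
    sample = [
        "Error loading library /env/lib/plugins/libOpenMMHIP.so: librocfft.so.0: "
        "cannot open shared object file: No such file or directory",
        "Error loading library /env/lib/plugins/libOpenMMCUDA.so: libcuda.so.1: "
        "cannot open shared object file: No such file or directory",
    ]
    for library, plugins in missing_libraries(sample).items():
        print(library, "<-", ", ".join(plugins))
```

On an AMD-only machine the `libcuda`, `libcufft` and `libnvrtc` entries are expected and can be ignored; only the libraries required by the HIP plugins matter here.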

icaspell commented 1 year ago

It detects HIP now, but it outputs the following error:

8.0
Git Revision: cf824381f13a88402b0f676fb7e910c8693f9a9a

There are 4 Platforms available:

1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)
python -c "import openmm as mm; print('---Loaded---', *mm.pluginLoadedLibNames, '---Failed---', *mm.Platform.getPluginLoadFailures(), sep='\n')"
---Loaded---
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCPU.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHIP.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMPME.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDHIP.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeHIP.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaHIP.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMHipCompiler.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaOpenCL.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDReference.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeReference.so
/home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaReference.so
---Failed---
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMRPMDCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMDrudeCUDA.so: libcuda.so.1: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMAmoebaCUDA.so: libcufft.so.10: cannot open shared object file: No such file or directory
Error loading library /home/icaspell/miniconda3/envs/openmm-hip/lib/plugins/libOpenMMCudaCompiler.so: libnvrtc.so.11.2: cannot open shared object file: No such file or directory
icaspell commented 1 year ago

I managed to make it work by setting the following environment variable: export HSA_OVERRIDE_GFX_VERSION=10.3.0. Apparently the RX 6650 XT is not officially supported by ROCm, which is why it was outputting this error.

8.0
Git Revision: cf824381f13a88402b0f676fb7e910c8693f9a9a

There are 4 Platforms available:

1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 HIP - Successfully computed forces
4 OpenCL - Successfully computed forces

Median difference in forces between platforms:

Reference vs. CPU: 6.32123e-06
Reference vs. HIP: 6.75557e-06
CPU vs. HIP: 8.49803e-07
Reference vs. OpenCL: 6.74414e-06
CPU vs. OpenCL: 7.01652e-07
HIP vs. OpenCL: 5.06541e-07

All differences are within tolerance.

I will run some tests and post the results alongside my previous OpenCL results. Thanks!
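For scripted runs, the same workaround can be applied from Python instead of the shell. This is a minimal sketch under stated assumptions: the helper names `env_with_gfx_override` and `run_test_installation` are hypothetical, and it assumes the variable has to be in the environment before the ROCm runtime initializes, which is why it is passed to a child process rather than set after importing `openmm`. The value `10.3.0` is the one reported above for gfx1032; it is an unofficial override and may not suit other GPUs.

```python
import os
import subprocess
import sys


def env_with_gfx_override(version="10.3.0", base=None):
    """Return a copy of the environment with HSA_OVERRIDE_GFX_VERSION set.

    An existing value is left untouched so a user-provided setting wins.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault("HSA_OVERRIDE_GFX_VERSION", version)
    return env


def run_test_installation(env):
    """Run `python -m openmm.testInstallation` in a child process with `env`."""
    return subprocess.run(
        [sys.executable, "-m", "openmm.testInstallation"],
        env=env,
        check=False,  # a crash in the HIP runtime should not raise here
    )


if __name__ == "__main__":
    env = env_with_gfx_override()
    print("HSA_OVERRIDE_GFX_VERSION =", env["HSA_OVERRIDE_GFX_VERSION"])
    # Uncomment to launch the installation test (requires OpenMM):
    # run_test_installation(env)
```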

ex-rzr commented 1 year ago

It seems that gfx1032 is not officially supported by ROCm. And rocFFT does not build kernels for this architecture: https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/CMakeLists.txt#L150

Since hipFFT/rocFFT is not used by default as the FFT backend, and everything else should work on gfx1032 without issues, we can make hipFFT support optional in CMake and build the conda package without it. We need to think about this approach.

ex-rzr commented 1 year ago

By the way, could you upload a log of AMD_LOG_LEVEL=4 python -m openmm.testInstallation WITHOUT your workaround with HSA_OVERRIDE_GFX_VERSION? I want to see when exactly it crashes.

icaspell commented 1 year ago

Sure

AMD_LOG_LEVEL=4 python -m openmm.testInstallation
:3:rocdevice.cpp            :416 : 4635510823 us: 41087: [tid:0x7f5165fac740] Initializing HSA stack.
:3:comgrctx.cpp             :33  : 4635539990 us: 41087: [tid:0x7f5165fac740] Loading COMGR library.
:3:rocdevice.cpp            :207 : 4635544107 us: 41087: [tid:0x7f5165fac740] Numa selects cpu agent[0]=0x55de380e1fb0(fine=0x55de380e1750,coarse=0x55de380e6f50) for gpu agent=0x55de380e7410
:3:rocdevice.cpp            :1611: 4635544517 us: 41087: [tid:0x7f5165fac740] HMM support: 1, xnack: 0, direct host access: 0

:4:rocdevice.cpp            :1918: 4635544804 us: 41087: [tid:0x7f5165fac740] Allocate hsa host memory 0x7f504ac00000, size 0x101000
:4:rocdevice.cpp            :1918: 4635545188 us: 41087: [tid:0x7f5165fac740] Allocate hsa host memory 0x7f504aa00000, size 0x101000
:4:runtime.cpp              :83  : 4635545515 us: 41087: [tid:0x7f5165fac740] init

OpenMM Version: 8.0
Git Revision: cf824381f13a88402b0f676fb7e910c8693f9a9a

There are 4 Platforms available:

1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
:3:rocdevice.cpp            :416 : 4636803898 us: 41087: [tid:0x7f5165fac740] Initializing HSA stack.
:3:comgrctx.cpp             :33  : 4636803973 us: 41087: [tid:0x7f5165fac740] Loading COMGR library.
:3:rocdevice.cpp            :207 : 4636804034 us: 41087: [tid:0x7f5165fac740] Numa selects cpu agent[0]=0x55de380e1fb0(fine=0x55de380e1750,coarse=0x55de380e6f50) for gpu agent=0x55de380e7410
:3:rocdevice.cpp            :1611: 4636804339 us: 41087: [tid:0x7f5165fac740] HMM support: 1, xnack: 0, direct host access: 0

:4:rocdevice.cpp            :1918: 4636804409 us: 41087: [tid:0x7f5165fac740] Allocate hsa host memory 0x7f504ad04000, size 0x28
:4:rocdevice.cpp            :1918: 4636806345 us: 41087: [tid:0x7f5165fac740] Allocate hsa host memory 0x7f5036500000, size 0x101000
:4:rocdevice.cpp            :1918: 4636806774 us: 41087: [tid:0x7f5165fac740] Allocate hsa host memory 0x7f5036300000, size 0x101000
:4:rocdevice.cpp            :2054: 4636806925 us: 41087: [tid:0x7f5165fac740] Allocate hsa device memory 0x7f5034400000, size 0x100000
:4:runtime.cpp              :83  : 4636806936 us: 41087: [tid:0x7f5165fac740] init
:3:hip_context.cpp          :50  : 4636806943 us: 41087: [tid:0x7f5165fac740] Direct Dispatch: 1
:1:hip_code_object.cpp      :460 : 4636806981 us: 41087: [tid:0x7f5165fac740] hipErrorNoBinaryForGpu: Unable to find code object for all current devices!
:1:hip_code_object.cpp      :461 : 4636806988 us: 41087: [tid:0x7f5165fac740]   Devices:
:1:hip_code_object.cpp      :464 : 4636806994 us: 41087: [tid:0x7f5165fac740]     amdgcn-amd-amdhsa--gfx1032 - [Not Found]
:1:hip_code_object.cpp      :468 : 4636806999 us: 41087: [tid:0x7f5165fac740]   Bundled Code Objects:
:1:hip_code_object.cpp      :485 : 4636807006 us: 41087: [tid:0x7f5165fac740]     host-x86_64-unknown-linux - [Unsupported]
:1:hip_code_object.cpp      :483 : 4636807015 us: 41087: [tid:0x7f5165fac740]     hipv4-amdgcn-amd-amdhsa--gfx1030 - [code object v4 is amdgcn-amd-amdhsa--gfx1030]
:1:hip_code_object.cpp      :483 : 4636807022 us: 41087: [tid:0x7f5165fac740]     hipv4-amdgcn-amd-amdhsa--gfx803 - [code object v4 is amdgcn-amd-amdhsa--gfx803]
:1:hip_code_object.cpp      :483 : 4636807029 us: 41087: [tid:0x7f5165fac740]     hipv4-amdgcn-amd-amdhsa--gfx900:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx900:xnack-]
:1:hip_code_object.cpp      :483 : 4636807036 us: 41087: [tid:0x7f5165fac740]     hipv4-amdgcn-amd-amdhsa--gfx906:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx906:xnack-]
:1:hip_code_object.cpp      :483 : 4636807042 us: 41087: [tid:0x7f5165fac740]     hipv4-amdgcn-amd-amdhsa--gfx908:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx908:xnack-]
:1:hip_code_object.cpp      :483 : 4636807052 us: 41087: [tid:0x7f5165fac740]     hipv4-amdgcn-amd-amdhsa--gfx90a:xnack+ - [code object v4 is amdgcn-amd-amdhsa--gfx90a:xnack+]
:1:hip_code_object.cpp      :483 : 4636807059 us: 41087: [tid:0x7f5165fac740]     hipv4-amdgcn-amd-amdhsa--gfx90a:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx90a:xnack-]
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)
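The relevant part of this log is the mismatch between the device ISA and the targets of the bundled code objects. A small sketch for extracting both lists from such a log follows; the function name `summarize_code_objects` and the regular expressions are assumptions derived only from the line format shown above, not from any ROCm tool.

```python
import re

# Line formats assumed from the AMD_LOG_LEVEL=4 output above.
DEVICE_RE = re.compile(r"amdgcn-amd-amdhsa--(gfx[0-9a-f]+)\S* - \[Not Found\]")
BUNDLE_RE = re.compile(r"hipv4-amdgcn-amd-amdhsa--(gfx[0-9a-f]+)")


def summarize_code_objects(log_text):
    """Return (devices without a binary, bundled GPU targets) as sorted lists."""
    devices, bundled = set(), set()
    for line in log_text.splitlines():
        device = DEVICE_RE.search(line)
        if device:
            devices.add(device.group(1))
        bundle = BUNDLE_RE.search(line)
        if bundle:
            bundled.add(bundle.group(1))
    return sorted(devices), sorted(bundled)


if __name__ == "__main__":
    # Three lines copied from the log above, with the prefixes trimmed.
    sample = "\n".join([
        "    amdgcn-amd-amdhsa--gfx1032 - [Not Found]",
        "    hipv4-amdgcn-amd-amdhsa--gfx1030 - [code object v4 is amdgcn-amd-amdhsa--gfx1030]",
        "    hipv4-amdgcn-amd-amdhsa--gfx90a:xnack+ - [code object v4 is amdgcn-amd-amdhsa--gfx90a:xnack+]",
    ])
    devices, bundled = summarize_code_objects(sample)
    print("devices without a binary:", devices)
    print("bundled targets:", bundled)
```

In the log above, the device reports gfx1032 while the loaded binary bundles gfx1030 but not gfx1032, which would explain why overriding the reported version to 10.3.0 lets the runtime pick the gfx1030 code object.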
icaspell commented 1 year ago

It passes all the tests except the stochastic one (#21 TestHipCustomIntegrator); the summary at the end also lists #32 TestHipFFTImplHipFFT as failed.

./test_openmm_hip.sh

#1: TestHipAmoebaExtrapolatedPolarization
Done

#2: TestHipAmoebaGeneralizedKirkwoodForce
Done

#3: TestHipAmoebaMultipoleForce
Done

#4: TestHipAmoebaTorsionTorsionForce
Done

#5: TestHipAmoebaVdwForce
Done

#6: TestHipAndersenThermostat
Done

#7: TestHipBrownianIntegrator
Done

#8: TestHipCheckpoints
Done

#9: TestHipCMAPTorsionForce
Done

#10: TestHipCMMotionRemover
Done

#11: TestHipCompiler
Done

#12: TestHipCompoundIntegrator
Done

#13: TestHipCustomAngleForce
Done

#14: TestHipCustomBondForce
Done

#15: TestHipCustomCentroidBondForce
Done

#16: TestHipCustomCompoundBondForce
Done

#17: TestHipCustomCVForce
Done

#18: TestHipCustomExternalForce
Done

#19: TestHipCustomGBForce
Done

#20: TestHipCustomHbondForce
Done

#21: TestHipCustomIntegrator
exception: Assertion failure at TestCustomIntegrator.h:1162.  Expected 300, found 303.017 (This test is stochastic and may occasionally fail)
Done

#22: TestHipCustomManyParticleForce
Done

#23: TestHipCustomNonbondedForce
Done

#24: TestHipCustomTorsionForce
Done

#25: TestHipDispersionPME
Done

#26: TestHipDrudeForce
Done

#27: TestHipDrudeLangevinIntegrator
Done

#28: TestHipDrudeNoseHoover
Done

#29: TestHipDrudeSCFIntegrator
Done

#30: TestHipEwald
Done

#31: TestHipFFTImplFFT3D
Done

#32: TestHipFFTImplHipFFT
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 50 ysize: 50 zsize: 50
realToComplex: 1 xsize: 60 ysize: 60 zsize: 60
realToComplex: 0 xsize: 64 ysize: 64 zsize: 64
realToComplex: 1 xsize: 100 ysize: 100 zsize: 100
realToComplex: 1 xsize: 243 ysize: 120 zsize: 120
realToComplex: 1 xsize: 216 ysize: 216 zsize: 216
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
exception: Error executing hipFFT: 6
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 50 ysize: 50 zsize: 50
realToComplex: 1 xsize: 60 ysize: 60 zsize: 60
realToComplex: 0 xsize: 64 ysize: 64 zsize: 64
realToComplex: 1 xsize: 100 ysize: 100 zsize: 100
realToComplex: 1 xsize: 243 ysize: 120 zsize: 120
realToComplex: 1 xsize: 216 ysize: 216 zsize: 216
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
exception: Error executing hipFFT: 6
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 50 ysize: 50 zsize: 50
realToComplex: 1 xsize: 60 ysize: 60 zsize: 60
realToComplex: 0 xsize: 64 ysize: 64 zsize: 64
realToComplex: 1 xsize: 100 ysize: 100 zsize: 100
realToComplex: 1 xsize: 243 ysize: 120 zsize: 120
realToComplex: 1 xsize: 216 ysize: 216 zsize: 216
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
exception: Error executing hipFFT: 6

#33: TestHipFFTImplVkFFT
realToComplex: 0 xsize: 28 ysize: 25 zsize: 30
realToComplex: 1 xsize: 28 ysize: 25 zsize: 25
realToComplex: 1 xsize: 25 ysize: 28 zsize: 25
realToComplex: 1 xsize: 25 ysize: 25 zsize: 28
realToComplex: 1 xsize: 21 ysize: 25 zsize: 27
realToComplex: 1 xsize: 49 ysize: 98 zsize: 14
realToComplex: 1 xsize: 7 ysize: 21 zsize: 98
realToComplex: 1 xsize: 98 ysize: 21 zsize: 21
realToComplex: 1 xsize: 18 ysize: 98 zsize: 6
realToComplex: 1 xsize: 50 ysize: 50 zsize: 50
realToComplex: 1 xsize: 60 ysize: 60 zsize: 60
realToComplex: 0 xsize: 64 ysize: 64 zsize: 64
realToComplex: 1 xsize: 100 ysize: 100 zsize: 100
realToComplex: 1 xsize: 243 ysize: 120 zsize: 120
realToComplex: 1 xsize: 216 ysize: 216 zsize: 216
realToComplex: 1 xsize: 98 ysize: 98 zsize: 98
Done

#34: TestHipGayBerneForce
Done

#35: TestHipGBSAOBCForce
Done

#36: TestHipHarmonicAngleForce
Done

#37: TestHipHarmonicBondForce
Done

#38: TestHipHippoNonbondedForce
Done

#39: TestHipLangevinIntegrator
Done

#40: TestHipLangevinMiddleIntegrator
Done

#41: TestHipLocalEnergyMinimizer
Done

#42: TestHipMonteCarloAnisotropicBarostat
Done

#43: TestHipMonteCarloBarostat
Done

#44: TestHipMonteCarloFlexibleBarostat
Done

#45: TestHipMultipleForces
Done

#46: TestHipNonbondedForce
Done

#47: TestHipNoseHooverIntegrator
Done

#48: TestHipPeriodicTorsionForce
Done

#49: TestHipRandom
Done

#50: TestHipRBTorsionForce
Done

#51: TestHipRMSDForce
Done

#52: TestHipRpmd
Done

#53: TestHipSettle
Done

#54: TestHipSort
Done

#55: TestHipVariableLangevinIntegrator
Done

#56: TestHipVariableVerletIntegrator
Done

#57: TestHipVerletIntegrator
Done

#58: TestHipVirtualSites
Done

#59: TestHipWcaDispersionForce
Done
------------
Failed tests
------------

#32 TestHipFFTImplHipFFT

Here are my previous OpenCL benchmarks:

Platform: OpenCL
Precision: single

Test: gbsa
Ensemble: NVT
Step Size: 4 fs
Integrated 68989 steps in 52.7213 seconds
452.238 ns/day

Test: rf
Ensemble: NVT
Step Size: 4 fs
Integrated 22505 steps in 56.7139 seconds
137.14 ns/day

Test: pme (cutoff=0.9)
Ensemble: NVT
Step Size: 4 fs
Integrated 22461 steps in 57.937 seconds
133.982 ns/day

Test: apoa1rf
Ensemble: NVT
Step Size: 4 fs
Integrated 8627 steps in 61.5705 seconds
48.424 ns/day

Test: apoa1pme
Ensemble: NVT
Step Size: 4 fs
Integrated 8061 steps in 61.3665 seconds
45.3975 ns/day

Test: apoa1ljpme
Ensemble: NVT
Step Size: 4 fs
Integrated 8470 steps in 62.3557 seconds
46.9441 ns/day

Test: amoebagk (epsilon=1e-05)
Ensemble: NVT
Step Size: 2 fs
Integrated 373 steps in 54.3781 seconds
1.1853 ns/day

Test: amoebapme (epsilon=1e-05)
Ensemble: NVT
Step Size: 2 fs
Integrated 1355 steps in 53.4768 seconds
4.37842 ns/day

HIP benchmarks

python benchmark.py --platform HIP 
Platform: HIP

Test: gbsa
Ensemble: NVT
Step Size: 4 fs
Integrated 202477 steps in 59.2487 seconds
1181.06 ns/day

Test: rf
Ensemble: NVT
Step Size: 4 fs
Integrated 186565 steps in 59.9133 seconds
1076.17 ns/day

Test: pme (cutoff=0.9)
Ensemble: NVT
Step Size: 4 fs
Integrated 137783 steps in 60.3333 seconds
789.246 ns/day

Test: apoa1rf
Ensemble: NVT
Step Size: 4 fs
Integrated 53920 steps in 61.4199 seconds
303.399 ns/day

Test: apoa1pme
Ensemble: NVT
Step Size: 4 fs
Integrated 36012 steps in 60.8835 seconds
204.419 ns/day

Test: apoa1ljpme
Ensemble: NVT
Step Size: 4 fs
Integrated 29134 steps in 60.7964 seconds
165.614 ns/day

Test: amoebagk (epsilon=1e-05)
Ensemble: NVT
Step Size: 2 fs
Integrated 7800 steps in 58.3635 seconds
23.0939 ns/day

Test: amoebapme (epsilon=1e-05)
Ensemble: NVT
Step Size: 2 fs
Integrated 3067 steps in 58.4128 seconds
9.07297 ns/day

The jump in performance is huge. Are these numbers real, or is something wrong? Anyway, thank you so much for this project; this is literally the only software that can run molecular dynamics on AMD Navi2 GPUs right now.
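One quick sanity check is to recompute the throughput from the reported step counts and wall times. The sketch below assumes ns/day is simply simulated time divided by elapsed time, scaled to a day; the function names are hypothetical, and this formula is inferred from the numbers above rather than taken from benchmark.py.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def ns_per_day(steps, step_size_fs, wall_seconds):
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = steps * step_size_fs / FS_PER_NS
    return simulated_ns / wall_seconds * SECONDS_PER_DAY


def speedup(new_ns_per_day, old_ns_per_day):
    """Ratio of two throughputs, e.g. HIP over OpenCL."""
    return new_ns_per_day / old_ns_per_day


if __name__ == "__main__":
    # gbsa figures copied from the two benchmark runs above.
    opencl = ns_per_day(steps=68989, step_size_fs=4, wall_seconds=52.7213)
    hip = ns_per_day(steps=202477, step_size_fs=4, wall_seconds=59.2487)
    print(f"gbsa OpenCL: {opencl:.2f} ns/day")
    print(f"gbsa HIP:    {hip:.2f} ns/day")
    print(f"speedup:     {speedup(hip, opencl):.2f}x")
```

For the gbsa test, the recomputed values agree with the reported 452.238 and 1181.06 ns/day, so the reported figures are at least internally consistent with the step counts.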

ex-rzr commented 1 year ago

Thanks!

The jump in performance is huge. Are these numbers real, or is something wrong?

The results look consistent with the numbers from #1 and with what we saw on our devices. But please report any problems with precision, stability and performance. The tests can't check all possible cases, so "real-world" experience is highly appreciated.

Could you also run amber20-dhfr, amber20-cellulose and amber20-stmv? (use --test amber20-dhfr)

icaspell commented 1 year ago
python benchmark.py --platform OpenCL --test amber20-cellulose
Platform: OpenCL
Precision: single

Test: amber20-cellulose
Ensemble: NVT
Step Size: 4 fs
Integrated 1621 steps in 60.6103 seconds
9.24294 ns/day

python benchmark.py --platform=HIP --test amber20-cellulose --ensemble=NVT --precision=single
Platform: HIP

Test: amber20-cellulose
Ensemble: NVT
Step Size: 4 fs
Integrated 7930 steps in 60.8478 seconds
45.0404 ns/day
python benchmark.py --platform=OpenCL --test amber20-stmv --ensemble=NVT --precision=single
Platform: OpenCL
Precision: single

Test: amber20-stmv
Ensemble: NVT
Step Size: 4 fs
Integrated 464 steps in 58.7317 seconds
2.73035 ns/day

python benchmark.py --platform=HIP --test amber20-stmv --ensemble=NVT --precision=single
Platform: HIP

Test: amber20-stmv
Ensemble: NVT
Step Size: 4 fs
Integrated 2385 steps in 59.2904 seconds
13.902 ns/day

I had to install scipy to get dhfr running.

python benchmark.py --platform=OpenCL --test amber20-dhfr --ensemble=NVT --precision=single
Platform: OpenCL
Precision: single

Test: amber20-dhfr
Ensemble: NVT
Step Size: 4 fs
Integrated 24362 steps in 58.1829 seconds
144.708 ns/day

python benchmark.py --platform=HIP --test amber20-dhfr --ensemble=NVT --precision=single
Platform: HIP

Test: amber20-dhfr
Ensemble: NVT
Step Size: 4 fs
Integrated 139960 steps in 59.9526 seconds
806.807 ns/day

I also ran a short molecular dynamics simulation for a protein of about 55k atoms in a ligand system prepared with CHARMM-GUI, using the default OpenMM parameters. I used the CHARMM36m force field for the protein and OPLS for the ligand, and I edited the default CHARMM-GUI script to run on HIP instead of OpenCL. I am now getting about 48 ns/day in the production run, compared to my previous 20 ns/day on OpenCL.