ROCm / clr

Fail to build shared libs and OpenCL not detected with static lib - libamdocl64.a: invalid ELF header #26

Closed squid-f closed 10 months ago

squid-f commented 10 months ago

Hi there, I have not managed to build rocm-clr with BUILD_SHARED_LIBS=ON.

So, I used:

cmake \
    -Wno-dev \
    -S "." \
    -B build \
    -DCMAKE_C_COMPILER=%{install_prefix}/llvm/bin/clang \
    -DLLVM_DIR=%{install_prefix}/llvm/lib/cmake/llvm \
    -DCLR_BUILD_OCL=ON \
    -DROCM_PATH=%{_usr} \
    -DClang_DIR=%{install_prefix}/llvm/lib/cmake/clang \
    -DLLD_DIR=%{install_prefix}/llvm/lib/cmake/lld \
    -DBUILD_SHARED_LIBS=OFF

with install_prefix = /usr/lib64/rocm

The build runs fine and libamdocl64.a is created (no .so is created, obviously).

I created /etc/OpenCL/vendors/amdocl64.icd containing the path /usr/lib64/rocm//libamdocl64.a.

rocminfo finds my GPU:

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 5900X 12-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 5900X 12-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3700                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            24                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32782820(0x1f439e4) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32782820(0x1f439e4) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32782820(0x1f439e4) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1032                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 6600                 
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      2048(0x800) KB                     
    L3:                      32768(0x8000) KB                   
  Chip ID:                 29695(0x73ff)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2750                               
  BDFID:                   2304                               
  Internal Node ID:        1                                  
  Compute Unit:            28                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 109                                
  SDMA engine uCode::      76                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS:                     
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1032         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             

but clinfo from the ROCm package fails with:

dlerror: /usr/lib64/rocm/libamdocl64.a: invalid ELF header
ERROR: clGetPlatformIDs(-1001)

I am afraid I am in a catch-22 situation: amdocl64.icd seems to require libamdocl64.so, but the build only gives me libamdocl64.a.

How can I get out of this? I have not been able to find any hints in https://rocm.docs.amd.com/en/latest/
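
For context on the "invalid ELF header" message: the OpenCL ICD loader dlopen()s the library named in the .icd file, and dlopen() only accepts ELF objects. A static .a file is an ar archive, which starts with the text "!<arch>" rather than the ELF magic bytes. The following is a minimal sketch of a check for this, assuming a POSIX shell with head, od and tr available; the function names lib_kind and check_icd are hypothetical and not part of ROCm.

```shell
# Classify a file by its first four bytes.
# Prints "elf", "ar-archive", or "unknown".
lib_kind() {
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        7f454c46) echo "elf" ;;         # \x7f 'E' 'L' 'F'
        213c6172) echo "ar-archive" ;;  # "!<ar", the start of "!<arch>"
        *)        echo "unknown" ;;
    esac
}

# Inspect the library path stored on the first line of an ICD file.
# Returns 0 if it points at an ELF file, 1 if it points at something else,
# and 2 if the entry is not an existing file (for example a bare library
# name that the dynamic loader would resolve on its own search path).
# Note: "elf" only proves the magic bytes; it does not prove the file is a
# shared object rather than an executable or a .o file.
check_icd() {
    lib=$(head -n 1 "$1" | tr -d '\r')
    if [ ! -f "$lib" ]; then
        echo "$1 -> $lib: not an existing file path"
        return 2
    fi
    kind=$(lib_kind "$lib")
    echo "$1 -> $lib: $kind"
    [ "$kind" = "elf" ]
}
```

Running check_icd /etc/OpenCL/vendors/amdocl64.icd on the setup described here would be expected to report ar-archive, which matches the dlerror shown by clinfo.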

Thanks for your support!

cjatin commented 10 months ago

Can you share the failure details you saw while building with SHARED_LIBS ON?

squid-f commented 10 months ago

> Can you share the failure details you saw while building with SHARED_LIBS ON

Hi. There are numerous undefined reference errors during the link step, such as:

[100%] Linking CXX shared library libamdocl64.so
cd /builddir/build/BUILD/clr-rocm-5.7.1/build/opencl/amdocl && /usr/bin/cmake -E cmake_link_script CMakeFiles/amdocl.dir/link.txt --verbose=1
/usr/bin/c++ -fPIC -O2 -g -pipe -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector --param=ssp-buffer-size=4 -fasynchronous-unwind-tables -DNDEBUG -pthread -Wl,--as-needed -Wl,--no-undefined -Wl,-z,relro -Wl,-O1 -Wl,--build-id=sha1 -Wl,--enable-new-dtags -shared -Wl,-soname,libamdocl64.so -o libamdocl64.so CMakeFiles/amdocl.dir/cl_command.cpp.o CMakeFiles/amdocl.dir/cl_context.cpp.o CMakeFiles/amdocl.dir/cl_counter.cpp.o CMakeFiles/amdocl.dir/cl_d3d9.cpp.o CMakeFiles/amdocl.dir/cl_d3d10.cpp.o CMakeFiles/amdocl.dir/cl_d3d11.cpp.o CMakeFiles/amdocl.dir/cl_device.cpp.o CMakeFiles/amdocl.dir/cl_event.cpp.o CMakeFiles/amdocl.dir/cl_execute.cpp.o CMakeFiles/amdocl.dir/cl_gl.cpp.o CMakeFiles/amdocl.dir/cl_icd.cpp.o CMakeFiles/amdocl.dir/cl_kernel_info_amd.cpp.o CMakeFiles/amdocl.dir/cl_memobj.cpp.o CMakeFiles/amdocl.dir/cl_p2p_amd.cpp.o CMakeFiles/amdocl.dir/cl_pipe.cpp.o CMakeFiles/amdocl.dir/cl_platform_amd.cpp.o CMakeFiles/amdocl.dir/cl_profile_amd.cpp.o CMakeFiles/amdocl.dir/cl_program.cpp.o CMakeFiles/amdocl.dir/cl_sampler.cpp.o CMakeFiles/amdocl.dir/cl_sdi_amd.cpp.o CMakeFiles/amdocl.dir/cl_svm.cpp.o CMakeFiles/amdocl.dir/cl_thread_trace_amd.cpp.o  -Wl,-rpath,/usr/lib64/rocm/llvm/lib: -Wl,-Bsymbolic -Wl,--version-script=/builddir/build/BUILD/clr-rocm-5.7.1/opencl/amdocl/amdocl.map ../../rocclr/librocclr.a -lrt /usr/lib64/libamd_comgr.a /usr/lib64/rocm/llvm/lib/liblldELF.so.17git /usr/lib64/rocm/llvm/lib/liblldCommon.so.17git /usr/lib64/rocm/llvm/lib/libLLVMAMDGPUCodeGen.so.17git /usr/lib64/rocm/llvm/lib/libLLVMAMDGPUAsmParser.so.17git /usr/lib64/rocm/llvm/lib/libLLVMAMDGPUDisassembler.so.17git /usr/lib64/rocm/llvm/lib/libLLVMAMDGPUDesc.so.17git /usr/lib64/rocm/llvm/lib/libLLVMAMDGPUInfo.so.17git /usr/lib64/rocm/llvm/lib/libLLVMAMDGPUUtils.so.17git /usr/lib64/rocm/llvm/lib/libLLVMNVPTXCodeGen.so.17git /usr/lib64/rocm/llvm/lib/libLLVMNVPTXDesc.so.17git /usr/lib64/rocm/llvm/lib/libLLVMNVPTXInfo.so.17git 
/usr/lib64/rocm/llvm/lib/libLLVMX86CodeGen.so.17git /usr/lib64/rocm/llvm/lib/libLLVMX86AsmParser.so.17git /usr/lib64/rocm/llvm/lib/libLLVMX86Desc.so.17git /usr/lib64/rocm/llvm/lib/libLLVMX86Disassembler.so.17git /usr/lib64/rocm/llvm/lib/libLLVMX86Info.so.17git /usr/lib64/rocm/llvm/lib/libclangFrontendTool.so.17git /usr/lib64/rocm/llvm/lib/libLLVMSymbolize.so.17git /usr/lib64/rocm/llvm/lib/libLLVMDebugInfoDWARF.so.17git /usr/lib64/libhsa-runtime64.so.1.11.0 /usr/lib64/libnuma.so -Wl,-rpath-link,/usr/lib64/rocm/llvm/lib 
/usr/bin/ld: /usr/lib64/libamd_comgr.a(comgr-compiler.cpp.o): in function `std::default_delete<llvm::Module>::operator()(llvm::Module*) const [clone .part.0]':
(.text+0x55): undefined reference to `llvm::Module::~Module()'
/usr/bin/ld: /usr/lib64/libamd_comgr.a(comgr-compiler.cpp.o): in function `llvm::SmallVectorImpl<char>::operator=(llvm::SmallVectorImpl<char>&&) [clone .isra.0]':
(.text+0x198): undefined reference to `llvm::SmallVectorBase<unsigned long>::grow_pod(void*, unsigned long, unsigned long)'
/usr/bin/ld: /usr/lib64/libamd_comgr.a(comgr-compiler.cpp.o): in function `llvm::Error::operator=(llvm::Error&&) [clone .isra.0]':
(.text+0x2b5): undefined reference to `llvm::Error::fatalUncheckedError() const'
/usr/bin/ld: /usr/lib64/libamd_comgr.a(comgr-compiler.cpp.o): in function `llvm::SmallVectorImpl<char>::operator=(llvm::SmallVectorImpl<char> const&) [clone .isra.0]':
(.text+0x358): undefined reference to `llvm::SmallVectorBase<unsigned long>::grow_pod(void*, unsigned long, unsigned long)'
/usr/bin/ld: /usr/lib64/libamd_comgr.a(comgr-compiler.cpp.o): in function `COMGR::logArgv(llvm::raw_ostream&, llvm::StringRef, llvm::ArrayRef<char const*>)':
(.text+0x88c): undefined reference to `llvm::raw_ostream::write(char const*, unsigned long)'
/usr/bin/ld: (.text+0x8e7): undefined reference to `llvm::raw_ostream::write(char const*, unsigned long)'
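
With this many errors it can help to reduce the log to the distinct missing symbols. One plausible reading, not confirmed in this thread, is that the static libamd_comgr.a records no dependencies of its own, so any LLVM library defining these symbols has to appear explicitly on the link line, and -Wl,--no-undefined turns each missing one into a hard error. The sketch below assumes GNU ld style messages as shown above; the function name undefined_symbols is hypothetical.

```shell
# Read a linker log on stdin and print each distinct undefined symbol,
# prefixed by how often it occurs, most frequent first.
undefined_symbols() {
    sed -n "s/.*undefined reference to \`\(.*\)'\$/\1/p" \
        | sort | uniq -c | sort -rn
}
```

For example, undefined_symbols < build.log (with build.log standing in for the saved build output) gives a short list that can then be matched against the libraries on the link line.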

I already had to build ROCm-compilersupport with SHARED_LIBS OFF: https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/59

Full log here: rocm-amd-opencl-5.7.1-1.7-mga9-testing.log

Thanks for your support!

squid-f commented 10 months ago

Hi, sorry, I provided the wrong link for the issue that led me to build ROCm-compilersupport with SHARED_LIBS OFF. Here is the right one: https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/55

squid-f commented 10 months ago

I have now succeeded in building ROCm-compilersupport with SHARED_LIBS ON; this is explained in https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/55

This means I can now also build rocm-clr with SHARED_LIBS ON.

However, my RX6600 (gfx1032) is not detected, so OpenCL is still not active. Here is the output of the ROCm clinfo, showing 0 devices found in the platform:

Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (3590.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback 

  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               0
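
The key line in the clinfo output above is "Number of devices:". As a small sketch, assuming clinfo prints one such line per platform as it does here, the total can be extracted for use in a packaging or smoke test; the function name device_count is hypothetical.

```shell
# Sum the "Number of devices:" lines from clinfo output read on stdin.
# Prints 0 when no such line is present.
device_count() {
    awk -F: '/Number of devices/ { n += $2 } END { print n + 0 }'
}
```

A check such as [ "$(clinfo | device_count)" -gt 0 ] would then fail for the output shown here, where the platform is found but no device is.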

darktable-cltest confirms OpenCL is not active:

     0,0313 [dt_get_sysresource_level] switched to 1 as `default'
     0,0313   total mem:       32014MB
     0,0313   mipmap cache:    4001MB
     0,0313   available mem:   16007MB
     0,0313   singlebuff:      250MB
     0,0313   OpenCL tune mem: OFF
     0,0313   OpenCL pinned:   OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] no devices found for Advanced Micro Devices, Inc. (vendor) - AMD Accelerated Parallel Processing (name)
[opencl_init] found 0 device
[opencl_init] FINALLY: opencl is NOT AVAILABLE and NOT ENABLED.

What am I still missing?

Thanks!

Note: rocminfo finds my GPU (see the output in the first post).

squid-f commented 10 months ago

Hi. By building rocm-llvm with SHARED_LIBS OFF, I have been able to build both ROCm-compilersupport (https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/55) and clr with SHARED_LIBS ON. With that, OpenCL is now active on my GPU, gfx1032 (RX6600). So all is good now, even though it remains unexplained why it did not work with the static libs from ROCm-compilersupport and clr. Thanks all for your insights!
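
For reference, the clr configure step for this working setup would presumably be the command from the first post with only the last flag changed. The sketch below prints that command rather than running it. It assumes the cmake option is spelled BUILD_SHARED_LIBS as in the first post, and it takes the two paths as arguments because the %{install_prefix} and %{_usr} macros are only expanded inside an RPM spec file; the function name clr_configure_cmd is hypothetical.

```shell
# Print (do not run) the clr configure command with shared libraries enabled.
# $1: LLVM install prefix (/usr/lib64/rocm in the first post)
# $2: value for ROCM_PATH (%{_usr} in the first post)
clr_configure_cmd() {
    prefix=$1
    rocm_path=$2
    # Same flags as the original command; only BUILD_SHARED_LIBS differs.
    set -- cmake -Wno-dev -S . -B build \
        "-DCMAKE_C_COMPILER=$prefix/llvm/bin/clang" \
        "-DLLVM_DIR=$prefix/llvm/lib/cmake/llvm" \
        -DCLR_BUILD_OCL=ON \
        "-DROCM_PATH=$rocm_path" \
        "-DClang_DIR=$prefix/llvm/lib/cmake/clang" \
        "-DLLD_DIR=$prefix/llvm/lib/cmake/lld" \
        -DBUILD_SHARED_LIBS=ON
    echo "$*"
}
```

Calling clr_configure_cmd /usr/lib64/rocm /usr prints a single line that can be reviewed before being pasted into the build script.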