ROCm / rocBLAS

Next generation BLAS implementation for ROCm platform
https://rocm.docs.amd.com/projects/rocBLAS/en/latest/

fatal error: 'omp.h' file not found #1182

Closed paolodalberto closed 3 years ago

paolodalberto commented 3 years ago

What is the expected behavior

- pass compilation

What actually happens

- In file included from /home/paolo/FastMM/Epyc/rocBLAS/clients/common/blis_interface.cpp:5:
/home/paolo/FastMM/Epyc/rocBLAS/build/deps/blis/include/blis/blis.h:18940:10: fatal error: 'omp.h' file not found
#include <omp.h> // skipped
         ^~~~~~~
1 error generated when compiling for gfx803.
make[2]: *** [clients/gtest/CMakeFiles/rocblas-test.dir/build.make:778: clients/gtest/CMakeFiles/rocblas-test.dir/__/common/blis_interface.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/home/paolo/FastMM/Epyc/rocBLAS/clients/common/cblas_interface.cpp:7:10: fatal error: 'omp.h' file not found
#include <omp.h>
         ^~~~~~~
1 error generated when compiling for gfx803.
make[2]: *** [clients/gtest/CMakeFiles/rocblas-test.dir/build.make:765: clients/gtest/CMakeFiles/rocblas-test.dir/__/common/cblas_interface.cpp.o] Error 1

How to reproduce

- bash install.sh -c -a gfx803

Environment

paolo@fastmmw:~/FastMM/Epyc/rocBLAS$ /opt/rocm/bin/hipconfig --full
HIP version : 4.0.20496-4f163c68

== hipconfig
HIP_PATH : /opt/rocm-4.0.0/hip
ROCM_PATH : /opt/rocm-4.0.0
HIP_COMPILER : clang
HIP_PLATFORM : hcc
HIP_RUNTIME : ROCclr
CPP_CONFIG : -DHIP_PLATFORM_HCC= -I/opt/rocm-4.0.0/hip/include -I/opt/rocm-4.0.0/llvm/bin/../lib/clang/12.0.0 -I/opt/rocm-4.0.0/hsa/include -D__HIP_ROCclr__

== hip-clang
HSA_PATH : /opt/rocm-4.0.0/hsa
HIP_CLANG_PATH : /opt/rocm-4.0.0/llvm/bin
clang version 12.0.0 (https://github.com/RadeonOpenCompute/llvm-project.git dac2bfceaa8d4a90257dc8a6d58f268e172ce00e)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-4.0.0/llvm/bin
LLVM (http://llvm.org/):
  LLVM version 12.0.0git
  Optimized build with assertions.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver1

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600 - AMD GPUs HD2XXX-HD6XXX
    x86 - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags : -DHIP_ROCclr -std=c++11 -isystem /opt/rocm-4.0.0/llvm/lib/clang/12.0.0/include/.. -isystem /opt/rocm-4.0.0/hsa/include -DHIP_ROCclr -isystem /opt/rocm-4.0.0/hip/include -O3
hip-clang-ldflags : -L/opt/rocm-4.0.0/hip/lib -O3 -lgcc_s -lgcc -lpthread -lm

=== Environment Variables
PATH=/opt/rocm/llvm/bin:/home/paolo/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
HIP_DIR=/home/paolo/FastMM/Epyc/HIP

== Linux Kernel
Hostname : fastmmw
Linux fastmmw 5.4.0-60-generic #67-Ubuntu SMP Tue Jan 5 18:31:36 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.1 LTS
Release: 20.04

Hardware description
GPU device string
CPU device string
Software version
ROCK v0.0
ROCR v0.0
HCC v0.0
Library v0.0

Let me know if this helps. It takes about 10-15 minutes to get past 4% of the compilation.

paolodalberto commented 3 years ago

more information about the architecture


paolo@fastmmw:~/FastMM/Epyc/rocBLAS$ /opt/rocm/bin/rocminfo 
ROCk module is loaded
Able to open /dev/kfd read-write
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen Threadripper 1950X 16-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen Threadripper 1950X 16-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3400                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65775172(0x3eba644) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    65775172(0x3eba644) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx803                             
  Uuid:                    GPU-XX                             
  Marketing Name:          Ellesmere [Radeon Pro WX 7100]     
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26564(0x67c4)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1243                               
  BDFID:                   17152                              
  Internal Node ID:        1                                  
  Compute Unit:            36                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16777216(0x1000000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx803          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 3                  
*******                  
  Name:                    gfx803                             
  Uuid:                    GPU-XX                             
  Marketing Name:          Ellesmere [Radeon Pro WX 7100]     
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26564(0x67c4)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1243                               
  BDFID:                   17408                              
  Internal Node ID:        2                                  
  Compute Unit:            36                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16777216(0x1000000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx803          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***
paolodalberto commented 3 years ago

find /opt/rocm-3.10.0/ -name "omp.h"
/opt/rocm-3.10.0/llvm/lib/clang/12.0.0/include/omp.h
/opt/rocm-3.10.0/llvm/include/omp.h

However, the same find in /opt/rocm-4.0.0/ does not find omp.h.

/opt/rocm-4.0.0/llvm/ was installed fresh.
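
The comparison between the two trees can be scripted. This is a minimal sketch, assuming the install roots are the two directories named in this comment; `find_header` is a hypothetical helper name, not a ROCm tool.

```shell
# Hypothetical helper: list every copy of a header under the given install roots.
# Usage: find_header <header-name> <root> [<root> ...]
find_header() {
    local header="$1"
    shift
    local root
    for root in "$@"; do
        if [ -d "$root" ]; then
            # -H follows a symlinked root (for example /opt/rocm pointing at a versioned tree)
            find -H "$root" -name "$header" 2>/dev/null
        else
            # Report roots that do not exist instead of failing silently
            echo "missing root: $root" >&2
        fi
    done
}
```

For the trees in this report the call would be `find_header omp.h /opt/rocm-3.10.0 /opt/rocm-4.0.0`. An empty result for a root means the header is not installed there.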

TorreZuk commented 3 years ago

Thanks for reporting, @paolodalberto. We will have to dig into the dependency issues, but while you wait you should be able to directly install the llvm-clang openmp-extras package, openmp-extras4.0.0_12.10-0_amd64.deb, to get omp.h into the 4.0.0 tree. The deb is at http://repo.radeon.com/rocm/apt/4.0/pool/main/o/openmp-extras4.0.0/openmp-extras4.0.0_12.10-0_amd64.deb

TorreZuk commented 3 years ago

Also, @paolodalberto, when you have time: you say "fresh". Does that mean you uninstalled 3.10, or that this is a clean machine? Did it require any special installation steps on your side?

cgmb commented 3 years ago

One other thing to check is what the /opt/rocm directory looks like. There now exist version-pinned rocm packages (e.g. rocm-dev4.0.0) and rolling version rocm packages (e.g. rocm-dev). However, I don't think you can mix-and-match them at the moment.

IIRC, the version-pinned packages don't make an /opt/rocm symlink. That may be important because I notice that the PATH is to /opt/rocm/llvm/bin rather than /opt/rocm-4.0.0/llvm/bin.
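
A quick way to check both points is sketched below. It assumes the default /opt/rocm location; `describe_rocm_root` is a hypothetical helper name, not part of ROCm.

```shell
# Hypothetical helper: report whether a ROCm root is a symlink,
# a real directory, or absent. Defaults to /opt/rocm.
describe_rocm_root() {
    local root="${1:-/opt/rocm}"
    if [ -L "$root" ]; then
        echo "symlink -> $(readlink "$root")"
    elif [ -d "$root" ]; then
        echo "directory"
    else
        echo "absent"
    fi
}

describe_rocm_root /opt/rocm

# Show which clang the shell would pick up from PATH, if any.
command -v clang || echo "no clang on PATH"
```

If /opt/rocm turns out to be a real directory sitting next to /opt/rocm-4.0.0, a PATH entry of /opt/rocm/llvm/bin may resolve to a different (or missing) compiler than the one in the versioned tree.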

cgmb commented 3 years ago

It may or may not be of any help, but to check what packages I have installed from the rocm repositories, I use: aptitude search '~S ~i ~Orepo.radeon.com' (or aptitude search '~S ~i ~Orepo.radeon.com' -F '%c %M %p %d %v' to include the version numbers).

paolodalberto commented 3 years ago

I had to install from source the 4.0.0 llvm package. This installs clang. The missing link is not about rocBLAS (I think) but the installation of the dependencies.

I did not uninstall 3.10. I usually keep all previous versions, and I am not comfortable removing packages unless it happens automatically.
I updated from Ubuntu 18 to 20 (cmake issues), and the /opt/rocm directory kept most of its previous contents.

This afternoon I will provide information about the layout of the /opt/rocm directory.

paolodalberto commented 3 years ago

> Also, @paolodalberto, when you have time: you say "fresh". Does that mean you uninstalled 3.10, or that this is a clean machine? Did it require any special installation steps on your side?

I followed the instructions at https://github.com/ROCm-Developer-Tools/HIP/blob/master/INSTALL.md#hip-clang because my version 4.0 did not have the llvm package.

paolodalberto commented 3 years ago

> One other thing to check is what the /opt/rocm directory looks like. There now exist version-pinned rocm packages (e.g. rocm-dev4.0.0) and rolling version rocm packages (e.g. rocm-dev). However, I don't think you can mix-and-match them at the moment.
>
> IIRC, the version-pinned packages don't make an /opt/rocm symlink. That may be important because I notice that the PATH is to /opt/rocm/llvm/bin rather than /opt/rocm-4.0.0/llvm/bin.

out.txt: this is my tree for /opt/

paolo@fastmmw:~$ aptitude search '~S ~i ~Orepo.radeon.com'
i   comgr                                              - Library to provide support functions                         
i   half                                               - HALF-PRECISION FLOATING POINT LIBRARY                        
i A hip-base                                           - HIP: Heterogenous-computing Interface for Portability [BASE] 
i A hip-doc                                            - HIP: Heterogenous-computing Interface for Portability [DOCUME
i   hip-rocclr                                         - HIP: Heterogenous-computing Interface for Portability [ROCClr
i A hip-samples                                        - HIP: Heterogenous-computing Interface for Portability [SAMPLE
i A hsa-amd-aqlprofile                                 - AQLPROFILE library for AMD HSA runtime API extension support 
i A hsa-rocr-dev                                       - AMD Heterogeneous System Architecture HSA - Linux HSA Runtime
i A hsakmt-roct                                        - HSAKMT library for AMD KFD support                           
i A hsakmt-roct-dev                                    - HSAKMT development package.                                  
i   llvm-amdgpu                                        - amdgpu backend                                               
i   mivisionx                                          - AMD MIVisionX toolkit is a comprehensive computer vision and 
i A openmp-extras                                      - OpenMP Extras provides openmp and flang libraries.           
i   rocblas                                            - rocBLAS is AMD's library for BLAS on ROCm (Radeon Open Comput
i A rock-dkms                                          - amdgpu driver in DKMS format.                                
i A rock-dkms-firmware                                 - firmware blobs used by amdgpu driver in DKMS format          
i A rocm-clang-ocl                                     - OpenCL compilation with clang compiler.                      
i A rocm-cmake                                         - rocm-cmake built using CMake                                 
i A rocm-dbgapi                                        - Library to provide AMD GPU debugger API                      
i   rocm-dev                                           - Radeon Open Compute (ROCm) Runtime software stack            
i A rocm-device-libs                                   - Radeon Open Compute - device libraries                       
i   rocm-dkms                                          - Radeon Open Compute (ROCm) Runtime software stack            
i A rocm-gdb                                           - ROCgdb                                                       
i   rocm-opencl                                        - OpenCL: Open Computing Language on ROCclr                    
i A rocm-opencl-dev                                    - OpenCL: Open Computing Language on ROCclr                    
i A rocm-smi                                           - System Management Interface for ROCm                         
i A rocm-smi-lib64                                     - AMD System Management libraries                              
i   rocm-utils                                         - Radeon Open Compute (ROCm) Runtime software stack            
i A rocminfo                                           - Radeon Open Compute (ROCm) Runtime rocminfo tool             
i   rocprim                                            - Radeon Open Compute Parallel Primitives Library              
i A rocprofiler-dev                                    - ROCPROFILER library for AMD HSA runtime API extension support
i A roctracer-dev                                      - AMD ROCTRACER library      
TorreZuk commented 3 years ago

Yes, 4.0 expects openmp-extras 4.0 to be installed, so you can install the deb I linked to. They will have to add the instructions for building openmp-extras from source to the HIP-Clang page you pointed to, as we don't do that manually in rocBLAS.
The rocm-dkms 4.0 installation should provide all of llvm and openmp-extras, as it did on my Ubuntu 20 test.

paolodalberto commented 3 years ago

let me reinstall the package openmp-extras

paolodalberto commented 3 years ago

Reinstalling openmp-extras creates the correct includes. Alas, something else now fails during compilation:

[ 73%] Built target example-c-dgeam
[ 73%] Building CXX object clients/gtest/CMakeFiles/rocblas-test.dir/atomics_mode_gtest.cpp.o
[ 74%] Building CXX object clients/gtest/CMakeFiles/rocblas-test.dir/gemm_gtest.cpp.o
/home/paolo/FastMM/Epyc/rocBLAS/clients/gtest/multiheaded_gtest.cpp:219:5: error: unknown type name 'quick'
    INSTANTIATE_TEST_CATEGORIES(multiheaded);
    ^
/home/paolo/FastMM/Epyc/rocBLAS/clients/gtest/../include/rocblas_test.hpp:154:42: note: expanded from macro 'INSTANTIATE_TEST_CATEGORIES'
    INSTANTIATE_TEST_CATEGORY(testclass, quick)       \
                                         ^
/home/paolo/FastMM/Epyc/rocBLAS/clients/gtest/multiheaded_gtest.cpp:219:5: error: parameter type '(anonymous namespace)::multiheaded' is an abstract class
/home/paolo/FastMM/Epyc/rocBLAS/clients/gtest/../include/rocblas_test.hpp:154:5: note: expanded from macro 'INSTANTIATE_TEST_CATEGORIES'
    INSTANTIATE_TEST_CATEGORY(testclass, quick)       \
    ^
/home/paolo/FastMM/Epyc/rocBLAS/clients/gtest/../include/rocblas_test.hpp:143:39: note: expanded from macro 'INSTANTIATE_TEST_CATEGORY'
                             testclass,                                                           \
                                      ^
/usr/local/include/gtest/gtest.h:484:16: note: unimplemented pure virtual method 'TestBody' in 'multiheaded'
  virtual void TestBody() = 0;
               ^
/home/paolo/FastMM/Epyc/rocBLAS/clients/gtest/multiheaded_gtest.cpp:219:5: error: no type named 'ValuesIn' in namespace 'testing'
    INSTANTIATE_TEST_CATEGORIES(multiheaded);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/paolo/FastMM/Epyc/rocBLAS/clients/gtest/../include/rocblas_test.hpp:154:5: note: expanded from macro 'INSTANTIATE_TEST_CATEGORIES'
    INSTANTIATE_TEST_CATEGORY(testclass, quick)       \
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/paolo/FastMM/Epyc/rocBLAS/clients/gtest/../include/rocblas_test.hpp:144:39: note: expanded from macro 'INSTANTIATE_TEST_CATEGORY'
                             testing::ValuesIn(RocBLAS_TestData::begin([](const Arguments& arg) { \
                             ~~~~~~~~~^
/home/paolo/FastMM/Epyc/rocBLAS/clients/gtest/multiheaded_gtest.cpp:219:5: error: C++ requires a type specifier for all declarations
TorreZuk commented 3 years ago

Did you run install.sh -dc to install the dependencies for the clients? googletest must now be on its release-1.10.0 branch. Other dependencies may also be required that differ from earlier releases. You may want to delete your build tree and then run install.sh -dc. I am just guessing based on what these errors suggest.

paolodalberto commented 3 years ago

With bash install.sh -c -a gfx803 the installation works. If I do not specify the architecture, it fails for gfx908. However:

clients/staging/./rocblas-bench -f gemm -r f32_r --transposeA N --transposeB N -m 4096 -n 4096 -k 4096 --alpha 1 --lda 4096 --ldb 4096 --beta 0 --ldc 4096 --device 1
Query device success: there are 2 devices
-------------------------------------------------------------------------------
Device ID 0 : Ellesmere [Radeon Pro WX 7100]
with 17.2 GB memory, max. SCLK 1243 MHz, max. MCLK 1750 MHz, compute capability 8.0
maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
-------------------------------------------------------------------------------
Device ID 1 : Ellesmere [Radeon Pro WX 7100]
with 17.2 GB memory, max. SCLK 1243 MHz, max. MCLK 1750 MHz, compute capability 8.0
maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
-------------------------------------------------------------------------------

/src/external/hip-on-vdi/rocclr/hip_fatbin.cpp:39: guarantee(false && "Cannot unmap file")
Aborted (core dumped)
paolodalberto commented 3 years ago

I know I can install rocblas using apt but it is nice to understand the process ... rocBLAS will be used in combination with rocSPARSE. I can now use rocALUTION (rocBLAS installed by apt and rocSPARSE by script and rocRAND by partial script).

paolodalberto commented 3 years ago

but it will be nice to have code optimized for the architecture.

TorreZuk commented 3 years ago

You can see the architecture list used without -a in the root level CMakeLists.txt, and make sure you are building branch rocm-4.0.x if using 4.0 release hip and clang compilers. Sorry things are overly coupled right now as the compiler changes quickly.

paolodalberto commented 3 years ago

As of today I can build the library with bash install.sh -c -a gfx803; however:

./rocblas-bench -f gemm -r f32_r --transposeA N --transposeB N -m 4096 -n 4096 -k 4096 --alpha 1 --lda 4096 --ldb 4096 --beta 0 --ldc 4096 --device 1
Query device success: there are 2 devices
-------------------------------------------------------------------------------
Device ID 0 : Ellesmere [Radeon Pro WX 7100]
with 17.2 GB memory, max. SCLK 1243 MHz, max. MCLK 1750 MHz, compute capability 8.0
maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
-------------------------------------------------------------------------------
Device ID 1 : Ellesmere [Radeon Pro WX 7100]
with 17.2 GB memory, max. SCLK 1243 MHz, max. MCLK 1750 MHz, compute capability 8.0
maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
-------------------------------------------------------------------------------

/src/external/hip-on-vdi/rocclr/hip_fatbin.cpp:39: guarantee(false && "Cannot unmap file")
Aborted (core dumped)
TorreZuk commented 3 years ago

That is the type of error I would expect if the architecture were mismatched, so can you confirm you are using a clean rocblas rocm-4.0.x branch code base? Did you do an install -d or -dc to get all the dependencies, as I asked previously? Otherwise I would be looking for clues as to what is different: are there any warning messages during compile? If you don't include -a, can you build for all architectures and gfx908 fails at runtime, or did you mean the build fails (if so, with what error)?
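
One way to rule out an architecture mismatch is to compare the value passed to -a with what the runtime reports. The sketch below only extracts the gfx names from rocminfo-style text on standard input; `list_gfx_targets` is a hypothetical helper name, and the pattern assumes the names look like gfx803 or gfx908, as they do in this thread.

```shell
# Hypothetical helper: print the distinct gfx target names found on stdin.
# Intended input is the text printed by rocminfo.
list_gfx_targets() {
    grep -Eo 'gfx[0-9a-f]+' | sort -u
}
```

On a machine with ROCm installed this would be used as `/opt/rocm/bin/rocminfo | list_gfx_targets`, and the result compared with the architecture given to install.sh -a.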

paolodalberto commented 3 years ago

I can try again with -d: git pull, then bash install.sh -cd -a gfx803

paolodalberto commented 3 years ago

The build succeeds with the install command above; however, the benchmark still does not cooperate:

paolo@fastmmw:~/FastMM/Epyc/rocBLAS/build/release/clients/staging$ ./rocblas-bench -f gemm -r f32_r --transposeA N --transposeB N -m 4096 -n 4096 -k 4096 --alpha 1 --lda 4096 --ldb 4096 --beta 0 --ldc 4096 --device 1
Query device success: there are 2 devices
-------------------------------------------------------------------------------
Device ID 0 : Ellesmere [Radeon Pro WX 7100] gfx803
with 17.2 GB memory, max. SCLK 1243 MHz, max. MCLK 1750 MHz, compute capability 8.0
maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
-------------------------------------------------------------------------------
Device ID 1 : Ellesmere [Radeon Pro WX 7100] gfx803
with 17.2 GB memory, max. SCLK 1243 MHz, max. MCLK 1750 MHz, compute capability 8.0
maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
-------------------------------------------------------------------------------

/src/external/hip-on-vdi/rocclr/hip_fatbin.cpp:39: guarantee(false && "Cannot unmap file")
Aborted (core dumped)
paolodalberto commented 3 years ago

I completely removed the build directory before the install. Clearly there is something off. Maybe next time I upgrade the machine this will go away.

If you do not mind I will keep the issue open.

cgmb commented 3 years ago

When debugging those sorts of 'guarantee' failures, I sometimes find that setting the AMD_LOG_LEVEL environment variable to 3 or 4 can help clarify what was wrong. (The various environment variables listed in the system level debug documentation are also sometimes useful, though perhaps not applicable to this case.)
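
As a sketch of that suggestion: the wrapper below sets AMD_LOG_LEVEL for a single command only, so the rest of the shell session is unaffected. `with_amd_log_level` is a hypothetical name, not a ROCm tool.

```shell
# Hypothetical wrapper: run one command with AMD_LOG_LEVEL set
# for that command only.
# Usage: with_amd_log_level <level> <command> [args ...]
with_amd_log_level() {
    local level="$1"
    shift
    env AMD_LOG_LEVEL="$level" "$@"
}
```

For the failing benchmark this would look like `with_amd_log_level 3 ./rocblas-bench -f gemm -r f32_r -m 4096 -n 4096 -k 4096 --device 1 > bench.log 2>&1`, with both output streams captured in bench.log (an arbitrary file name) for later inspection.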

paolodalberto commented 3 years ago

I will uninstall rocm and reinstall and redo

paolodalberto commented 3 years ago

I cannot install llvm-clang using apt install.

TorreZuk commented 3 years ago

> I cannot install llvm-clang using apt install.

Aren't you installing rocm-dkms, which will provide llvm and clang?

paolodalberto commented 3 years ago

Yep. I cleaned up /opt/rocm and am re-installing everything from scratch, following https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html

paolodalberto commented 3 years ago

llvm is not present in /opt/rocm otherwise

TorreZuk commented 3 years ago

sudo apt install rocm-dkms should provide the /opt/rocm/llvm as that is the version of clang/llvm you need to build rocblas with.
Were there error messages during that install?

paolodalberto commented 3 years ago

nope

paolodalberto commented 3 years ago

and today I cannot make llvm

paolodalberto commented 3 years ago

let me autoremove everything and try again

paolodalberto commented 3 years ago

this is painful

paolodalberto commented 3 years ago
update-initramfs: Generating /boot/initrd.img-5.4.0-64-generic
Setting up rocm-smi (1.0.0-206-rocm-rel-4.0-23-ge39c0e2) ...
Setting up rocm-dbgapi (0.42.0.40000-23) ...
Setting up libelf-dev:amd64 (0.176-1.1build1) ...
Setting up rocm-opencl (3.6Beta-17-g875c1f8-rocm-rel-4.0-23) ...
Setting up hsakmt-roct-dev (20201016.1.0269-mainline-20201016-1-g0269ce3) ...
Setting up hip-doc (4.0.20496.5685.40000-23) ...
Setting up libtinfo5:amd64 (6.2-0ubuntu2) ...
Setting up rocm-opencl-dev (3.6Beta-17-g875c1f8-rocm-rel-4.0-23) ...
Setting up libncurses5:amd64 (6.2-0ubuntu2) ...
Setting up rocm-gdb (10.1-rocm-rel-4.0-23) ...
Setting up rocm-clang-ocl (0.5.0.64-rocm-rel-4.0-23-50fb51a) ...
Setting up rocm-utils (4.0.0.40000-23) ...
Setting up rocm-dev (4.0.0.40000-23) ...
Setting up rocm-dkms (4.0.0.40000-23) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.1) ...
root@fastmmw:/home/paolo# ls /opt/rocm
rocm/       rocm-4.0.0/ 
root@fastmmw:/home/paolo# ls /opt/rocm/
bin  hip  hsa-amd-aqlprofile  include  lib  oam  opencl  rocm_smi  rocprofiler  roctracer  share
root@fastmmw:/home/paolo# ls -lrt /opt/rocm/
total 44
drwxr-xr-x 4 root root 4096 Jan 21 11:28 hip
drwxr-xr-x 3 root root 4096 Jan 21 11:28 hsa-amd-aqlprofile
drwxr-xr-x 5 root root 4096 Jan 21 11:28 opencl
drwxr-xr-x 9 root root 4096 Jan 21 11:28 share
drwxr-xr-x 4 root root 4096 Jan 21 11:28 oam
drwxr-xr-x 6 root root 4096 Jan 21 11:28 rocm_smi
drwxr-xr-x 6 root root 4096 Jan 21 11:28 rocprofiler
drwxr-xr-x 2 root root 4096 Jan 21 11:28 bin
drwxr-xr-x 4 root root 4096 Jan 21 11:28 include
drwxr-xr-x 5 root root 4096 Jan 21 11:28 roctracer
drwxr-xr-x 3 root root 4096 Jan 21 11:30 lib

after installation now rebooting

paolodalberto commented 3 years ago
/opt/rocm/bin/rocminfo
bash: /opt/rocm/bin/rocminfo: No such file or directory
paolo@fastmmw:~$ /opt/rocm/opencl/bin/clinfo
dlerror: libamd_comgr.so.1: cannot open shared object file: No such file or directory
dlerror: libamd_comgr.so.1: cannot open shared object file: No such file or directory
dlerror: libamd_comgr.so.1: cannot open shared object file: No such file or directory
dlerror: libamd_comgr.so.1: cannot open shared object file: No such file or directory
ERROR: clGetPlatformIDs(-1001)

I know that OpenCL was always a problem, but rocminfo is not even installed

paolodalberto commented 3 years ago

bash install.sh -cd -a gfx803


PREFIX=/opt/rocm /home/paolo/FastMM/Epyc/rocBLAS
CMake Error at /usr/share/cmake-3.16/Modules/CMakeDetermineCXXCompiler.cmake:48 (message):
  Could not find compiler set in environment variable CXX:

  hipcc.

Call Stack (most recent call first):
  CMakeLists.txt:22 (project)

CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
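
The CMake error above appears to mean that CXX was set to `hipcc` but no `hipcc` could be found on PATH. A minimal sketch for locating it before re-running the build, assuming the usual location under the ROCm root; the function name `find_hipcc` is hypothetical.

```shell
# Hypothetical lookup: print the hipcc that would be used, preferring PATH
# and falling back to <rocm root>/bin/hipcc (root defaults to /opt/rocm).
find_hipcc() {
    hc_root="${1:-/opt/rocm}"
    if command -v hipcc >/dev/null 2>&1; then
        command -v hipcc
    elif [ -x "$hc_root/bin/hipcc" ]; then
        echo "$hc_root/bin/hipcc"
    else
        echo "hipcc not found on PATH or under $hc_root/bin" >&2
        return 1
    fi
}
```

If the function prints a path, exporting it (for example `export CXX="$(find_hipcc)"`) may let CMake pick up the compiler; if it fails, the hip package is probably not installed.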
paolodalberto commented 3 years ago

building llvm and then hip

paolodalberto commented 3 years ago

https://rocmdocs.amd.com/en/latest/Installation_Guide/HIP-Installation.html

the instructions are no good

TorreZuk commented 3 years ago

I agree the instructions are weak, as they don't cover any potential problems. I would go back to trying a clean install of rocm-dkms, which contains all the hip and clang that we use to build rocBLAS. You just want to build rocBLAS, correct? I just installed on a clean latest Ubuntu 5.4.0-64 docker and see that /opt/rocm/llvm was installed, so I am guessing your package config or local packages are not all cleared out and are messing up the install. If you are willing, here are some instructions for force-cleaning any old installation that might be causing trouble (drop sudo if you are root).

sudo apt-get autoremove rocm-opencl

Check the contents of /opt/rocm. There shouldn't be any files or folders present under it. If there are, a clean uninstall has not happened; try to clean uninstall using dpkg (sudo dpkg --purge rocm-dkms rock-dkms rock-dkms-firmware rocm-dev).

*If there is no content present, that implies a clean uninstall.

sudo rm -rf /var/cache/apt/*

sudo apt-get clean all

sudo reboot

sudo dpkg --purge rocm-dev rocm-libs miopen-hip rocblas hipblas rocrand rocfft miopengemm comgr hip-base hip-doc hip-rocclr hip-samples hsa-amd-aqlprofile hsakmt-roct hsakmt-roct-devel hsa-rocr-dev llvm-amdgpu rock-dkms rock-dkms-firmware rocm-clang-ocl rocm-cmake rocm-dbgapi rocm-dev rocm-device-libs rocm-dkms rocm-gdb rocminfo rocm-opencl rocm-opencl-devel rocm-smi rocm-smi-lib64 rocm-utils rocprofiler-dev roctracer-dev

sudo dpkg -l | grep <package> ; for example: sudo dpkg -l | grep rock-dkms

sudo rm -rf /var/cache/apt/*

sudo apt-get clean all

wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -

echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list

sudo apt-get -y update

sudo apt-get -y install rocm-dkms

/opt/rocm/llvm should now exist after this has been installed. You could then try to build the rocBLAS 4.0.x branch or install rocblas from the package. rocBLAS should then get the correct version of the toolchain.
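
The "check for /opt/rocm contents" step in the list above can be sketched as a small function to run between the purge and the reinstall. The name `rocm_dir_is_clean` is hypothetical, and treating a missing directory as clean is an assumption.

```shell
# Hypothetical check: succeed only if the given directory (default /opt/rocm)
# is absent or empty; otherwise list what the uninstall left behind.
rocm_dir_is_clean() {
    rc_dir="${1:-/opt/rocm}"
    # A missing directory counts as clean.
    [ -e "$rc_dir" ] || return 0
    # ls -A also lists dotfiles, so hidden leftovers are reported too.
    rc_left=$(ls -A "$rc_dir")
    if [ -n "$rc_left" ]; then
        echo "leftover entries in $rc_dir:"
        echo "$rc_left"
        return 1
    fi
    return 0
}
```

Running `rocm_dir_is_clean /opt/rocm-4.0.0` as well may be worthwhile, since /opt/rocm is normally a symlink to the versioned directory.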

paolodalberto commented 3 years ago

I want a clean installation because I want to make sure the performance I get is "complete" ... and I will use this in combination with rocALUTION ... it would be nice to have a clear list of algorithms

paolodalberto commented 3 years ago

let me follow your instructions ....

paolodalberto commented 3 years ago

cleaning is important indeed ...

paolodalberto commented 3 years ago
ls /opt/rocm-4.0.0/
amdgcn  hip  hsa-amd-aqlprofile  lib   oam     rocm_smi     roctracer
bin     hsa  include             llvm  opencl  rocprofiler  share
paolodalberto commented 3 years ago

 The C++ compiler

    "/opt/rocm/bin/hipcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /home/paolo/FastMM/Epyc/rocBLAS/build/release/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/make cmTC_dcf9a/fast && /usr/bin/make -f CMakeFiles/cmTC_dcf9a.dir/build.make CMakeFiles/cmTC_dcf9a.dir/build
    make[1]: Entering directory '/home/paolo/FastMM/Epyc/rocBLAS/build/release/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_dcf9a.dir/testCXXCompiler.cxx.o
    /opt/rocm/bin/hipcc     -o CMakeFiles/cmTC_dcf9a.dir/testCXXCompiler.cxx.o -c /home/paolo/FastMM/Epyc/rocBLAS/build/release/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    Can't exec "/opt/rocm-4.0.0/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.0/hip/bin/hipconfig line 141.
    Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.0/hip/bin/hipconfig line 142.
    Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.0/hip/bin/hipconfig line 145.
    Can't exec "/opt/rocm-4.0.0/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.0/hip/bin/hipconfig line 141.
    Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.0/hip/bin/hipconfig line 142.
    Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.0/hip/bin/hipconfig line 145.
    Can't exec "/opt/rocm-4.0.0/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.0/hip/bin/hipconfig line 141.
    Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.0/hip/bin/hipconfig line 142.
    Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.0/hip/bin/hipconfig line 145.
    Can't exec "/opt/rocm-4.0.0/llvm/bin/clang++": No such file or directory at /opt/rocm-4.0.0/hip/bin/hipconfig line 141.
    Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm-4.0.0/hip/bin/hipconfig line 142.
    Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm-4.0.0/hip/bin/hipconfig line 145.
    Can't exec "/opt/rocm-4.0.0/llvm/bin/clang": No such file or directory at /opt/rocm/bin/hipcc line 203.
    Use of uninitialized value $HIP_CLANG_VERSION in pattern match (m//) at /opt/rocm/bin/hipcc line 204.
    Use of uninitialized value $HIP_CLANG_VERSION in concatenation (.) or string at /opt/rocm/bin/hipcc line 208.
    Can't exec "/opt/rocm-4.0.0/llvm/bin/clang": No such file or directory at /opt/rocm/bin/hipcc line 895.
    failed to execute: No such file or directory
    make[1]: *** [CMakeFiles/cmTC_dcf9a.dir/build.make:66: CMakeFiles/cmTC_dcf9a.dir/testCXXCompiler.cxx.o] Error 255
    make[1]: Leaving directory '/home/paolo/FastMM/Epyc/rocBLAS/build/release/CMakeFiles/CMakeTmp'
    make: *** [Makefile:121: cmTC_dcf9a/fast] Error 2

  CMake will not be able to correctly generate this project.

Call Stack (most recent call first):
  CMakeLists.txt:22 (project)

-- Configuring incomplete, errors occurred!
See also "/home/paolo/FastMM/Epyc/rocBLAS/build/release/CMakeFiles/CMakeOutput.log".
See also "/home/paolo/FastMM/Epyc/rocBLAS/build/release/CMakeFiles/CMakeError.log".
+ check_exit_code 1
+ ((  1 != 0  ))
+ exit 1
paolodalberto commented 3 years ago
ls -lrt /opt/rocm-4.0.0/llvm/bin/
total 6104
-rwxr-xr-x 1 root root 2178688 Dec 14 03:01 flang2
-rwxr-xr-x 1 root root 4069440 Dec 14 03:01 flang1
paolodalberto commented 3 years ago

Does this mean that the llvm installation is still incomplete?

paolodalberto commented 3 years ago

BTW: this is not a docker ...

TorreZuk commented 3 years ago

Did it make the symlink /opt/rocm -> /opt/rocm-4.0.0? You are having the worst luck I have seen. Yes, that llvm should have lots of files (109 clang etc.). Those two flang files are correct ... are you sure you aren't running out of disk space?
Sorry, I only have a docker to spare right now, but I thought I did put 4.0.0 directly on a machine; I can ask around.
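
The three questions above (symlink, contents of llvm/bin, disk space) can be answered in one pass with something like the following sketch. The function name `report_rocm_install` is hypothetical, and counting entries whose names start with "clang" is only a rough proxy for a complete llvm install.

```shell
# Hypothetical report: show where the ROCm symlink points, how many
# clang-prefixed entries llvm/bin holds, and the free space on that filesystem.
report_rocm_install() {
    ri_link="${1:-/opt/rocm}"
    if [ -L "$ri_link" ]; then
        echo "symlink: $ri_link -> $(readlink "$ri_link")"
    else
        echo "symlink: $ri_link is not a symbolic link"
    fi
    # grep -c exits non-zero on zero matches, hence the "|| true".
    ri_count=$(ls "$ri_link/llvm/bin" 2>/dev/null | grep -c '^clang' || true)
    echo "clang binaries: $ri_count"
    # Last line of df output is the filesystem that holds the install.
    df -h "$ri_link" 2>/dev/null | tail -n 1
}
```

A count of 0 alongside the two flang files would match the listing shown earlier in the thread and point at a missing or partially unpacked llvm package.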

paolodalberto commented 3 years ago

The link is there ... I rebuilt the llvm project from source and am now trying to re-install rocBLAS

paolodalberto commented 3 years ago

I should create a docker .. it is safer ... but at the same time it is yet another layer ...