fireice-uk / xmr-stak

Free Monero RandomX Miner and unified CryptoNight miner
GNU General Public License v3.0

Vega64 + ROCm 1.7: Illegal instruction detected: Operand has incorrect register class. #1485

Closed: chron0 closed this issue 6 years ago

chron0 commented 6 years ago

xmr-stak fails while compiling its OpenCL code with the error "Illegal instruction detected: Operand has incorrect register class." To check whether this is a kernel/driver issue, I tried https://github.com/genesismining/sgminer-gm, which works. Is there any way to make xmr-stak more verbose about the compilation step, so I can figure out why and where it fails?

@gstoner, @justXi: do you have any ideas from ROCm perspective?
@justXi: Thanks for the ebuild submissions - no issues during emerge

voyager /opt/xmr-stak/build9/bin # ./xmr-stak --noCPU --noNVIDIA
-------------------------------------------------------------------
xmr-stak 2.4.3 26a5d65
-------------------------------------------------------------------
[2018-04-22 08:31:16] : Mining coin: monero7
[2018-04-22 08:31:16] : Compiling code and initializing GPUs. This will take a while...
[2018-04-22 08:31:16] : Device 0 work size 8 / 32.
[2018-04-22 08:31:16] : OpenCL device 0 - Precompiled code /root/.openclcache/c5bddd8e20cae2624555ebaf2d7e44155ccecd7abdb00a1e22a7ea711f26e927.openclbin not found. Compiling ...
error: Illegal instruction detected: Operand has incorrect register class.
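
One detail visible in the log above: every line written by the miner carries a "[date time] : " prefix, while the final error line does not. That suggests (but does not prove) the message is printed directly by the OpenCL compiler rather than by xmr-stak's own logger. A minimal Python sketch for separating the two kinds of lines in a captured log follows; the name split_log and the regular expression are illustrative assumptions, not part of xmr-stak.

```python
import re

# Timestamp prefix seen on the miner's own log lines,
# e.g. "[2018-04-22 08:31:16] : "
STAMPED = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\] : ")

def split_log(text):
    """Return (miner_lines, raw_lines) from a captured console log.

    miner_lines: lines carrying the timestamp prefix.
    raw_lines:   everything else, e.g. output from the OpenCL compiler.
    """
    miner, raw = [], []
    for line in text.splitlines():
        line = line.rstrip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and the dashed banner rules
        (miner if STAMPED.match(line) else raw).append(line)
    return miner, raw

sample = """\
[2018-04-22 08:31:16] : Mining coin: monero7
[2018-04-22 08:31:16] : Compiling code and initializing GPUs. This will take a while...
[2018-04-22 08:31:16] : Device 0 work size 8 / 32.
error: Illegal instruction detected: Operand has incorrect register class.
"""

miner_lines, raw_lines = split_log(sample)
print(raw_lines)
```

Run against a full capture (stdout and stderr together), the second list holds only the untagged lines, which is where any additional compiler diagnostics would show up.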

Basic information

Autodetected amd.conf values

"gpu_threads_conf" : [
  // gpu: gfx900 memory:6821
  // compute units: 64
  { "index" : 0,
    "intensity" : 1536, "worksize" : 8,
    "affine_to_cpu" : false, "strided_index" : 1, "mem_chunk" : 2,
    "comp_mode" : true
  },

],

/*
 * Platform index. This will be 0 unless you have different OpenCL platform - eg. AMD and Intel.
 */
"platform_index" : 0,
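
As a rough sanity check on the autodetected values: CryptoNight variants such as monero7 use a 2 MiB scratchpad per hash, and intensity is the number of hashes the GPU works on in parallel, so scratchpad memory can be estimated as intensity x 2 MiB. The Python sketch below does that arithmetic. It assumes the "memory:6821" comment is in MiB, and the helper name scratchpad_mib is hypothetical.

```python
SCRATCHPAD_MIB = 2  # CryptoNight scratchpad size per hash, in MiB

def scratchpad_mib(intensity):
    """Estimated scratchpad memory (MiB) for a given intensity."""
    return intensity * SCRATCHPAD_MIB

# Values taken from the amd.conf fragment above.
intensity, worksize = 1536, 8
compute_units = 64
reported_mem = 6821  # from the "memory:6821" comment; unit assumed to be MiB

needed = scratchpad_mib(intensity)
work_groups = intensity // worksize          # work-groups of `worksize` threads
per_cu = intensity // compute_units          # parallel hashes per compute unit

print(needed, work_groups, per_cu)
print(intensity % worksize == 0)             # intensity divides evenly by worksize
print(needed <= reported_mem)                # scratchpads fit in reported memory
```

Under these assumptions the configuration is internally consistent, which points away from the config values and toward the kernel compilation itself as the source of the failure.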

Build Trace

voyager /opt/xmr-stak/build9 # cmake ..                                                                                                                                                                                                                              
-- The C compiler identification is GNU 6.4.0                                                                                                                                                                                                                        
-- The CXX compiler identification is GNU 6.4.0                                                                                                                                                                                                                      
-- Check for working C compiler: /usr/bin/cc                                                                                                                                                                                                                         
-- Check for working C compiler: /usr/bin/cc -- works                                                                                                                                                                                                                
-- Detecting C compiler ABI info                                                                                                                                                                                                                                     
-- Detecting C compiler ABI info - done                                                                                                                                                                                                                              
-- Detecting C compile features                                                                                                                                                                                                                                      
-- Detecting C compile features - done                                                                                                                                                                                                                               
-- Check for working CXX compiler: /usr/bin/c++                                                                                                                                                                                                                      
-- Check for working CXX compiler: /usr/bin/c++ -- works                                                                                                                                                                                                             
-- Detecting CXX compiler ABI info                                                                                                                                                                                                                                   
-- Detecting CXX compiler ABI info - done                                                                                                                                                                                                                            
-- Detecting CXX compile features                                                                                                                                                                                                                                    
-- Detecting CXX compile features - done                                                                                                                                                                                                                             
-- Looking for pthread.h                                                                                                                                                                                                                                             
-- Looking for pthread.h - found                                                                                                                                                                                                                                     
-- Looking for pthread_create                                                                                                                                                                                                                                        
-- Looking for pthread_create - not found                                                                                                                                                                                                                            
-- Looking for pthread_create in pthreads                                                                                                                                                                                                                            
-- Looking for pthread_create in pthreads - not found                                                                                                                                                                                                                
-- Looking for pthread_create in pthread                                                                                                                                                                                                                             
-- Looking for pthread_create in pthread - found                                                                                                                                                                                                                     
-- Found Threads: TRUE                                                                                                                                                                                                                                               
-- Found CUDA: /opt/cuda (found suitable version "9.1", minimum required is "7.5")                                                                                                                                                                                   
-- Looking for CL_VERSION_2_0                                                                                                                                                                                                                                        
-- Looking for CL_VERSION_2_0 - found                                                                                                                                                                                                                                
-- Found OpenCL: /usr/lib/libOpenCL.so (found version "2.0")                                                                                                                                                                                                         
-- Found OpenSSL: /usr/lib64/libcrypto.so (found version "1.0.2o")                                                                                                                                                                                                   
-- Configuring done                                                                                                                                                                                                                                                  
-- Generating done                                                                                                                                                                                                                                                   
-- Build files have been written to: /opt/xmr-stak/build9                                                                                                                                                                                                            
voyager /opt/xmr-stak/build9 # make -j4                                                                                                                                                                                                                              
Scanning dependencies of target xmr-stak-c                                                                                                                                                                                                                           
[  5%] Building C object CMakeFiles/xmr-stak-c.dir/xmrstak/backend/cpu/crypto/c_blake256.c.o                                                                                                                                                                         
[  5%] Building C object CMakeFiles/xmr-stak-c.dir/xmrstak/backend/cpu/crypto/c_groestl.c.o                                                                                                                                                                          
[  8%] Building C object CMakeFiles/xmr-stak-c.dir/xmrstak/backend/cpu/crypto/c_jh.c.o                                                                                                                                                                               
[ 11%] Building C object CMakeFiles/xmr-stak-c.dir/xmrstak/backend/cpu/crypto/c_keccak.c.o                                                                                                                                                                           
[ 14%] Building C object CMakeFiles/xmr-stak-c.dir/xmrstak/backend/cpu/crypto/c_skein.c.o                                                                                                                                                                            
[ 17%] Linking C static library bin/libxmr-stak-c.a                                                                                                                                                                                                                  
[ 17%] Built target xmr-stak-c                                                                                                                                                                                                                                       
Scanning dependencies of target xmr-stak-backend                                                                                                                                                                                                                     
[ 20%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/jconf.cpp.o                                                                                                                                                                                       
[ 22%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/version.cpp.o                                                                                                                                                                                     
[ 25%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/backend/cpu/hwlocMemory.cpp.o                                                                                                                                                                     
[ 28%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/backend/cpu/jconf.cpp.o                                                                                                                                                                           
[ 31%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/backend/cpu/minethd.cpp.o                                                                                                                                                                         
[ 34%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/backend/backendConnector.cpp.o                                                                                                                                                                    
[ 37%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/backend/globalStates.cpp.o                                                                                                                                                                        
[ 40%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/backend/cpu/crypto/cryptonight_common.cpp.o                                                                                                                                                       
[ 42%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/http/httpd.cpp.o                                                                                                                                                                                  
[ 45%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/http/webdesign.cpp.o                                                                                                                                                                              
[ 48%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/misc/console.cpp.o                                                                                                                                                                                
[ 51%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/misc/executor.cpp.o                                                                                                                                                                               
[ 54%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/misc/telemetry.cpp.o                                                                                                                                                                              
[ 57%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/misc/uac.cpp.o                                                                                                                                                                                    
[ 60%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/misc/utility.cpp.o                                                                                                                                                                                
[ 62%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/net/jpsock.cpp.o                                                                                                                                                                                  
[ 65%] Building CXX object CMakeFiles/xmr-stak-backend.dir/xmrstak/net/socket.cpp.o                                                                                                                                                                                  
[ 68%] Linking CXX static library bin/libxmr-stak-backend.a                                                                                                                                                                                                          
[ 68%] Built target xmr-stak-backend                                                                                                                                                                                                                                 
[ 74%] Building NVCC (Device) object CMakeFiles/xmrstak_cuda_backend.dir/xmrstak/backend/nvidia/nvcc_code/xmrstak_cuda_backend_generated_cuda_extra.cu.o                                                                                                             
[ 74%] Building NVCC (Device) object CMakeFiles/xmrstak_cuda_backend.dir/xmrstak/backend/nvidia/nvcc_code/xmrstak_cuda_backend_generated_cuda_core.cu.o                                                                                                              
Scanning dependencies of target xmrstak_opencl_backend                                                                                                                                                                                                               
Scanning dependencies of target xmr-stak                                                                                                                                                                                                                             
[ 77%] Building CXX object CMakeFiles/xmr-stak.dir/xmrstak/cli/cli-miner.cpp.o                                                                                                                                                                                       
[ 80%] Building CXX object CMakeFiles/xmrstak_opencl_backend.dir/xmrstak/backend/amd/amd_gpu/gpu.cpp.o                                                                                                                                                               
[ 82%] Linking CXX executable bin/xmr-stak                                                                                                                                                                                                                           
[ 82%] Built target xmr-stak                                                                                                                                                                                                                                         
[ 85%] Building CXX object CMakeFiles/xmrstak_opencl_backend.dir/xmrstak/backend/amd/jconf.cpp.o                                                                                                                                                                     
[ 88%] Building CXX object CMakeFiles/xmrstak_opencl_backend.dir/xmrstak/backend/amd/minethd.cpp.o                                                                                                                                                                   
[ 91%] Linking CXX shared library bin/libxmrstak_opencl_backend.so                                                                                                                                                                                                   
[ 91%] Built target xmrstak_opencl_backend                                                                                                                                                                                                                           
Scanning dependencies of target xmrstak_cuda_backend                                                                                                                                                                                                                 
[ 97%] Building CXX object CMakeFiles/xmrstak_cuda_backend.dir/xmrstak/backend/nvidia/minethd.cpp.o                                                                                                                                                                  
[ 97%] Building CXX object CMakeFiles/xmrstak_cuda_backend.dir/xmrstak/backend/nvidia/jconf.cpp.o                                                                                                                                                                    
[100%] Linking CXX shared library bin/libxmrstak_cuda_backend.so                                                                                                                                                                                                     
[100%] Built target xmrstak_cuda_backend                                                                                    

CMake Variables

CMAKE_AR:FILEPATH=/usr/bin/ar
CMAKE_BUILD_TYPE:STRING=Release                                                                                                                                                                                                                                      
CMAKE_COLOR_MAKEFILE:BOOL=ON                                                                                                                                                                                                                                         
CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/c++                                                                                                                                                                                                                             
CMAKE_CXX_COMPILER_AR:FILEPATH=/usr/bin/gcc-ar                                                                                                                                                                                                                       
CMAKE_CXX_COMPILER_RANLIB:FILEPATH=/usr/bin/gcc-ranlib                                                                                                                                                                                                               
CMAKE_CXX_FLAGS:STRING=                                                                                                                                                                                                                                              
CMAKE_CXX_FLAGS_DEBUG:STRING=-g                                                                                                                                                                                                                                      
CMAKE_CXX_FLAGS_MINSIZEREL:STRING=-Os -DNDEBUG                                                                                                                                                                                                                       
CMAKE_CXX_FLAGS_RELEASE:STRING=-O3 -DNDEBUG                                                                                                                                                                                                                          
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=-O2 -g -DNDEBUG                                                                                                                                                                                                                
CMAKE_C_COMPILER:FILEPATH=/usr/bin/cc                                                                                                                                                                                                                                
CMAKE_C_COMPILER_AR:FILEPATH=/usr/bin/gcc-ar                                                                                                                                                                                                                         
CMAKE_C_COMPILER_RANLIB:FILEPATH=/usr/bin/gcc-ranlib                                                                                                                                                                                                                 
CMAKE_C_FLAGS:STRING=                                                                                                                                                                                                                                                
CMAKE_C_FLAGS_DEBUG:STRING=-g                                                                                                                                                                                                                                        
CMAKE_C_FLAGS_MINSIZEREL:STRING=-Os -DNDEBUG                                                                                                                                                                                                                         
CMAKE_C_FLAGS_RELEASE:STRING=-O3 -DNDEBUG                                                                                                                                                                                                                            
CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=-O2 -g -DNDEBUG                                                                                                                                                                                                                  
CMAKE_EXE_LINKER_FLAGS:STRING=                                                                                                                                                                                                                                       
CMAKE_EXE_LINKER_FLAGS_DEBUG:STRING=                                                                                                                                                                                                                                 
CMAKE_EXE_LINKER_FLAGS_MINSIZEREL:STRING=                                                                                                                                                                                                                            
CMAKE_EXE_LINKER_FLAGS_RELEASE:STRING=                                                                                                                                                                                                                               
CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO:STRING=                                                                                                                                                                                                                        
CMAKE_EXPORT_COMPILE_COMMANDS:BOOL=OFF                                                                                                                                                                                                                               
CMAKE_INSTALL_PREFIX:PATH=/opt/xmr-stak                                                                                                                                                                                                                              
CMAKE_LINKER:FILEPATH=/usr/bin/ld                                                                                                                                                                                                                                    
CMAKE_LINK_STATIC:BOOL=OFF                                                                                                                                                                                                                                           
CMAKE_MAKE_PROGRAM:FILEPATH=/usr/bin/gmake                                                                                                                                                                                                                           
CMAKE_MODULE_LINKER_FLAGS:STRING=                                                                                                                                                                                                                                    
CMAKE_MODULE_LINKER_FLAGS_DEBUG:STRING=                                                                                                                                                                                                                              
CMAKE_MODULE_LINKER_FLAGS_MINSIZEREL:STRING=                                                                                                                                                                                                                         
CMAKE_MODULE_LINKER_FLAGS_RELEASE:STRING=                                                                                                                                                                                                                            
CMAKE_MODULE_LINKER_FLAGS_RELWITHDEBINFO:STRING=                                                                                                                                                                                                                     
CMAKE_NM:FILEPATH=/usr/bin/nm                                                                                                                                                                                                                                        
CMAKE_OBJCOPY:FILEPATH=/usr/bin/objcopy                                                                                                                                                                                                                              
CMAKE_OBJDUMP:FILEPATH=/usr/bin/objdump                                                                                                                                                                                                                              
CMAKE_RANLIB:FILEPATH=/usr/bin/ranlib                                                                                                                                                                                                                                
CMAKE_SHARED_LINKER_FLAGS:STRING=                                                                                                                                                                                                                                    
CMAKE_SHARED_LINKER_FLAGS_DEBUG:STRING=                                                                                                                                                                                                                              
CMAKE_SHARED_LINKER_FLAGS_MINSIZEREL:STRING=                                                                                                                                                                                                                         
CMAKE_SHARED_LINKER_FLAGS_RELEASE:STRING=                                                                                                                                                                                                                            
CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO:STRING=                                                                                                                                                                                                                     
CMAKE_SKIP_INSTALL_RPATH:BOOL=NO                                                                                                                                                                                                                                     
CMAKE_SKIP_RPATH:BOOL=NO                                                                                                                                                                                                                                             
CMAKE_STATIC_LINKER_FLAGS:STRING=                                                                                                                                                                                                                                    
CMAKE_STATIC_LINKER_FLAGS_DEBUG:STRING=                                                                                                                                                                                                                              
CMAKE_STATIC_LINKER_FLAGS_MINSIZEREL:STRING=                                                                                                                                                                                                                         
CMAKE_STATIC_LINKER_FLAGS_RELEASE:STRING=                                                                                                                                                                                                                            
CMAKE_STATIC_LINKER_FLAGS_RELWITHDEBINFO:STRING=                                                                                                                                                                                                                     
CMAKE_STRIP:FILEPATH=/usr/bin/strip                                                                                                                                                                                                                                  
CMAKE_VERBOSE_MAKEFILE:BOOL=FALSE                                                                                                                                                                                                                                    
CPU_ENABLE:BOOL=ON                                                                                                                                                                                                                                                   
CUDA_64_BIT_DEVICE_CODE:BOOL=ON                                                                                                                                                                                                                                      
CUDA_ARCH:STRING=30;35;37;50;52;60;61;62;70                                                                                                                                                                                                                          
CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE:BOOL=ON                                                                                                                                                                                                                       
CUDA_BUILD_CUBIN:BOOL=OFF                                                                                                                                                                                                                                            
CUDA_BUILD_EMULATION:BOOL=OFF                                                                                                                                                                                                                                        
CUDA_COMPILER:STRING=nvcc                                                                                                                                                                                                                                            
CUDA_CUDART_LIBRARY:FILEPATH=/opt/cuda/lib64/libcudart.so                                                                                                                                                                                                            
CUDA_CUDA_LIBRARY:FILEPATH=/usr/lib/libcuda.so                                                                                                                                                                                                                       
CUDA_ENABLE:BOOL=ON                                                                                                                                                                                                                                                  
CUDA_GENERATED_OUTPUT_DIR:PATH=                                                                                                                                                                                                                                      
CUDA_HOST_COMPILATION_CPP:BOOL=ON                                                                                                                                                                                                                                    
CUDA_HOST_COMPILER:FILEPATH=/usr/bin/cc                                                                                                                                                                                                                              
CUDA_KEEP_FILES:BOOL=OFF                                                                                                                                                                                                                                             
CUDA_NVCC_EXECUTABLE:FILEPATH=/opt/cuda/bin/nvcc                                                                                                                                                                                                                     
CUDA_NVCC_FLAGS:STRING=                                                                                                                                                                                                                                              
CUDA_NVCC_FLAGS_DEBUG:STRING=                                                                                                                                                                                                                                        
CUDA_NVCC_FLAGS_MINSIZEREL:STRING=                                                                                                                                                                                                                                   
CUDA_NVCC_FLAGS_RELEASE:STRING=                                                                                                                                                                                                                                      
CUDA_NVCC_FLAGS_RELWITHDEBINFO:STRING=                                                                                                                                                                                                                               
CUDA_PROPAGATE_HOST_FLAGS:BOOL=ON                                                                                                                                                                                                                                    
CUDA_SDK_ROOT_DIR:PATH=CUDA_SDK_ROOT_DIR-NOTFOUND                                                                                                                                                                                                                    
CUDA_SEPARABLE_COMPILATION:BOOL=OFF                                                                                                                                                                                                                                  
CUDA_SHOW_CODELINES:BOOL=OFF                                                                                                                                                                                                                                         
CUDA_SHOW_REGISTER:BOOL=OFF                                                                                                                                                                                                                                          
CUDA_TOOLKIT_INCLUDE:PATH=/opt/cuda/include                                                                                                                                                                                                                          
CUDA_TOOLKIT_ROOT_DIR:PATH=/opt/cuda                                                                                                                                                                                                                                 
CUDA_USE_STATIC_CUDA_RUNTIME:BOOL=ON                                                                                                                                                                                                                                 
CUDA_VERBOSE_BUILD:BOOL=OFF                                                                                                                                                                                                                                          
CUDA_VERSION:STRING=9.1                                                                                                                                                                                                                                              
CUDA_cublas_LIBRARY:FILEPATH=/opt/cuda/lib64/libcublas.so                                                                                                                                                                                                            
CUDA_cublas_device_LIBRARY:FILEPATH=/opt/cuda/lib64/libcublas_device.a                                                                                                                                                                                               
CUDA_cudadevrt_LIBRARY:FILEPATH=/opt/cuda/lib64/libcudadevrt.a                                                                                                                                                                                                       
CUDA_cudart_static_LIBRARY:FILEPATH=/opt/cuda/lib64/libcudart_static.a                                                                                                                                                                                               
CUDA_cufft_LIBRARY:FILEPATH=/opt/cuda/lib64/libcufft.so                                                                                                                                                                                                              
CUDA_cupti_LIBRARY:FILEPATH=CUDA_cupti_LIBRARY-NOTFOUND                                                                                                                                                                                                              
CUDA_curand_LIBRARY:FILEPATH=/opt/cuda/lib64/libcurand.so                                                                                                                                                                                                            
CUDA_cusolver_LIBRARY:FILEPATH=/opt/cuda/lib64/libcusolver.so                                                                                                                                                                                                        
CUDA_cusparse_LIBRARY:FILEPATH=/opt/cuda/lib64/libcusparse.so                                                                                                                                                                                                        
CUDA_nppc_LIBRARY:FILEPATH=/opt/cuda/lib64/libnppc.so                                                                                                                                                                                                                
CUDA_nppi_LIBRARY:FILEPATH=CUDA_nppi_LIBRARY-NOTFOUND                                                                                                                                                                                                                
CUDA_npps_LIBRARY:FILEPATH=/opt/cuda/lib64/libnpps.so                                                                                                                                                                                                                
CUDA_rt_LIBRARY:FILEPATH=/usr/lib/librt.so                                                                                                                                                                                                                           
EXECUTABLE_OUTPUT_PATH:STRING=bin                                                                                                                                                                                                                                    
HWLOC:FILEPATH=/usr/lib/libhwloc.so                                                                                                                                                                                                                                  
HWLOC_ENABLE:BOOL=ON                                                                                                                                                                                                                                                 
HWLOC_INCLUDE_DIR:PATH=/usr/include                                                                                                                                                                                                                                  
LIBRARY_OUTPUT_PATH:STRING=bin                                                                                                                                                                                                                                       
MHTD:FILEPATH=/usr/lib/libmicrohttpd.so                                                                                                                                                                                                                              
MICROHTTPD_ENABLE:BOOL=ON                                                                                                                                                                                                                                            
MTHD_INCLUDE_DIR:PATH=/usr/include                                                                                                                                                                                                                                   
OPENSSL_CRYPTO_LIBRARY:FILEPATH=/usr/lib64/libcrypto.so                                                                                                                                                                                                              
OPENSSL_INCLUDE_DIR:PATH=/usr/include                                                                                                                                                                                                                                
OPENSSL_SSL_LIBRARY:FILEPATH=/usr/lib64/libssl.so                                                                                                                                                                                                                    
OpenCL_ENABLE:BOOL=ON                                                                                                                                                                                                                                                
OpenCL_INCLUDE_DIR:PATH=/usr/include                                                                                                                                                                                                                                 
OpenCL_LIBRARY:FILEPATH=/usr/lib/libOpenCL.so                                                                                                                                                                                                                        
OpenSSL_ENABLE:BOOL=ON                                                                                                                                                                                                                                               
PKG_CONFIG_EXECUTABLE:FILEPATH=/usr/bin/pkg-config                                                                                                                                                                                                                   
XMR-STAK_COMPILE:STRING=native                                                                                                                                                                                                                                       
XMR-STAK_LARGEGRID:BOOL=ON                                                                                                                                                                                                                                           
XMR-STAK_THREADS:STRING=0                                             

clinfo

Number of platforms                               2                                                                                                                                                                                                                  
  Platform Name                                   AMD Accelerated Parallel Processing                                                                                                                                                                                
  Platform Vendor                                 Advanced Micro Devices, Inc.                                                                                                                                                                                       
  Platform Version                                OpenCL 2.0 AMD-APP.internal.dbg (2528.0)                                                                                                                                                                           
  Platform Profile                                FULL_PROFILE                                                                                                                                                                                                       
  Platform Extensions                             cl_khr_icd cl_amd_object_metadata cl_amd_event_callback                                                                                                                                                            
  Platform Max metadata object keys (AMD)         8                                                                                                                                                                                                                  
  Platform Extensions function suffix             AMD                                                                                                                                                                                                                

  Platform Name                                   NVIDIA CUDA                                                                                                                                                                                                        
  Platform Vendor                                 NVIDIA Corporation                                                                                                                                                                                                 
  Platform Version                                OpenCL 1.2 CUDA 9.1.84                                                                                                                                                                                             
  Platform Profile                                FULL_PROFILE                                                                                                                                                                                                       
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer                                                                                                                                                             
  Platform Extensions function suffix             NV                                                                                                                                                                                                                 

  Platform Name                                   AMD Accelerated Parallel Processing                                                                                                                                                                                
Number of devices                                 1                                                                                                                                                                                                                  
  Device Name                                     gfx900                                                                                                                                                                                                             
  Device Vendor                                   Advanced Micro Devices, Inc.                                                                                                                                                                                       
  Device Vendor ID                                0x1002                                                                                                                                                                                                             
  Device Version                                  OpenCL 1.2                                                                                                                                                                                                         
  Driver Version                                  2528.0 (HSA1.1,LC)                                                                                                                                                                                                 
  Device OpenCL C Version                         OpenCL C 2.0                                                                                                                                                                                                       
  Device Type                                     GPU                                                                                                                                                                                                                
  Device Board Name (AMD)                         Vega 10 XT [Radeon RX Vega 64]                                                                                                                                                                                     
  Device Topology (AMD)                           PCI-E, 06:00.0                                                                                                                                                                                                     
  Device Profile                                  FULL_PROFILE                                                                                                                                                                                                       
  Device Available                                Yes                                                                                                                                                                                                                
  Compiler Available                              Yes                                                                                                                                                                                                                
  Linker Available                                Yes                                                                                                                                                                                                                
  Max compute units                               64                                                                                                                                                                                                                 
  SIMD per compute unit (AMD)                     4                                                                                                                                                                                                                  
  SIMD width (AMD)                                16                                                                                                                                                                                                                 
  SIMD instruction width (AMD)                    1                                                                                                                                                                                                                  
  Max clock frequency                             1630MHz                                                                                                                                                                                                            
  Graphics IP (AMD)                               9.0                                                                                                                                                                                                                
  Device Partition                                (core)                                                                                                                                                                                                             
    Max number of sub-devices                     64                                                                                                                                                                                                                 
    Supported partition types                     (n/a)                                                                                                                                                                                                              
    Supported affinity domains                    (n/a)                                                                                                                                                                                                              
  Max work item dimensions                        3                                                                                                                                                                                                                  
  Max work item sizes                             1024x1024x1024                                                                                                                                                                                                     
  Max work group size                             256                                                                                                                                                                                                                
  Preferred work group size (AMD)                 256                                                                                                                                                                                                                
  Max work group size (AMD)                       1024                                                                                                                                                                                                               
  Preferred work group size multiple              64                                                                                                                                                                                                                 
  Wavefront width (AMD)                           64                                                                                                                                                                                                                 
  Preferred / native vector sizes                                                                                                                                                                                                                                    
    char                                                 4 / 4                                                                                                                                                                                                       
    short                                                2 / 2                                                                                                                                                                                                       
    int                                                  1 / 1                                                                                                                                                                                                       
    long                                                 1 / 1                                                                                                                                                                                                       
    half                                                 1 / 1        (cl_khr_fp16)                                                                                                                                                                                  
    float                                                1 / 1                                                                                                                                                                                                       
    double                                               1 / 1        (cl_khr_fp64)                                                                                                                                                                                  
  Half-precision Floating-point support           (cl_khr_fp16)                                                                                                                                                                                                      
    Denormals                                     No                                                                                                                                                                                                                 
    Infinity and NANs                             No                                                                                                                                                                                                                 
    Round to nearest                              No                                                                                                                                                                                                                 
    Round to zero                                 No                                                                                                                                                                                                                 
    Round to infinity                             No                                                                                                                                                                                                                 
    IEEE754-2008 fused multiply-add               No                                                                                                                                                                                                                 
    Support is emulated in software               No                                                                                                                                                                                                                 
  Single-precision Floating-point support         (core)                                                                                                                                                                                                             
    Denormals                                     Yes                                                                                                                                                                                                                
    Infinity and NANs                             Yes                                                                                                                                                                                                                
    Round to nearest                              Yes                                                                                                                                                                                                                
    Round to zero                                 Yes                                                                                                                                                                                                                
    Round to infinity                             Yes                                                                                                                                                                                                                
    IEEE754-2008 fused multiply-add               Yes                                                                                                                                                                                                                
    Support is emulated in software               No                                                                                                                                                                                                                 
    Correctly-rounded divide and sqrt operations  Yes                                                                                                                                                                                                                
  Double-precision Floating-point support         (cl_khr_fp64)                                                                                                                                                                                                      
    Denormals                                     Yes                                                                                                                                                                                                                
    Infinity and NANs                             Yes                                                                                                                                                                                                                
    Round to nearest                              Yes                                                                                                                                                                                                                
    Round to zero                                 Yes                                                                                                                                                                                                                
    Round to infinity                             Yes                                                                                                                                                                                                                
    IEEE754-2008 fused multiply-add               Yes                                                                                                                                                                                                                
    Support is emulated in software               No                                                                                                                                                                                                                 
  Address bits                                    64, Little-Endian                                                                                                                                                                                                  
  Global memory size                              8573157376 (7.984GiB)                                                                                                                                                                                              
  Global free memory (AMD)                        8370176 (7.982GiB)                                                                                                                                                                                                 
  Global memory channels (AMD)                    64                                                                                                                                                                                                                 
  Global memory banks per channel (AMD)           4                                                                                                                                                                                                                  
  Global memory bank width (AMD)                  256 bytes                                                                                                                                                                                                          
  Error Correction support                        No                                                                                                                                                                                                                 
  Max memory allocation                           7287183769 (6.787GiB)                                                                                                                                                                                              
  Unified memory for Host and Device              No                                                                                                                                                                                                                 
  Minimum alignment for any data type             128 bytes                                                                                                                                                                                                          
  Alignment of base address                       1024 bits (128 bytes)                                                                                                                                                                                              
  Global Memory cache type                        Read/Write                                                                                                                                                                                                         
  Global Memory cache size                        16384 (16KiB)                                                                                                                                                                                                      
  Global Memory cache line size                   64 bytes                                                                                                                                                                                                           
  Image support                                   No                                                                                                                                                                                                                 
  Local memory type                               Local                                                                                                                                                                                                              
  Local memory size                               65536 (64KiB)                                                                                                                                                                                                      
  Local memory syze per CU (AMD)                  65536 (64KiB)                                                                                                                                                                                                      
  Local memory banks (AMD)                        32                                                                                                                                                                                                                 
  Max number of constant args                     8                                                                                                                                                                                                                  
  Max constant buffer size                        7287183769 (6.787GiB)                                                                                                                                                                                              
  Preferred constant buffer size (AMD)            16384 (16KiB)                                                                                                                                                                                                      
  Max size of kernel argument                     1024                                                                                                                                                                                                               
  Queue properties                                                                                                                                                                                                                                                   
    Out-of-order execution                        No                                                                                                                                                                                                                 
    Profiling                                     Yes                                                                                                                                                                                                                
  Prefer user sync for interop                    Yes                                                                                                                                                                                                                
  Number of P2P devices (AMD)                     0                                                                                                                                                                                                                  
  P2P devices (AMD)                               (n/a)                                                                                                                                                                                                              
  Profiling timer resolution                      1ns                                                                                                                                                                                                                
  Profiling timer offset since Epoch (AMD)        0ns (Thu Jan  1 00:00:00 1970)                                                                                                                                                                                     
  Execution capabilities                                                                                                                                                                                                                                             
    Run OpenCL kernels                            Yes                                                                                                                                                                                                                
    Run native kernels                            No                                                                                                                                                                                                                 
    Thread trace supported (AMD)                  No                                                                                                                                                                                                                 
    Number of async queues (AMD)                  8                                                                                                                                                                                                                  
    Max real-time compute queues (AMD)            8                                                                                                                                                                                                                  
    Max real-time compute units (AMD)             64                                                                                                                                                                                                                 
  printf() buffer size                            4194304 (4MiB)                                                                                                                                                                                                     
  Built-in kernels                                (n/a)                                                                                                                                                                                                              
  Device Extensions                               cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_subgroups cl_khr_depth_images cl_amd_liquid_flash cl_amd_copy_buffer_p2p                                        

  Platform Name                                   NVIDIA CUDA                                                                                                                                                                                                        
Number of devices                                 1                                                                                                                                                                                                                  
  Device Name                                     GeForce GTX 1070 Ti                                                                                                                                                                                                
  Device Vendor                                   NVIDIA Corporation                                                                                                                                                                                                 
  Device Vendor ID                                0x10de                                                                                                                                                                                                             
  Device Version                                  OpenCL 1.2 CUDA                                                                                                                                                                                                    
  Driver Version                                  390.42                                                                                                                                                                                                             
  Device OpenCL C Version                         OpenCL C 1.2                                                                                                                                                                                                       
  Device Type                                     GPU                                                                                                                                                                                                                
  Device Topology (NV)                            PCI-E, 01:00.0                                                                                                                                                                                                     
  Device Profile                                  FULL_PROFILE                                                                                                                                                                                                       
  Device Available                                Yes                                                                                                                                                                                                                
  Compiler Available                              Yes                                                                                                                                                                                                                
  Linker Available                                Yes                                                                                                                                                                                                                
  Max compute units                               19                                                                                                                                                                                                                 
  Max clock frequency                             1683MHz                                                                                                                                                                                                            
  Compute Capability (NV)                         6.1                                                                                                                                                                                                                
  Device Partition                                (core)                                                                                                                                                                                                             
    Max number of sub-devices                     1                                                                                                                                                                                                                  
    Supported partition types                     None                                                                                                                                                                                                               
    Supported affinity domains                    (n/a)                                                                                                                                                                                                              
  Max work item dimensions                        3                                                                                                                                                                                                                  
  Max work item sizes                             1024x1024x64                                                                                                                                                                                                       
  Max work group size                             1024                                                                                                                                                                                                               
  Preferred work group size multiple              32                                                                                                                                                                                                                 
  Warp size (NV)                                  32                                                                                                                                                                                                                 
  Preferred / native vector sizes                                                                                                                                                                                                                                    
    char                                                 1 / 1                                                                                                                                                                                                       
    short                                                1 / 1                                                                                                                                                                                                       
    int                                                  1 / 1                                                                                                                                                                                                       
    long                                                 1 / 1                                                                                                                                                                                                       
    half                                                 0 / 0        (n/a)                                                                                                                                                                                          
    float                                                1 / 1                                                                                                                                                                                                       
    double                                               1 / 1        (cl_khr_fp64)                                                                                                                                                                                  
  Half-precision Floating-point support           (n/a)                                                                                                                                                                                                              
  Single-precision Floating-point support         (core)                                                                                                                                                                                                             
    Denormals                                     Yes                                                                                                                                                                                                                
    Infinity and NANs                             Yes                                                                                                                                                                                                                
    Round to nearest                              Yes                                                                                                                                                                                                                
    Round to zero                                 Yes                                                                                                                                                                                                                
    Round to infinity                             Yes                                                                                                                                                                                                                
    IEEE754-2008 fused multiply-add               Yes                                                                                                                                                                                                                
    Support is emulated in software               No                                                                                                                                                                                                                 
    Correctly-rounded divide and sqrt operations  Yes                                                                                                                                                                                                                
  Double-precision Floating-point support         (cl_khr_fp64)                                                                                                                                                                                                      
    Denormals                                     Yes                                                                                                                                                                                                                
    Infinity and NANs                             Yes                                                                                                                                                                                                                
    Round to nearest                              Yes                                                                                                                                                                                                                
    Round to zero                                 Yes                                                                                                                                                                                                                
    Round to infinity                             Yes                                                                                                                                                                                                                
    IEEE754-2008 fused multiply-add               Yes                                                                                                                                                                                                                
    Support is emulated in software               No                                                                                                                                                                                                                 
  Address bits                                    64, Little-Endian                                                                                                                                                                                                  
  Global memory size                              8513978368 (7.929GiB)                                                                                                                                                                                              
  Error Correction support                        No                                                                                                                                                                                                                 
  Max memory allocation                           2128494592 (1.982GiB)                                                                                                                                                                                              
  Unified memory for Host and Device              No                                                                                                                                                                                                                 
  Integrated memory (NV)                          No                                                                                                                                                                                                                 
  Minimum alignment for any data type             128 bytes                                                                                                                                                                                                          
  Alignment of base address                       4096 bits (512 bytes)                                                                                                                                                                                              
  Global Memory cache type                        Read/Write                                                                                                                                                                                                         
  Global Memory cache size                        311296 (304KiB)                                                                                                                                                                                                    
  Global Memory cache line size                   128 bytes                                                                                                                                                                                                          
  Image support                                   Yes                                                                                                                                                                                                                
    Max number of samplers per kernel             32                                                                                                                                                                                                                 
    Max size for 1D images from buffer            134217728 pixels                                                                                                                                                                                                   
    Max 1D or 2D image array size                 2048 images                                                                                                                                                                                                        
    Max 2D image size                             16384x32768 pixels                                                                                                                                                                                                 
    Max 3D image size                             16384x16384x16384 pixels                                                                                                                                                                                           
    Max number of read image args                 256                                                                                                                                                                                                                
    Max number of write image args                16                                                                                                                                                                                                                 
  Local memory type                               Local                                                                                                                                                                                                              
  Local memory size                               49152 (48KiB)                                                                                                                                                                                                      
  Registers per block (NV)                        65536                                                                                                                                                                                                              
  Max number of constant args                     9                                                                                                                                                                                                                  
  Max constant buffer size                        65536 (64KiB)                                                                                                                                                                                                      
  Max size of kernel argument                     4352 (4.25KiB)                                                                                                                                                                                                     
  Queue properties                                                                                                                                                                                                                                                   
    Out-of-order execution                        Yes                                                                                                                                                                                                                
    Profiling                                     Yes                                                                                                                                                                                                                
  Prefer user sync for interop                    No                                                                                                                                                                                                                 
  Profiling timer resolution                      1000ns                                                                                                                                                                                                             
  Execution capabilities                                                                                                                                                                                                                                             
    Run OpenCL kernels                            Yes                                                                                                                                                                                                                
    Run native kernels                            No                                                                                                                                                                                                                 
    Kernel execution timeout (NV)                 No                                                                                                                                                                                                                 
  Concurrent copy and kernel execution (NV)       Yes                                                                                                                                                                                                                
    Number of async copy engines                  2                                                                                                                                                                                                                  
  printf() buffer size                            1048576 (1024KiB)                                                                                                                                                                                                  
  Built-in kernels                                (n/a)                                                                                                                                                                                                              
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer                                                                                                                                                             

NULL platform behavior                                                                                                                                                                                                                                               
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform                                                                                                                                                                                                        
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform                                                                                                                                                                                                        
  clCreateContext(NULL, ...) [default]            No platform                                                                                                                                                                                                        
  clCreateContext(NULL, ...) [other]              Success [AMD]                                                                                                                                                                                                      
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)                                                                                                                                                                                                 
    Platform Name                                 AMD Accelerated Parallel Processing                                                                                                                                                                                
    Device Name                                   gfx900                                                                                                                                                                                                             
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform                                                                                                                                                                                    
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)                                                                                                                                                                                                     
    Platform Name                                 AMD Accelerated Parallel Processing                                                                                                                                                                                
    Device Name                                   gfx900                                                                                                                                                                                                             
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform                                                                                                                                                                            
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform                                                                                                                                                                                 
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)                                                                                                                                                                                                     
    Platform Name                                 AMD Accelerated Parallel Processing                                                                                                                                                                                
    Device Name                                   gfx900                               

Modules

Module                  Size  Used by
nvidia_uvm            675840  4
ext2                   57344  1
dm_mod                 98304  0
dax                    20480  1 dm_mod
ipmi_ssif              24576  0
x86_pkg_temp_thermal    16384  0
kvm_intel             188416  0
kvm                   491520  1 kvm_intel
irqbypass              16384  1 kvm
aesni_intel           184320  0
cp210x                 20480  0
usbserial              28672  1 cp210x
amdkfd                126976  3
nvidia_drm             32768  1
aes_x86_64             20480  1 aesni_intel
nvidia_modeset       1069056  3 nvidia_drm
amdgpu               2154496  2
nvidia              13799424  348 nvidia_modeset,nvidia_uvm
ttm                    81920  1 amdgpu
backlight              16384  1 amdgpu
coretemp               16384  0
crypto_simd            16384  1 aesni_intel
igb                   155648  0
cryptd                 20480  2 crypto_simd,aesni_intel
glue_helper            16384  1 aesni_intel
ipmi_si                53248  0
ipmi_devintf           20480  0
ipmi_msghandler        36864  4 nvidia,ipmi_ssif,ipmi_devintf,ipmi_si

gstoner commented 6 years ago

We will take a look

greg

ghost commented 6 years ago

@chron0 not sure if this will help but this is what I did to get it compiling with the ROCm drivers.

  1. Don't install the AMD SDK.
  2. Install the ROCm drivers following the steps on the GitHub Repo.
  3. Follow the xmr-stak compile steps in the Linux docs, but point cmake's OpenCL include and library paths at the ROCm install location. See below:
# Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-39-generic x86_64)
sudo apt install libmicrohttpd-dev libssl-dev cmake build-essential libhwloc-dev
git clone https://github.com/fireice-uk/xmr-stak.git
mkdir xmr-stak/build
cd xmr-stak/build
cmake .. -DCUDA_ENABLE=OFF -DOpenCL_INCLUDE_DIR=/opt/rocm/opencl/include/ -DOpenCL_LIBRARY=/opt/rocm/opencl/lib/x86_64/libOpenCL.so
make install
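
Before running cmake it can help to confirm that the two OpenCL paths actually exist. The following is only a minimal sketch: `check_opencl_paths` is a hypothetical helper (not part of xmr-stak or ROCm), and it assumes the header sits at `CL/cl.h` under the include directory, as is usual for OpenCL installs.

```shell
#!/bin/sh
# Hypothetical helper (not part of xmr-stak or ROCm): report whether the
# OpenCL header and library that will be passed to cmake actually exist.
check_opencl_paths() {
    include_dir=$1   # value intended for -DOpenCL_INCLUDE_DIR
    library=$2       # value intended for -DOpenCL_LIBRARY
    missing=0
    # Assumption: the main header lives at CL/cl.h below the include dir.
    if [ ! -f "$include_dir/CL/cl.h" ]; then
        echo "missing header: $include_dir/CL/cl.h"
        missing=1
    fi
    if [ ! -f "$library" ]; then
        echo "missing library: $library"
        missing=1
    fi
    return "$missing"
}

# Paths taken from the cmake line above; adjust if ROCm lives elsewhere.
if check_opencl_paths /opt/rocm/opencl/include /opt/rocm/opencl/lib/x86_64/libOpenCL.so; then
    echo "OpenCL paths look fine"
else
    echo "fix the paths before running cmake"
fi
```

If either path is reported missing, cmake would likely fall back to whatever other OpenCL install it can find, which may not be the ROCm one.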

I used a clean install of Ubuntu 16.04 with the following hardware:

xmr-stak runs OK (about 1200 H/s). The only problem is that I can't get more than one GPU to be detected by xmr-stak or /opt/rocm/opencl/bin/x86_64/clinfo.

gstoner commented 6 years ago

Please do not install the historical OpenCL SDK with ROCm; it is not needed to build OpenCL applications. We removed this requirement in ROCm. Now, when you install the base driver, rocm-opencl-dev is installed as well, so you no longer need to do this step like you did in the past.
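
One way to see which OpenCL implementations are currently registered on a machine is to list the ICD files the loader reads. This is a rough sketch, assuming the common Linux location `/etc/OpenCL/vendors`; `list_opencl_icds` is a hypothetical helper name, and whether a leftover SDK shows up there depends on how it was installed.

```shell
#!/bin/sh
# Hypothetical helper: print each registered OpenCL ICD file together with
# the library it points to (the first line of the .icd file).
list_opencl_icds() {
    # Assumption: the ICD loader uses /etc/OpenCL/vendors unless told otherwise.
    vendors_dir=${1:-/etc/OpenCL/vendors}
    for icd in "$vendors_dir"/*.icd; do
        [ -e "$icd" ] || continue   # the glob matched nothing
        printf '%s -> %s\n' "$(basename "$icd")" "$(head -n 1 "$icd")"
    done
}

list_opencl_icds
```

An entry that points into an old SDK directory rather than the ROCm install would be a hint that the old SDK is still registered.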

Spudz76 commented 6 years ago

Apparently there is this new 17.50 series which claims to have fixed some OpenCL + Vega issue(s)...

https://support.amd.com/en-us/download/workstation?os=Linux%20x86_64#pro-driver

chron0 commented 6 years ago

That may do the trick, I do have AMDSDK 3.0 installed. I'll uninstall and try a new build tomorrow.

chron0 commented 6 years ago

@Spudz76: I've tried 17.50 before as well. Still no working OpenCL interface, and it's a pain to work with the AMD "pro" stuff on Gentoo.

chron0 commented 6 years ago

Full strace: http://termbin.com/t4g3