ROCm / tensorflow-upstream

TensorFlow ROCm port
https://tensorflow.org
Apache License 2.0

hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!") #1106

Open reinka opened 4 years ago

reinka commented 4 years ago

GPU: 5700xt

When using the following Docker image:

rocm/tensorflow     latest              d83f8c9d5c96        2 weeks ago         10.3GB

with ROCm installed on the Docker host as explained here: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html

I get the following error when executing TensorFlow ops:

root@apoehlmann:/root# python3
Python 3.6.9 (default, Jul 17 2020, 12:50:27) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-06 20:14:03.889728: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Aborted (core dumped)

and the Python console dies. I started the container with the alias mentioned in the corresponding Docker registry: https://hub.docker.com/r/rocm/tensorflow

I get the same error when I try to run tensorflow ops on the host.

Googling this issue yields only a handful of results so I feel like I might have some misconfiguration but I cannot figure out what it is.

xuhuisheng commented 4 years ago

I tested rocm-3.7.0 on Ubuntu 20.04; my GPU is gfx803. tensorflow-rocm loaded /opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco and /opt/rocm/rocblas/lib/library/TensileLibrary_gfx803.co. The 5700xt corresponds to gfx1010, so maybe some library is missing for it.

reinka commented 4 years ago

Hmm, I'm afraid I don't understand enough to know how to use your information :/

oleid commented 4 years ago

Same problem with a different GPU, and not in Docker but on Arch Linux.

Python 3.8.5 (default, Sep  5 2020, 10:50:12) 
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-08 15:28:57.302760: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
2020-09-08 15:28:57.345180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]     ROCm AMD GPU ISA: gfx803
coreClock: 1.26GHz coreCount: 32 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: -1B/s
2020-09-08 15:28:57.417068: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
2020-09-08 15:28:57.418638: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libMIOpen.so
2020-09-08 15:28:57.425913: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
/home/oleid/.cache/rua/build/hip-rocclr/src/HIP-rocm-3.7.0/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")

@xuhuisheng: How did you get the list of files tensorflow-rocm loaded? I tried strace-ing my python script -- to no avail.
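One possible way to approach this question is to save strace output to a file and filter it for opens of GPU code-object files. The following is only a sketch: it assumes strace's usual `openat(...) = fd` line format, the helper name `opened_paths` is hypothetical, and the sample lines are invented for illustration rather than taken from a real trace. It is also not how the file list earlier in this thread was obtained.

```python
import re

# Matches open()/openat() calls in strace output and captures the path
# and the numeric return value.
OPEN_RE = re.compile(r'open(?:at)?\([^"]*"([^"]+)".*\)\s*=\s*(-?\d+)')

# Invented sample lines in strace's usual format (not from a real trace).
SAMPLE_STRACE = "\n".join([
    'openat(AT_FDCWD, "/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco", O_RDONLY) = 7',
    '[pid 4242] openat(AT_FDCWD, "/opt/rocm/rocblas/lib/library/TensileLibrary_gfx1010.co", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'read(7, "...", 4096) = 4096',
])


def opened_paths(strace_text, suffixes=(".hsaco", ".co")):
    """Return (path, succeeded) pairs for open/openat calls on matching files."""
    results = []
    for line in strace_text.splitlines():
        match = OPEN_RE.search(line)
        if match is None:
            continue
        path, ret = match.group(1), int(match.group(2))
        if path.endswith(suffixes):
            # A negative return value means the open failed (e.g. ENOENT).
            results.append((path, ret >= 0))
    return results


for path, ok in opened_paths(SAMPLE_STRACE):
    print("opened" if ok else "FAILED", path)
```

A failed open for the current GPU's architecture would be the interesting line, though whether the runtime opens these files in a way strace can see is an assumption here.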

It would seem I don't have /opt/rocm/rocblas/lib/library/; possibly that's the problem.

$ find /opt/rocm/rocblas/ -type f
/opt/rocm/rocblas/include/rocblas-auxiliary.h
/opt/rocm/rocblas/include/rocblas-complex-types.h
/opt/rocm/rocblas/include/rocblas-export.h
/opt/rocm/rocblas/include/rocblas-exported-proto.hpp
/opt/rocm/rocblas/include/rocblas-functions.h
/opt/rocm/rocblas/include/rocblas-types.h
/opt/rocm/rocblas/include/rocblas-version.h
/opt/rocm/rocblas/include/rocblas.h
/opt/rocm/rocblas/include/rocblas_bfloat16.h
/opt/rocm/rocblas/include/rocblas_module.f90
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config-version.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets-release.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets.cmake
/opt/rocm/rocblas/lib/librocblas.so.0.1
oleid commented 4 years ago

> GPU: 5700xt
>
> When using the following Docker image:
>
> [..]

@reinka:

I find it strange that your python output doesn't list a device. Does rocminfo or clinfo list anything?

By the way, when I experimented with tensorflow in docker, I used something like:

sudo docker run -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --volume $PWD:/data rocm/tensorflow
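The command above passes /dev/kfd and /dev/dri into the container. A quick way to confirm from inside the container that both nodes are visible and accessible could look like the sketch below. The function name `device_node_status` is hypothetical, and the check only covers existence and read/write permission, not whether the driver actually works.

```python
import os


def device_node_status(paths=("/dev/kfd", "/dev/dri")):
    """Report whether each device path exists and is readable and writable."""
    status = {}
    for path in paths:
        if not os.path.exists(path):
            status[path] = "missing"
        elif os.access(path, os.R_OK | os.W_OK):
            status[path] = "read-write"
        else:
            status[path] = "no access"
    return status


for path, state in device_node_status().items():
    print(f"{path}: {state}")
```

If a path shows up as "no access", group membership (the `--group-add video` part of the command) would be the first thing to look at.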

xuhuisheng commented 4 years ago

I compiled HIP from the rocm-3.7.0 source and added some logging for debugging. You can find hip_code_object.cpp in the HIP/rocclr/ directory. rocBLAS did not support a gfx1010 Tensile image.

The code_object functionality should be a new feature in rocm-3.7.0. I am investigating a bug for gfx803 on rocm-3.7.0, and rocBLAS seems to be the key, so I am reading the surrounding code.

dpkg -c rocblas_2.26.0.2565-9d981389_amd64.deb

drwxr-xr-x root/root         0 2020-08-18 09:08 ./opt/rocm-3.7.0/rocblas/lib/library/
-rw-r--r-- root/root  15337680 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx1010.hsaco
-rw-r--r-- root/root  14182000 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx1011.hsaco
-rw-r--r-- root/root  14905424 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx803.hsaco
-rw-r--r-- root/root  14989608 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx900.hsaco
-rw-r--r-- root/root  13846184 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx906.hsaco
-rw-r--r-- root/root  14116520 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx908.hsaco
-rw-r--r-- root/root 108018750 2020-08-18 09:00 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary.yaml
-rw-r--r-- root/root   3678448 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx803.co
-rw-r--r-- root/root  35668608 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx900.co
-rw-r--r-- root/root  97234680 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx906.co
-rw-r--r-- root/root 110233032 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx908.co
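
The listing can be summarized per GPU architecture to see which targets have a kernels file but no TensileLibrary file. This is a minimal sketch: the file names are copied from the listing above, the helper names are hypothetical, and it makes no claim that a missing TensileLibrary file is what triggers the abort.

```python
import re
from collections import defaultdict

# File names copied from the dpkg listing above.
LISTING = [
    "Kernels.so-000-gfx1010.hsaco",
    "Kernels.so-000-gfx1011.hsaco",
    "Kernels.so-000-gfx803.hsaco",
    "Kernels.so-000-gfx900.hsaco",
    "Kernels.so-000-gfx906.hsaco",
    "Kernels.so-000-gfx908.hsaco",
    "TensileLibrary.yaml",
    "TensileLibrary_gfx803.co",
    "TensileLibrary_gfx900.co",
    "TensileLibrary_gfx906.co",
    "TensileLibrary_gfx908.co",
]

ARCH_RE = re.compile(r"gfx[0-9a-f]+")


def files_by_arch(names):
    """Map each gfx target to the kinds of files ('kernels', 'tensile') found."""
    kinds = defaultdict(set)
    for name in names:
        match = ARCH_RE.search(name)
        if match is None:
            continue  # e.g. TensileLibrary.yaml names no architecture
        kind = "tensile" if name.startswith("TensileLibrary") else "kernels"
        kinds[match.group(0)].add(kind)
    return dict(kinds)


def missing_tensile(names):
    """Return the targets that have no TensileLibrary_<arch>.co file."""
    return sorted(a for a, k in files_by_arch(names).items() if "tensile" not in k)


print(missing_tensile(LISTING))  # ['gfx1010', 'gfx1011']
```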
oleid commented 4 years ago

Okay, I now have those files as well. This pull request fixed it: https://github.com/rocm-arch/rocm-arch/pull/413

find /opt/rocm/rocblas/ -type f
/opt/rocm/rocblas/include/rocblas-auxiliary.h
/opt/rocm/rocblas/include/rocblas-complex-types.h
/opt/rocm/rocblas/include/rocblas-export.h
/opt/rocm/rocblas/include/rocblas-exported-proto.hpp
/opt/rocm/rocblas/include/rocblas-functions.h
/opt/rocm/rocblas/include/rocblas-types.h
/opt/rocm/rocblas/include/rocblas-version.h
/opt/rocm/rocblas/include/rocblas.h
/opt/rocm/rocblas/include/rocblas_bfloat16.h
/opt/rocm/rocblas/include/rocblas_module.f90
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config-version.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets-release.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets.cmake
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1010.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1011.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx900.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx906.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx908.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary.yaml
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx803.co
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx900.co
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx906.co
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx908.co
/opt/rocm/rocblas/lib/librocblas.so.0.1

Problem still persists, though.

oleid commented 4 years ago

> I compiled HIP from the rocm-3.7.0 source and added some logging for debugging. You can find hip_code_object.cpp in the HIP/rocclr/ directory. rocBLAS did not support a gfx1010 Tensile image.
>
> The code_object functionality should be a new feature in rocm-3.7.0. I am investigating a bug for gfx803 on rocm-3.7.0, and rocBLAS seems to be the key, so I am reading the surrounding code.

Please note that in the aforementioned docker container tensorflow-rocm seems to find all it needs. So this must be something ArchLinux related in my case.

root@0f19f0974f40:/data# python3
Python 3.6.9 (default, Jul 17 2020, 12:50:27) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-09 12:05:54.542100: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
2020-09-09 12:05:54.582874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X]     ROCm AMD GPU ISA: gfx803
coreClock: 1.26GHz coreCount: 32 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 0B/s
2020-09-09 12:05:54.585567: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
2020-09-09 12:05:54.586959: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libMIOpen.so
2020-09-09 12:05:54.595182: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
2020-09-09 12:05:54.595500: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocrand.so
2020-09-09 12:05:54.595671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-09 12:05:54.605093: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3851195000 Hz
2020-09-09 12:05:54.605820: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f56782fce80 initialized for platform Host (this does not guarantee that XLA 
2020-09-09 12:05:54.605855: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-09 12:05:54.608314: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f56781688f0 initialized for platform ROCM (this does not guarantee that XLA 
2020-09-09 12:05:54.608348: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Ellesmere [Radeon RX 470/480/570/570X/580/580X], AMDGPU ISA ve
2020-09-09 12:05:54.916198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X]     ROCm AMD GPU ISA: gfx803
coreClock: 1.26GHz coreCount: 32 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 0B/s
2020-09-09 12:05:54.916264: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
2020-09-09 12:05:54.916280: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libMIOpen.so
2020-09-09 12:05:54.916294: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
2020-09-09 12:05:54.916308: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocrand.so
2020-09-09 12:05:54.916412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-09 12:05:54.916438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-09 12:05:54.916448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-09 12:05:54.916455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-09-09 12:05:54.916606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3796 MB
 0000:08:00.0)
<tf.Tensor: shape=(), dtype=int32, numpy=3>
oleid commented 4 years ago

It would seem librocrand is to blame on Arch. It is missing support for my GPU. I hacked in debug info as well and a dump of the call stack:

2020-09-09 14:37:30.875746: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
isCompatibleCodeObject: gfx803 == gfx900?
isCompatibleCodeObject: gfx803 == gfx906?
isCompatibleCodeObject: gfx803 == gfx908?
Call stack:
/opt/rocm/hip/lib/libamdhip64.so.3(+0x7eaf8)[0x7f8237487af8]
/opt/rocm/hip/lib/libamdhip64.so.3(+0x8032e)[0x7f823748932e]
/opt/rocm/hip/lib/libamdhip64.so.3(+0x805a4)[0x7f82374895a4]
/opt/rocm/hip/lib/libamdhip64.so.3(+0x80929)[0x7f8237489929]
/opt/rocm/rocrand/lib/librocrand.so(+0xdcbd)[0x7f82001a6cbd]

Will report back once I know more.
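
The debug output above suggests that the runtime walks the targets embedded in a library and compares each one with the device's architecture, failing when none matches. The sketch below is a simplified model of that behavior, not the HIP implementation: the function name is hypothetical, and the real check may compare more than the bare architecture name.

```python
def find_compatible_target(device_arch, bundled_targets):
    """Return the first bundled target equal to the device arch, else None."""
    for target in bundled_targets:
        # Mirrors the debug line printed in the log above.
        print(f"isCompatibleCodeObject: {device_arch} == {target}?")
        if target == device_arch:
            return target
    return None


# Targets seen in the log above: the library carries no gfx803 code object.
if find_compatible_target("gfx803", ["gfx900", "gfx906", "gfx908"]) is None:
    print("no binary for gfx803 in this library")
```

Under this model, a library built without the device's architecture in its target list fails in the same way the log shows for librocrand.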

oleid commented 4 years ago

Yes, that did the trick. Works for me now, thanks :)

tpkessler commented 4 years ago

Hey @oleid, which trick are you referring to? I've submitted a PR to rocm-arch which adds gfx803 as a target architecture, see https://github.com/rocm-arch/rocm-arch/pull/414

reinka commented 4 years ago

@oleid Hm, I think you are onto something. I used both the official docker run command and your version and inside the container I get the following rocminfo output:

root@5419cfc6178e:/root# rocminfo 
sh: 1: lsmod: not found
ROCk module is NOT loaded, possibly no GPU devices
Able to open /dev/kfd read-write
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 3700X 8-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 3700X 8-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3600                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16403260(0xfa4b3c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16403260(0xfa4b3c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx1010                            
  Uuid:                    GPU-XX                             
  Marketing Name:          Device 731f                        
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 29471(0x731f)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2080                               
  BDFID:                   10240                              
  Internal Node ID:        1                                  
  Compute Unit:            40                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        80(0x50)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1010         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             
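
For long rocminfo dumps like this one, the GPU agent names can be pulled out with a few lines of Python. This is a sketch that assumes the `Name:` and `Device Type:` layout shown above; the function name `gpu_agents` is hypothetical and the sample text is an abbreviated copy of the output above.

```python
# Abbreviated copy of the rocminfo output above.
SAMPLE_ROCMINFO = """\
Agent 1
  Name:                    AMD Ryzen 7 3700X 8-Core Processor
  Marketing Name:          AMD Ryzen 7 3700X 8-Core Processor
  Device Type:             CPU
Agent 2
  Name:                    gfx1010
  Marketing Name:          Device 731f
  Device Type:             GPU
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1010
"""


def gpu_agents(rocminfo_text):
    """Return the Name of every agent whose Device Type is GPU."""
    agents = []
    current_name = None
    for raw in rocminfo_text.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            # The agent's Name line appears before its Device Type line.
            current_name = line.split(":", 1)[1].strip()
        elif line.startswith("Device Type:"):
            if line.split(":", 1)[1].strip() == "GPU" and current_name:
                agents.append(current_name)
    return agents


print(gpu_agents(SAMPLE_ROCMINFO))  # ['gfx1010']
```

The same function could be fed the live output, for example from `subprocess.run(["rocminfo"], capture_output=True, text=True).stdout`.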

whereas on my host (Ubuntu 20.04) it seems to work properly:

$ rocminfo 
ROCk module is loaded
Able to open /dev/kfd read-write
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 3700X 8-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 3700X 8-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3600                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16403260(0xfa4b3c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16403260(0xfa4b3c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx1010                            
  Uuid:                    GPU-XX                             
  Marketing Name:          Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 29471(0x731f)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2080                               
  BDFID:                   10240                              
  Internal Node ID:        1                                  
  Compute Unit:            40                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        80(0x50)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1010         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done *** 

However, on my host I still get the same issue when I try to run tensorflow operations:

apoehlmann@apoehlmann:~$ . .envs/mypy3/bin/activate
(mypy3) apoehlmann@apoehlmann:~$ python3
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-09 18:55:30.801592: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Aborted (core dumped)

TF version:

(mypy3) apoehlmann@apoehlmann:~$ pip freeze | grep tensor
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorflow-estimator==2.3.0
tensorflow-rocm==2.3.0

EDIT

I also ran the following on host & inside container, got the same output:

(mypy3) apoehlmann@apoehlmann:~$ find /opt/rocm/rocblas/ -type f
/opt/rocm/rocblas/lib/librocblas.so.0.1.30700
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx906.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx803.co
/opt/rocm/rocblas/lib/library/TensileLibrary.yaml
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx908.co
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1011.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx900.co
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1010.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx906.co
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx900.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx908.hsaco
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets-release.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config-version.cmake
/opt/rocm/rocblas/include/rocblas-functions.h
/opt/rocm/rocblas/include/rocblas-auxiliary.h
/opt/rocm/rocblas/include/rocblas-version.h
/opt/rocm/rocblas/include/rocblas-types.h
/opt/rocm/rocblas/include/rocblas.h
/opt/rocm/rocblas/include/rocblas_bfloat16.h
/opt/rocm/rocblas/include/rocblas-export.h
/opt/rocm/rocblas/include/rocblas-complex-types.h
/opt/rocm/rocblas/include/rocblas_module.f90
/opt/rocm/rocblas/include/rocblas-exported-proto.hpp
xuhuisheng commented 4 years ago

Running sudo apt install kmod resolves the lsmod warning in Docker.
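
Since lsmod only formats the contents of /proc/modules, the kernel module can also be checked without installing kmod. The sketch below assumes the module of interest is named `amdgpu`, which is not stated in this thread; the function name and the sample text are hypothetical.

```python
# Invented sample in the /proc/modules format: name, size, use count, ...
SAMPLE_PROC_MODULES = """\
amdgpu 4571136 25 - Live 0x0000000000000000
drm 491520 10 amdgpu, Live 0x0000000000000000
"""


def module_loaded(name, proc_modules_text):
    """Return True if a module with this exact name is listed."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


print(module_loaded("amdgpu", SAMPLE_PROC_MODULES))  # True
```

On a real system the text would come from reading /proc/modules, for example with `open("/proc/modules").read()`.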

I also cannot find out how to generate the Tensile image for gfx1010 in rocBLAS. Maybe you could recompile rocBLAS with BUILD_TENSILE_HOST=false, which will skip the Tensile image.

Actually, ROCm does not officially support gfx1010 (Navi 10), so I cannot guarantee that gfx1010 will run on ROCm at all. Please refer to these issues:

https://github.com/ROCmSoftwarePlatform/pytorch/issues/718
https://github.com/RadeonOpenCompute/ROCm/issues/887

reinka commented 4 years ago

@xuhuisheng I solved the lsmod problem, but the original issue remains.

Thanks for the hint and links. I will look into it. Before I started trying to get TF running with the 5700xt, I found another GitHub issue that linked to this blog post

https://www.preining.info/blog/2020/05/switching-from-nvidia-to-amd-including-tensorflow/

and confirmed it would work. So it seems some people get it running with the 5700xt. I already tried to reproduce the steps there but I wasn't successful.

I also tried the approach described at https://github.com/RadeonOpenCompute/ROCm/issues/887#issuecomment-669717748 and wasn't able to reproduce it either.

xuhuisheng commented 4 years ago

@reinka I am afraid we have read this blog already. Unfortunately, the author said later in the comments that he ran into a segmentation fault.

o8ruza8o commented 4 years ago

Same problem on Ubuntu 20.04 with gfx1012. Is it just missing it in the list of supported GPUs?

oleid commented 4 years ago

Same problem on Ubuntu 20.04 with gfx1012. Is it just missing it in the list of supported GPUs?

It would seem that this GPU is not fully supported yet. I'd expect more to come in the next versions (before CDNA is released).

o8ruza8o commented 4 years ago

I would appreciate a flag that lets me use whatever works, even if it is incomplete and untested, instead of not being able to do anything at all on new GPUs.

xuhuisheng commented 4 years ago

@o8ruza8o Which version of ROCm do you use? According to rigtorp's research, ROCm 3.7 is needed to support gfx10xx.

gfx1012 is more complex: Tensile only supports gfx1010 and gfx1011, so you may have to copy the related Kernels.so file too.

I have two ideas for it. The first is to copy /opt/rocm/lib/TensileLibrary_gfx900.co to TensileLibrary_gfx1012.co. The second is to rebuild rocBLAS with BUILD_TENSILE_HOST=FALSE. Please refer to this issue: https://github.com/ROCmSoftwarePlatform/pytorch/issues/718#issuecomment-701174549
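
The first idea can be sketched as a small helper. This is only an illustration under assumptions: the function name and its parameters are made up here, and the library directory is passed in explicitly because this thread mentions both /opt/rocm/lib and /opt/rocm/rocblas/lib/library. Nothing in this thread confirms that a library copied from gfx900 actually works on gfx1012.

```python
import shutil
from pathlib import Path


def copy_tensile_library(library_dir, source_arch="gfx900",
                         target_arch="gfx1012", dry_run=True):
    """Copy TensileLibrary_<source>.co to TensileLibrary_<target>.co.

    Returns the destination path. With dry_run=True nothing is written,
    so the paths can be checked first. An existing destination file is
    never overwritten.
    """
    library_dir = Path(library_dir)
    src = library_dir / "TensileLibrary_{}.co".format(source_arch)
    dst = library_dir / "TensileLibrary_{}.co".format(target_arch)
    if not src.is_file():
        raise FileNotFoundError(str(src))
    if dst.exists() or dry_run:
        return dst
    shutil.copy2(str(src), str(dst))
    return dst


# Example (hypothetical directory; writing under /opt/rocm usually needs root):
#   print(copy_tensile_library("/opt/rocm/rocblas/lib/library"))
#   copy_tensile_library("/opt/rocm/rocblas/lib/library", dry_run=False)
```

Running with the default dry_run=True first shows which file would be created without touching the installation.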

o8ruza8o commented 4 years ago

I am running rocm 3.8.0. My kernel is 5.7.19. My GPU is gfx1012.
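
For anyone unsure which gfx name their card reports, here is a minimal sketch that pulls the gfx identifiers out of the text printed by the rocminfo tool. The function names are made up, and the regular expression simply collects every token that looks like gfx followed by hex digits, which is an assumption about the output rather than a documented format.

```python
import re
import subprocess


def gfx_targets_from_text(text):
    """Return the sorted, de-duplicated gfx names mentioned in `text`."""
    return sorted(set(re.findall(r"\bgfx[0-9a-f]+\b", text)))


def detect_gfx_targets(rocminfo="rocminfo"):
    """Run rocminfo and return the gfx names it prints.

    Raises FileNotFoundError if the rocminfo executable is not on PATH.
    Uses stdout=PIPE and universal_newlines so it also runs on Python 3.6.
    """
    result = subprocess.run(
        [rocminfo],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return gfx_targets_from_text(result.stdout)


# Example:
#   print(detect_gfx_targets())
```

The reported name can then be compared against the gfx names in the rocBLAS library filenames.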

km1993 commented 4 years ago

I have a 5700xt. I tried every method mentioned here to get past this issue; nothing helped.

>>> import tensorflow as tf
>>> tf.add(1,2)
2020-10-09 00:05:00.599858: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Aborted (core dumped)
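
Because the failure aborts the whole interpreter instead of raising a Python exception, try/except cannot catch it. One way to test a setup without losing the current session is to run the op in a child process and look at how it ended. The sketch below assumes a Linux host; run_probe and TF_PROBE are names invented for this example.

```python
import signal
import subprocess
import sys


def run_probe(code, timeout=120):
    """Run `code` in a child interpreter and describe how it ended.

    Returns (status, stderr_text). On POSIX a negative return code means
    the child was killed by that signal, e.g. SIGABRT for an abort.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        status = "killed by " + signal.Signals(-proc.returncode).name
    else:
        status = "exit code {}".format(proc.returncode)
    return status, proc.stderr


# The same op that crashes in the session above, as a one-line probe.
TF_PROBE = "import tensorflow as tf; print(tf.add(1, 2))"

# Example:
#   status, err = run_probe(TF_PROBE)
#   print(status)
#   print(err[-500:])   # tail of stderr, where the guarantee(...) line appears
```

This only reports the crash; it does not work around it.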

xuhuisheng commented 4 years ago

There is a new branch for gfx10 in rocBLAS. It seems it will be released with ROCm 3.10, maybe in late November. https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/gfx10

da-phil commented 3 years ago

There is a new branch for gfx10 in rocBLAS. It seems it will be released with ROCm 3.10, maybe in late November. https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/gfx10

I'm curious whether the gfx10 branch also covers chipsets other than gfx1030, because it seems that only gfx1030 has been added, see: https://github.com/ROCmSoftwarePlatform/rocBLAS/commit/8cd7bf043c6d97dbd485b163393e2c52bf3dfd5d

And also in other rocm packages, e.g.: https://github.com/ROCmSoftwarePlatform/rccl/commit/9f20b00548469f751eab6efc04686c51d6ebd47d

xuhuisheng commented 3 years ago

@da-phil So I am afraid AMD will officially support RDNA2 and drop support for RDNA1, maybe in ROCm 4.0. I can only hope the patches for RDNA2 can be applied to RDNA1 without big modifications.

da-phil commented 3 years ago

@da-phil So I am afraid AMD will officially support RDNA2 and drop support for RDNA1, maybe in ROCm 4.0. I can only hope the patches for RDNA2 can be applied to RDNA1 without big modifications.

I wonder why the new RDNA2 is even categorized within gfx10, there must be some similarities in the way they work :thinking:

Off-topic question: does anybody know of a recent AMD Radeon GPU, other than gfx803, gfx900, gfx906 and gfx908, that has proved to work well with ROCm and therefore with TensorFlow and PyTorch? If so, I'd replace my new RX 5700XT with another AMD GPU right away. I like AMD's new open-source policy and don't want to go back to Nvidia...

iamsanjaymalakar commented 3 years ago

>>> import tensorflow as tf
>>> x = tf.variable(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'variable'
>>> x = tf.Variable(2)
2020-11-20 13:14:26.164093: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")

I am having the same problem. Ubuntu 20.04, RX 590, ROCm 3.9.

Has anyone found a solution?

xuhuisheng commented 3 years ago

@iamsanjaymalakar please see this issue https://github.com/RadeonOpenCompute/ROCm/issues/1269

iamsanjaymalakar commented 3 years ago

@iamsanjaymalakar please see this issue RadeonOpenCompute/ROCm#1269

I am not sure I understood the solution correctly. I cloned the rocSPARSE git repo (https://github.com/ROCmSoftwarePlatform/rocSPARSE) and checked the CMakeLists file. AMDGPU_TARGETS is set to gfx803 there. I built and installed rocSPARSE from git, but the problem still exists. I think I may be missing something.
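
To double-check which targets a checkout lists, here is a minimal sketch that scans a CMake file for lines mentioning AMDGPU_TARGETS and extracts the gfx names on them. It is a plain text heuristic, not a CMake parser, and the function name is made up. It also says nothing about which targets an existing build directory or an installed library was actually built for.

```python
import re
from pathlib import Path


def amdgpu_target_lines(cmake_text):
    """Return (line_number, [gfx names]) for each non-comment line
    that mentions AMDGPU_TARGETS."""
    hits = []
    for number, line in enumerate(cmake_text.splitlines(), start=1):
        stripped = line.lstrip()
        if "AMDGPU_TARGETS" in stripped and not stripped.startswith("#"):
            hits.append((number, re.findall(r"gfx[0-9a-f]+", stripped)))
    return hits


# Example (the path is an assumption about where the repo was cloned):
#   text = Path("rocSPARSE/CMakeLists.txt").read_text()
#   for number, targets in amdgpu_target_lines(text):
#       print(number, targets)
```

If the GPU's gfx name is missing from every reported line, the library was probably not configured for that card.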

xuhuisheng commented 3 years ago

@iamsanjaymalakar I wrote a doc for gfx803 issues. https://github.com/xuhuisheng/rocm-build/blob/develop/docs/gfx803.md

Doev commented 3 years ago

I am currently at the same point.

Ubuntu 18.04 RX 5500 XT

No idea how to use the workaround.

xuhuisheng commented 3 years ago

@Doev The RX 5500 XT is not officially supported. https://github.com/RadeonOpenCompute/ROCm/issues/1306

krishoza commented 3 years ago

@iamsanjaymalakar please see this issue RadeonOpenCompute/ROCm#1269

I am not sure I understood the solution correctly. I cloned the rocSPARSE git repo (https://github.com/ROCmSoftwarePlatform/rocSPARSE) and checked the CMakeLists file. AMDGPU_TARGETS is set to gfx803 there. I built and installed rocSPARSE from git, but the problem still exists. I think I may be missing something.

I am getting a similar error. I checked AMDGPU_TARGETS for the same library, rocSPARSE, and it correctly lists the GPU I have, which is gfx906.

jerryyin commented 3 years ago

Navi 10 (gfx10) chips are not officially supported by ROCm. There is nothing we can do without ROCm support.

RobertKillick commented 3 years ago

Navi 10 (gfx10) chips are not officially supported by ROCm. There is nothing we can do without ROCm support.

Is there any idea how long it will take for support to come?

jerryyin commented 3 years ago

@RobertKillick That would be a question for the ROCm guys. Once they have the infrastructure ready, it is trivial to add TF support for it.

peterdfields commented 3 years ago

Has anyone had any luck getting tensorflow-rocm running on a gfx1030 device?

UPDATE: I was able to get things running on a gfx1030 device by building TF from source. I couldn't get the available binaries to run.