reinka opened this issue 4 years ago
I tested rocm-3.7.0 on Ubuntu 20.04; my GPU is gfx803. tensorflow-rocm loaded /opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco and /opt/rocm/rocblas/lib/library/TensileLibrary_gfx803.co. The 5700xt corresponds to gfx1010, so maybe the matching libraries for it are missing.
Hmm, I'm afraid I don't understand enough to know how to use your information :/
Same problem, different GPU and not in docker, but ArchLinux.
Python 3.8.5 (default, Sep 5 2020, 10:50:12)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-08 15:28:57.302760: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
2020-09-08 15:28:57.345180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] ROCm AMD GPU ISA: gfx803
coreClock: 1.26GHz coreCount: 32 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: -1B/s
2020-09-08 15:28:57.417068: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
2020-09-08 15:28:57.418638: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libMIOpen.so
2020-09-08 15:28:57.425913: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
/home/oleid/.cache/rua/build/hip-rocclr/src/HIP-rocm-3.7.0/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
@xuhuisheng: How did you get the list of files tensorflow-rocm loaded? I tried strace-ing my Python script, to no avail.
It would seem I don't have /opt/rocm/rocblas/lib/library/; possibly that's the problem.
$ find /opt/rocm/rocblas/ -type f
/opt/rocm/rocblas/include/rocblas-auxiliary.h
/opt/rocm/rocblas/include/rocblas-complex-types.h
/opt/rocm/rocblas/include/rocblas-export.h
/opt/rocm/rocblas/include/rocblas-exported-proto.hpp
/opt/rocm/rocblas/include/rocblas-functions.h
/opt/rocm/rocblas/include/rocblas-types.h
/opt/rocm/rocblas/include/rocblas-version.h
/opt/rocm/rocblas/include/rocblas.h
/opt/rocm/rocblas/include/rocblas_bfloat16.h
/opt/rocm/rocblas/include/rocblas_module.f90
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config-version.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets-release.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets.cmake
/opt/rocm/rocblas/lib/librocblas.so.0.1
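As an alternative to strace for seeing which shared objects a process has actually loaded, one can read /proc/self/maps. A minimal sketch (assumes Linux; nothing ROCm-specific, the helper name is made up):

```python
# Sketch: list the shared objects mapped into the current process by parsing
# /proc/self/maps (Linux-only). Run after "import tensorflow" to see which
# ROCm libraries, if any, were pulled in.
def loaded_libraries():
    libs = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            fields = line.split()
            # field 6 (index 5), when present, is the mapped file's path
            if len(fields) >= 6 and ".so" in fields[5]:
                libs.add(fields[5])
    return sorted(libs)

if __name__ == "__main__":
    for lib in loaded_libraries():
        print(lib)
```

Grepping that output for "rocblas" or "rocrand" shows whether the ROCm libraries were even mapped before the abort.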
> GPU: 5700xt
> When using the following Docker image:
> [..]
@reinka:
I find it strange that your Python output doesn't list a device. Does rocminfo or clinfo list anything?
By the way, when I experimented with tensorflow in docker, I used something like:
sudo docker run -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --volume $PWD:/data rocm/tensorflow
I compiled HIP from the rocm-3.7.0 sources and added some debug logging. You can find hip_code_object.cpp in the HIP/rocclr/ directory. rocBLAS does not ship a Tensile image for gfx1010.
The code-object handling seems to be a new feature in rocm-3.7.0. I am investigating a gfx803 bug on rocm-3.7.0; rocblas seems to be the key, so I am reading the surrounding code.
dpkg -c rocblas_2.26.0.2565-9d981389_amd64.deb
drwxr-xr-x root/root 0 2020-08-18 09:08 ./opt/rocm-3.7.0/rocblas/lib/library/
-rw-r--r-- root/root 15337680 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx1010.hsaco
-rw-r--r-- root/root 14182000 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx1011.hsaco
-rw-r--r-- root/root 14905424 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx803.hsaco
-rw-r--r-- root/root 14989608 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx900.hsaco
-rw-r--r-- root/root 13846184 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx906.hsaco
-rw-r--r-- root/root 14116520 2020-08-18 08:53 ./opt/rocm-3.7.0/rocblas/lib/library/Kernels.so-000-gfx908.hsaco
-rw-r--r-- root/root 108018750 2020-08-18 09:00 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary.yaml
-rw-r--r-- root/root 3678448 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx803.co
-rw-r--r-- root/root 35668608 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx900.co
-rw-r--r-- root/root 97234680 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx906.co
-rw-r--r-- root/root 110233032 2020-08-18 08:54 ./opt/rocm-3.7.0/rocblas/lib/library/TensileLibrary_gfx908.co
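The guarantee that fires in hip_code_object.cpp amounts to the loader finding no bundled code object whose ISA matches the device. A hedged Python illustration of that logic (not the actual rocclr code; function and variable names are made up):

```python
# Illustration only: the HIP loader compares the device ISA against the ISAs
# of the bundled code objects and aborts with hipErrorNoBinaryForGpu when
# nothing matches. Exact-match semantics are assumed for simplicity.
def find_code_object(device_isa, bundled_isas):
    """Return the first ISA that matches the device, or None."""
    for isa in bundled_isas:
        if isa == device_isa:
            return isa
    return None

# rocBLAS 3.7 bundles gfx803/900/906/908 Tensile code objects (see the dpkg
# listing above), so a gfx1010 device finds nothing:
print(find_code_object("gfx1010", ["gfx803", "gfx900", "gfx906", "gfx908"]))  # None
```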
Okay, I now have those files as well; pull request https://github.com/rocm-arch/rocm-arch/pull/413 fixed that.
find /opt/rocm/rocblas/ -type f
/opt/rocm/rocblas/include/rocblas-auxiliary.h
/opt/rocm/rocblas/include/rocblas-complex-types.h
/opt/rocm/rocblas/include/rocblas-export.h
/opt/rocm/rocblas/include/rocblas-exported-proto.hpp
/opt/rocm/rocblas/include/rocblas-functions.h
/opt/rocm/rocblas/include/rocblas-types.h
/opt/rocm/rocblas/include/rocblas-version.h
/opt/rocm/rocblas/include/rocblas.h
/opt/rocm/rocblas/include/rocblas_bfloat16.h
/opt/rocm/rocblas/include/rocblas_module.f90
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config-version.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets-release.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets.cmake
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1010.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1011.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx900.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx906.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx908.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary.yaml
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx803.co
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx900.co
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx906.co
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx908.co
/opt/rocm/rocblas/lib/librocblas.so.0.1
Problem still persists, though.
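A quick way to check coverage like the listing above is to parse the Tensile filenames. A hypothetical helper (the directory layout is taken from the find output; the functions themselves are not part of ROCm):

```python
import glob
import os
import re

# Hypothetical helper: report which gfx targets a rocBLAS Tensile library
# directory covers, based purely on the TensileLibrary_<isa>.co filenames.
def tensile_isas(library_dir="/opt/rocm/rocblas/lib/library"):
    isas = set()
    for path in glob.glob(os.path.join(library_dir, "TensileLibrary_*.co")):
        m = re.match(r"TensileLibrary_(gfx\w+)\.co$", os.path.basename(path))
        if m:
            isas.add(m.group(1))
    return isas

def covers(device_isa, library_dir="/opt/rocm/rocblas/lib/library"):
    return device_isa in tensile_isas(library_dir)
```

With the files listed above, covers("gfx803") is true while covers("gfx1010") is not, which is consistent with the hipErrorNoBinaryForGpu abort on a 5700xt.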
> I compiled HIP from the rocm-3.7.0 sources and added some debug logging. You can find hip_code_object.cpp in the HIP/rocclr/ directory. rocBLAS does not ship a Tensile image for gfx1010.
> The code-object handling seems to be a new feature in rocm-3.7.0. I am investigating a gfx803 bug on rocm-3.7.0; rocblas seems to be the key, so I am reading the surrounding code.
Please note that in the aforementioned docker container tensorflow-rocm seems to find all it needs. So this must be something ArchLinux related in my case.
root@0f19f0974f40:/data# python3
Python 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-09 12:05:54.542100: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
2020-09-09 12:05:54.582874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X] ROCm AMD GPU ISA: gfx803
coreClock: 1.26GHz coreCount: 32 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 0B/s
2020-09-09 12:05:54.585567: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
2020-09-09 12:05:54.586959: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libMIOpen.so
2020-09-09 12:05:54.595182: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
2020-09-09 12:05:54.595500: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocrand.so
2020-09-09 12:05:54.595671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-09 12:05:54.605093: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3851195000 Hz
2020-09-09 12:05:54.605820: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f56782fce80 initialized for platform Host (this does not guarantee that XLA
2020-09-09 12:05:54.605855: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-09 12:05:54.608314: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f56781688f0 initialized for platform ROCM (this does not guarantee that XLA
2020-09-09 12:05:54.608348: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Ellesmere [Radeon RX 470/480/570/570X/580/580X], AMDGPU ISA ve
2020-09-09 12:05:54.916198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: Ellesmere [Radeon RX 470/480/570/570X/580/580X] ROCm AMD GPU ISA: gfx803
coreClock: 1.26GHz coreCount: 32 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 0B/s
2020-09-09 12:05:54.916264: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
2020-09-09 12:05:54.916280: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libMIOpen.so
2020-09-09 12:05:54.916294: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
2020-09-09 12:05:54.916308: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocrand.so
2020-09-09 12:05:54.916412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-09 12:05:54.916438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-09 12:05:54.916448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-09-09 12:05:54.916455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-09-09 12:05:54.916606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3796 MB
0000:08:00.0)
<tf.Tensor: shape=(), dtype=int32, numpy=3>
It would seem librocrand is to blame on Arch: it is missing support for my GPU. I hacked in debug info as well, along with a dump of the call stack:
2020-09-09 14:37:30.875746: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocfft.so
isCompatibleCodeObject: gfx803 == gfx900?
isCompatibleCodeObject: gfx803 == gfx906?
isCompatibleCodeObject: gfx803 == gfx908?
Call stack:
/opt/rocm/hip/lib/libamdhip64.so.3(+0x7eaf8)[0x7f8237487af8]
/opt/rocm/hip/lib/libamdhip64.so.3(+0x8032e)[0x7f823748932e]
/opt/rocm/hip/lib/libamdhip64.so.3(+0x805a4)[0x7f82374895a4]
/opt/rocm/hip/lib/libamdhip64.so.3(+0x80929)[0x7f8237489929]
/opt/rocm/rocrand/lib/librocrand.so(+0xdcbd)[0x7f82001a6cbd]
Will report back once I know more.
Yes, that did the trick. Works for me now, thanks :)
Hey @oleid, which trick are you referring to? I've submitted a PR to rocm-arch which adds gfx803 as a target architecture, see https://github.com/rocm-arch/rocm-arch/pull/414
@oleid Hm, I think you are onto something. I used both the official docker run command and your version, and inside the container I get the following rocminfo output:
root@5419cfc6178e:/root# rocminfo
sh: 1: lsmod: not found
ROCk module is NOT loaded, possibly no GPU devices
Able to open /dev/kfd read-write
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 3700X 8-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 3700X 8-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3600
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16403260(0xfa4b3c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16403260(0xfa4b3c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
N/A
*******
Agent 2
*******
Name: gfx1010
Uuid: GPU-XX
Marketing Name: Device 731f
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 4096(0x1000)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
Chip ID: 29471(0x731f)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2080
BDFID: 10240
Internal Node ID: 1
Compute Unit: 40
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 80(0x50)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1010
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
whereas on my host (Ubuntu 20.04) it seems to work properly:
$ rocminfo
ROCk module is loaded
Able to open /dev/kfd read-write
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 3700X 8-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 3700X 8-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3600
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16403260(0xfa4b3c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16403260(0xfa4b3c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
N/A
*******
Agent 2
*******
Name: gfx1010
Uuid: GPU-XX
Marketing Name: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 4096(0x1000)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
Chip ID: 29471(0x731f)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2080
BDFID: 10240
Internal Node ID: 1
Compute Unit: 40
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 80(0x50)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1010
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
However, on my host I still get the same issue when I try to run tensorflow operations:
apoehlmann@apoehlmann:~$ . .envs/mypy3/bin/activate
(mypy3) apoehlmann@apoehlmann:~$ python3
Python 3.8.2 (default, Jul 16 2020, 14:00:26)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-09-09 18:55:30.801592: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Aborted (core dumped)
TF version:
(mypy3) apoehlmann@apoehlmann:~$ pip freeze | grep tensor
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorflow-estimator==2.3.0
tensorflow-rocm==2.3.0
EDIT
I also ran the following on host & inside container, got the same output:
(mypy3) apoehlmann@apoehlmann:~$ find /opt/rocm/rocblas/ -type f
/opt/rocm/rocblas/lib/librocblas.so.0.1.30700
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx906.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx803.co
/opt/rocm/rocblas/lib/library/TensileLibrary.yaml
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx908.co
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1011.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx900.co
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx1010.hsaco
/opt/rocm/rocblas/lib/library/TensileLibrary_gfx906.co
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx803.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx900.hsaco
/opt/rocm/rocblas/lib/library/Kernels.so-000-gfx908.hsaco
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-targets-release.cmake
/opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config-version.cmake
/opt/rocm/rocblas/include/rocblas-functions.h
/opt/rocm/rocblas/include/rocblas-auxiliary.h
/opt/rocm/rocblas/include/rocblas-version.h
/opt/rocm/rocblas/include/rocblas-types.h
/opt/rocm/rocblas/include/rocblas.h
/opt/rocm/rocblas/include/rocblas_bfloat16.h
/opt/rocm/rocblas/include/rocblas-export.h
/opt/rocm/rocblas/include/rocblas-complex-types.h
/opt/rocm/rocblas/include/rocblas_module.f90
/opt/rocm/rocblas/include/rocblas-exported-proto.hpp
sudo apt install kmod can solve the lsmod warning in Docker.
And I cannot find how to generate the Tensile image for gfx1010 under rocBLAS. Maybe you could recompile rocBLAS with BUILD_TENSILE_HOST=false; that will skip the Tensile image.
Actually, ROCm does not officially support gfx1010 (Navi 10), so I cannot guarantee that gfx1010 will eventually run on ROCm. Please refer to these issues:
https://github.com/ROCmSoftwarePlatform/pytorch/issues/718 https://github.com/RadeonOpenCompute/ROCm/issues/887
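For reference, the rebuild suggested above might look roughly like this. This is an untested sketch: the repository URL is the one linked in this thread, but the flag spelling and install prefix are assumptions, not a verified recipe:

```shell
# Untested sketch: rebuild rocBLAS without the Tensile host library,
# as suggested above (BUILD_TENSILE_HOST comes from the comment).
git clone https://github.com/ROCmSoftwarePlatform/rocBLAS
cd rocBLAS
mkdir -p build && cd build
cmake -DBUILD_TENSILE_HOST=OFF -DCMAKE_INSTALL_PREFIX=/opt/rocm ..
make -j"$(nproc)"
sudo make install
```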
@xuhuisheng I solved the lsmod problem; however, the issue remained.
Thanks for the hints and links, I will look into them. Before I started trying to get TF running with the 5700xt, I found another GitHub issue that linked to this blog post
https://www.preining.info/blog/2020/05/switching-from-nvidia-to-amd-including-tensorflow/
and confirmed it would work. So it seems some people do get it running with the 5700xt. I already tried to reproduce the steps there, but I wasn't successful.
I also tried this approach https://github.com/RadeonOpenCompute/ROCm/issues/887#issuecomment-669717748 and wasn't able to reproduce it either.
@reinka I am afraid we have read this blog already; unfortunately, the author reported a segmentation fault later in the comments.
Same problem on Ubuntu 20.04 with gfx1012. Is it just missing from the list of supported GPUs?
> Same problem on Ubuntu 20.04 with gfx1012. Is it just missing from the list of supported GPUs?
It would seem that GPU is not fully supported yet. I'd expect more to come in the next versions (before CDNA is released).
I would appreciate a flag that lets me use what does work, even if not everything is supported or tested, instead of not being able to do anything at all on new GPUs.
@o8ruza8o which version of rocm do you use? Per rigtorp's research, you need rocm-3.7 to support gfx10xx.
gfx1012 is more complex; Tensile only supports gfx1010 and gfx1011, so you may have to copy the related Kernels.so-*.hsaco file too.
And I had two ideas for it: first, copy /opt/rocm/lib/TensileLibrary_gfx900.co to TensileLibrary_gfx1012.co; second, rebuild rocBLAS with BUILD_TENSILE_HOST=FALSE. Please refer to this issue: https://github.com/ROCmSoftwarePlatform/pytorch/issues/718#issuecomment-701174549
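The first idea above, sketched in Python (a hypothetical workaround, not an endorsed ROCm procedure): copying an existing code object only makes the file findable; gfx900 machine code is not valid gfx1012 code, so kernels may still fail or crash at launch.

```python
import os
import shutil

# Hypothetical workaround: alias an existing Tensile code object under the
# filename the loader expects for an unsupported ISA. This silences the
# "no binary for GPU" lookup but gives no correctness guarantee.
def alias_tensile_library(library_dir, src_isa, dst_isa):
    src = os.path.join(library_dir, "TensileLibrary_%s.co" % src_isa)
    dst = os.path.join(library_dir, "TensileLibrary_%s.co" % dst_isa)
    if not os.path.isfile(src):
        raise FileNotFoundError(src)
    shutil.copyfile(src, dst)
    return dst
```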
I am running rocm 3.8.0. My kernel is 5.7.19. My GPU is gfx1012.
I have a 5700xt. I tried every method mentioned to get past this issue; nothing helped.

```
>>> import tensorflow as tf
>>> tf.add(1,2)
2020-10-09 00:05:00.599858: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Aborted (core dumped)
```
There is a new branch for gfx10 on rocBLAS; it seems it will be released with ROCm-3.10, maybe in late November: https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/gfx10
> There is a new branch for gfx10 on rocBLAS; it seems it will be released with ROCm-3.10, maybe in late November: https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/gfx10
I'm curious whether the gfx10 branch also covers chipsets other than gfx1030, because it seems that only gfx1030 has been added, see:
https://github.com/ROCmSoftwarePlatform/rocBLAS/commit/8cd7bf043c6d97dbd485b163393e2c52bf3dfd5d
And also in other rocm packages, e.g.: https://github.com/ROCmSoftwarePlatform/rccl/commit/9f20b00548469f751eab6efc04686c51d6ebd47d
@da-phil So I am afraid AMD will officially support RDNA2 and drop support for RDNA1, maybe in ROCm-4.0. I only hope the RDNA2 patches can be applied to RDNA1 without big modifications.
> So I am afraid AMD will officially support RDNA2 and drop support for RDNA1, maybe in ROCm-4.0. I only hope the RDNA2 patches can be applied to RDNA1 without big modifications.
I wonder why the new RDNA2 is even categorized within gfx10; there must be some similarities in the way they work :thinking:
Off-topic question: do you or anybody else know any other recent AMD Radeon GPU besides gfx803, gfx900, gfx906 and gfx908 that has proved to work well with rocm and therefore tensorflow & pytorch? If so, I'd replace my new RX 5700XT with another AMD GPU right away. I like AMD's new open-source policy and don't want to go back to Nvidia...
```
>>> import tensorflow as tf
>>> x = tf.variable(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'variable'
>>> x = tf.Variable(2)
2020-11-20 13:14:26.164093: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libamdhip64.so
/src/external/hip-on-vdi/rocclr/hip_code_object.cpp:120: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
```
I am also having the same problem: Ubuntu 20.04, RX 590, rocm3.9.
Has anyone found a solution?
@iamsanjaymalakar please see this issue https://github.com/RadeonOpenCompute/ROCm/issues/1269
@iamsanjaymalakar please see this issue RadeonOpenCompute/ROCm#1269
I am not sure I understood the solution correctly. I cloned the rocSPARSE git repo (https://github.com/ROCmSoftwarePlatform/rocSPARSE) and checked the CMakeLists.txt; AMDGPU_TARGETS is set to gfx803 there. I built and installed rocSPARSE from git, but the problem still exists. I think I may be missing something.
@iamsanjaymalakar I wrote a doc for gfx803 issues. https://github.com/xuhuisheng/rocm-build/blob/develop/docs/gfx803.md
I am currently at the same point.
Ubuntu 18.04 RX 5500 XT
No idea, how to use the workaround.
@Doev The RX 5500 XT is not officially supported: https://github.com/RadeonOpenCompute/ROCm/issues/1306
> @iamsanjaymalakar please see this issue RadeonOpenCompute/ROCm#1269
> I am not sure I understood the solution correctly. I cloned the rocSPARSE git repo (https://github.com/ROCmSoftwarePlatform/rocSPARSE) and checked the CMakeLists.txt; AMDGPU_TARGETS is set to gfx803 there. I built and installed rocSPARSE from git, but the problem still exists. I think I may be missing something.
I am getting a similar error. I checked AMDGPU_TARGETS for the same library, i.e. rocSPARSE, and it correctly mentions the GPU I have, which is gfx906.
Navi 10 and other gfx10 chips are not officially supported by ROCm here. There is nothing we can do without ROCm support.
> Navi 10 and other gfx10 chips are not officially supported by ROCm here. There is nothing we can do without ROCm support.
Is there any idea how long it will take for support to come?
@RobertKillick That would be a question for the ROCm team. Once they have the infrastructure ready, it is trivial to add TF support for it.
Has anyone had any luck getting tensorflow-rocm running on a gfx1030 device?
UPDATE: I was able to get things running on a gfx1030 device by building TF from source; I couldn't get the available binaries to run.
GPU: 5700xt
When using the following Docker image:
with ROCm installed on the Docker host as explained here: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
I get the following error when executing TensorFlow ops:
and the Python console dies. I started the container with the alias mentioned in the corresponding Docker registry: https://hub.docker.com/r/rocm/tensorflow
I get the same error when I try to run tensorflow ops on the host.
Googling this issue yields only a handful of results so I feel like I might have some misconfiguration but I cannot figure out what it is.