Closed berinaniesh closed 2 months ago
Apparently, this is being fixed in https://github.com/ROCm/tensorflow-upstream/pull/2434/commits/632e25544c6881a8acf798827d3699281795fbf8, but a new release containing the fix has not been made yet. Another issue, https://github.com/ROCm/tensorflow-upstream/issues/2487, is already open to request a release, so I'm closing this one.
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
No
Source
binary
TensorFlow version
2.14
Custom code
Yes
OS platform and distribution
Docker latest rocm
Mobile device
No response
Python version
3.9.18
Bazel version
No response
GCC/compiler version
9.4.0
CUDA/cuDNN version
ROCM 6.0, runtime version 1.1
GPU model and memory
Radeon 6800M (gfx1031, converted to gfx1030 with HSA_OVERRIDE, has worked in previous versions)
Current behavior?
I'm using TensorFlow from the official ROCm TensorFlow docker image (latest, tf.__version__ = 2.14). I have a Radeon 6800M (Asus G513QY laptop). The GPU is gfx1031, which is unsupported, but I can set HSA_OVERRIDE_GFX_VERSION=10.3.0 to make it report itself as gfx1030; this has worked well in the past. The newer version of the docker image detects my GPU as gfx1030 but refuses to use it, claiming it is not in the list of supported GPUs. gfx1030 actually is in that list, but a separator is missing between gfx1030 and gfx1100, so the two entries fuse into gfx1030gfx1100 and gfx1030 is no longer recognized as a valid GPU. The output can be found below.
2024-04-08 08:21:28.484960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2266] Ignoring visible gpu device (device: 0, name: AMD Radeon RX 6800M, pci bus id: 0000:03:00.0) with AMDGPU version : gfx1030. The supported AMDGPU versions are gfx1030gfx1100, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.
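I couldn't find the literal gfx1030gfx1100 in the sources, so the following is only a guess at the mechanism: in C++ (and Python alike) a missing comma between adjacent string literals silently concatenates them at parse time, which would produce exactly the fused entry in the log above. A Python sketch of the suspected behaviour (the list contents are copied from the log line):

```python
# Suspected bug: a missing comma between "gfx1030" and "gfx1100" makes the
# parser concatenate the two adjacent string literals into one entry.
buggy_supported = ["gfx1030" "gfx1100", "gfx900", "gfx906", "gfx908",
                   "gfx90a", "gfx940", "gfx941", "gfx942"]

# The fused entry is what appears in the error message...
assert "gfx1030gfx1100" in buggy_supported
# ...and the membership check for the real architecture fails, so the
# device is ignored even though gfx1030 was meant to be supported.
assert "gfx1030" not in buggy_supported

# With the comma restored, gfx1030 is recognized again.
fixed_supported = ["gfx1030", "gfx1100", "gfx900", "gfx906", "gfx908",
                   "gfx90a", "gfx940", "gfx941", "gfx942"]
assert "gfx1030" in fixed_supported
```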
I searched for the string gfx1030gfx1100 in this repo as well as the ROCm docker repo, but couldn't find it. Can someone fix this?
Standalone code to reproduce the issue
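There is no real repro beyond running the docker image, but for reference, the HSA_OVERRIDE_GFX_VERSION workaround described above has to be in place before TensorFlow initializes the ROCm runtime. A minimal sketch (the value 10.3.0 is the one from my setup; the TensorFlow lines are commented out so the snippet stands alone):

```python
import os

# HSA_OVERRIDE_GFX_VERSION must be set before TensorFlow (and the ROCm
# runtime underneath it) is imported; setting it afterwards has no effect.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

# import tensorflow as tf
# print(tf.config.list_physical_devices("GPU"))  # should list the 6800M
```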
Relevant log output