Open Mushoz opened 1 year ago
Are there still people waiting for 7900 XTX support? Performance is still somewhat poor, but TensorFlow-upstream now runs when built against the latest ROCm release. While looking into the status of ROCm support for the 7900 XTX, I found several issues opened by different people and wanted to link them all to the issue I opened in the MIOpen repo. Although there has been no confirmation from the developers, I believe the performance problems stem from insufficient optimization in MIOpen. https://github.com/ROCmSoftwarePlatform/MIOpen/issues/2342
I am getting the following error with the latest release (tensorflow-rocm 2.13.0.570).
2023-12-17 19:48:20.262228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2015] Ignoring visible gpu device (device: 0, name: , pci bus id: 0000:2d:00.0) with AMDGPU version : gfx1100. The supported AMDGPU versions are gfx1030, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.
@vampireLibrarianMonk AFAIK, you need to build TF yourself if you want to use the 7900 XTX with it. Hopefully, with the release of ROCm 6.0, they will soon ship updated framework builds that support the 7900 XTX out of the box. This should give you some idea of how to build it locally if you are interested. To build it on ROCm 6, you need to change

```shell
sed -i 's/5.7.0/5.7.1/g' build_rocm_python3
```

to

```shell
sed -i 's/5.7.0/6.0.0/g' build_rocm_python3
```
Also, the build consumes a lot of memory, since it launches one compile process per logical processor, and each process can use more than 1 GB on average. When I build it on my smaller machine (Ryzen 3900X + 32 GB), I just disable SMT so that it only launches 12 concurrent compile processes.
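As an alternative to disabling SMT in firmware, the same effect can be sketched from the shell by computing half the logical-processor count and passing it as a job limit. The Bazel invocation below is an illustrative assumption — the `build_rocm_python3` script may need editing to forward such a flag:

```shell
#!/bin/sh
# Use half the logical processors, roughly equivalent to disabling SMT,
# so peak build memory stays near JOBS x ~1 GB.
JOBS=$(( $(nproc) / 2 ))
[ "$JOBS" -lt 1 ] && JOBS=1
echo "building with ${JOBS} concurrent compile processes"
# Hypothetical invocation; adapt to however your build script calls Bazel:
# bazel build --jobs="${JOBS}" //tensorflow/tools/pip_package:build_pip_package
```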
Issue Type
Bug
Tensorflow Version
Tensorflow-rocm v2.11.0-3797-gfe65ef3bbcf 2.11.0
ROCm Version
5.4.1
Custom Code
Yes
OS Platform and Distribution
Arch Linux: Kernel 6.1.1
Python version
3.10
GPU model and memory
7900 XTX 24GB
Current Behaviour?
I am not entirely sure whether this is an upstream (ROCm) issue or one in tensorflow-rocm specifically, so I am reporting it to both repos. A toy example refuses to run and dumps core; I would have expected it to train successfully.
Standalone code to reproduce the issue
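The original snippet was not preserved in this copy of the report. A minimal toy example of the kind described (a small Keras model trained on synthetic data) might look like the sketch below — the model, shapes, and hyperparameters are illustrative assumptions, not the reporter's actual code:

```python
import numpy as np
import tensorflow as tf  # the tensorflow-rocm build

# Tiny synthetic dataset: 256 samples, 32 features, 10 classes.
x = np.random.rand(256, 32).astype("float32")
y = np.random.randint(0, 10, size=(256,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# On the affected ROCm/gfx1100 setup, a fit() call like this is
# where the reporter describes the process dumping core.
model.fit(x, y, epochs=1, batch_size=32, verbose=0)
```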
Relevant log output