ROCm / tensorflow-upstream

TensorFlow ROCm port
https://tensorflow.org
Apache License 2.0

7900 XTX Fails to Run #1956

Open Mushoz opened 1 year ago

Mushoz commented 1 year ago

Issue Type

Bug

Tensorflow Version

Tensorflow-rocm v2.11.0-3797-gfe65ef3bbcf 2.11.0

rocm Version

5.4.1

Custom Code

Yes

OS Platform and Distribution

Archlinux: Kernel 6.1.1

Python version

3.10

GPU model and memory

7900 XTX 24GB

Current Behaviour?

I am not entirely sure whether this is an upstream ROCm issue or specific to tensorflow-rocm, so I am reporting it to both repos. A toy example refuses to run and dumps core; I would have expected it to train successfully.

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np

# Random regression data: 10,000 samples with 25 features each.
features = np.random.randn(10000, 25)
targets = np.random.randn(10000)

# Minimal model: a single Dense layer (effectively linear regression).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1)
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.MeanSquaredError())

model.fit(x=features, y=targets)

Relevant log output

[jaap@Jaap-Desktop code]$ pipenv run python testNN.py
2022-12-24 11:18:37.178811: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
python: /build/hsa-rocr/src/ROCR-Runtime-rocm-5.4.1/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char*, AssembleTarget, void*&, size_t&) const: Assertion `code_buf != NULL && "Code buffer allocation failed"' failed.
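Since the assertion fires inside the ROCm runtime (amd_gpu_agent.cpp) rather than in TensorFlow code, a first diagnostic step is to confirm the runtime can enumerate the card at all. A sketch, assuming a standard ROCm install where rocminfo is on the PATH (gfx1100 is the 7900 XTX target):

rocminfo ships with ROCm:

```shell
# Check whether the ROCm runtime sees a GPU agent before blaming TF.
if command -v rocminfo >/dev/null 2>&1; then
    rocminfo | grep -i gfx    # expect a gfx1100 agent for the 7900 XTX
else
    echo "rocminfo not found; is ROCm installed?"
fi
```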
briansp2020 commented 1 year ago

Are there still people waiting for 7900 XTX support? The performance is still somewhat poor, but tensorflow-upstream now runs when built against the latest ROCm release. While looking into the status of ROCm support for the 7900 XTX, I found several issues opened by different people and wanted to link them all to the issue I opened in the MIOpen repo. There has been no confirmation from the developers, but I believe the performance problems are due to insufficient optimization in MIOpen. https://github.com/ROCmSoftwarePlatform/MIOpen/issues/2342

vampireLibrarianMonk commented 9 months ago

I am getting the following error with the latest release (tensorflow-rocm 2.13.0.570).

2023-12-17 19:48:20.262228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2015] Ignoring visible gpu device (device: 0, name: , pci bus id: 0000:2d:00.0) with AMDGPU version : gfx1100. The supported AMDGPU versions are gfx1030, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.
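A workaround often reported by RDNA3 users for exactly this "unsupported AMDGPU version" check (not confirmed anywhere in this thread, and unsupported by AMD) is to override the GFX version the HSA runtime reports. A sketch:

```shell
# Assumed workaround: make the ROCm runtime treat the card as a
# supported gfx11 target. Must be exported before the Python process
# that imports TensorFlow starts.
export HSA_OVERRIDE_GFX_VERSION=11.0.0

# Then launch the script in the same shell, e.g.:
# pipenv run python testNN.py
```

Whether this actually works depends on the ROCm and tensorflow-rocm versions involved; official gfx1100 support is the real fix.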

briansp2020 commented 9 months ago

@vampireLibrarianMonk AFAIK, you need to build TF yourself if you want to use the 7900 XTX with it. Hopefully, with the release of ROCm 6.0, they will soon ship updated framework builds that support the 7900 XTX out of the box. This should give you some idea of how to build it locally if you are interested. To build it on ROCm 6, you need to change

sed -i 's/5.7.0/5.7.1/g' build_rocm_python3

to

sed -i 's/5.7.0/6.0.0/g' build_rocm_python3

Also, the build consumes a lot of memory: it launches one compile process per logical processor, and each process can use more than 1 GB on average. When building on my smaller machine (Ryzen 3900X + 32 GB), I just disable SMT so that only 12 concurrent compile processes are launched.
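An alternative to disabling SMT is to cap the build's parallelism directly. A sketch, assuming you invoke bazel yourself (build_rocm_python3 wraps a bazel build); --jobs is a standard Bazel flag, and the ~1 GB-per-process figure is the rough estimate quoted above, not a measured constant:

```shell
# Derive a job count from available RAM instead of disabling SMT,
# budgeting roughly 2 GiB per compile job for headroom.
mem_gib=32                # total RAM on the build machine
jobs=$(( mem_gib / 2 ))   # 32 GiB -> 16 concurrent compile jobs

# Print the bazel invocation rather than running it here; the target
# name is the usual TF pip-package target and may differ per branch.
echo bazel build --jobs="${jobs}" //tensorflow/tools/pip_package:build_pip_package
```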