ROCm / tensorflow-upstream

TensorFlow ROCm port
https://tensorflow.org
Apache License 2.0

XLA compilation not working on RNN example rocm 5.4.3 MI100 #2026

Open · Epliz opened this issue 1 year ago

Epliz commented 1 year ago

Issue Type

Bug

Have you reproduced the bug with TF nightly?

No

Source

binary

Tensorflow Version

tf-rocm 2.11

Custom Code

No

OS Platform and Distribution

Ubuntu 22.04.2 LTS

Mobile device

No response

Python version

3.10.6

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

rocm 5.4.3

GPU model and memory

MI100

Current Behaviour?

Getting the following error:

2023-03-25 10:08:12.600064: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x7fa7f8102fa0 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
2023-03-25 10:08:12.600101: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (0): AMD Instinct MI100, AMDGPU ISA version: gfx908:sramecc+:xnack-
2023-03-25 10:08:12.623830: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-03-25 10:08:12.724333: E tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:289] bitcode module is required by this HLO module but was not found at ./opencl.bc
2023-03-25 10:08:12.725250: I tensorflow/compiler/jit/xla_compilation_cache.cc:477] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
2023-03-25 10:08:12.725363: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: bitcode module not found at ./opencl.bc
2023-03-25 10:08:12.749220: E tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:289] bitcode module is required by this HLO module but was not found at ./opencl.bc
2023-03-25 10:08:12.749567: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: bitcode module not found at ./opencl.bc
2023-03-25 10:08:12.782255: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
Traceback (most recent call last):
  File "/home/me/git/ml/textgen_rnn/./rnn.py", line 148, in <module>
    history = model.fit(dataset, epochs=EPOCHS, batch_size=BATCH_SIZE)
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Detected at node 'StatefulPartitionedCall_5' defined at (most recent call last):
    File "/home/me/git/ml/textgen_rnn/./rnn.py", line 148, in <module>
      history = model.fit(dataset, epochs=EPOCHS, batch_size=BATCH_SIZE)
    File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "/home/me/git/ml/venv-gpu/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_5'
bitcode module not found at ./opencl.bc
     [[{{node StatefulPartitionedCall_5}}]] [Op:__inference_train_function_2592]
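The traceback ends in the Keras optimizer's XLA-compiled update step (`_update_step_xla`). One hypothetical way to confirm that only this XLA path is affected is to construct the optimizer with `jit_compile=False`, which the TF 2.11 optimizers accept; a minimal, self-contained sketch (the toy model and data are illustrative, not from the report):

```python
import tensorflow as tf

# Illustrative toy model, not the reporter's RNN: just enough to trigger the
# optimizer's update step. In TF 2.11 the default Keras optimizers JIT-compile
# that step with XLA unless jit_compile=False is passed.
model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])
model.compile(
    optimizer=tf.keras.optimizers.Adam(jit_compile=False),  # skip the XLA-compiled update
    loss="mse",
)
model.fit(tf.random.normal((16, 8)), tf.random.normal((16, 4)), epochs=1)
```

If training succeeds with this flag but fails without it, the failure is confined to XLA's ROCm bitcode lookup rather than to the model code itself.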

### Standalone code to reproduce the issue

```shell
code from https://www.tensorflow.org/text/tutorials/text_generation
```
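The tutorial builds a character-level GRU model and trains it with `model.fit`; below is a condensed sketch of that setup (layer sizes, names, and the random stand-in data are illustrative, not the exact tutorial code). With TF-ROCm 2.11 the default optimizer JIT-compiles its update step with XLA during `fit`, which is where the `opencl.bc` lookup above fails:

```python
import tensorflow as tf

# Condensed, illustrative stand-in for the linked text-generation tutorial:
# an embedding + GRU character model trained with model.fit.
VOCAB_SIZE, EMBED_DIM, RNN_UNITS = 66, 256, 1024
SEQ_LEN, BATCH_SIZE = 100, 64

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.GRU(RNN_UNITS, return_sequences=True),
    tf.keras.layers.Dense(VOCAB_SIZE),
])
model.compile(
    optimizer="adam",  # TF 2.11 default: jit_compile=True for the optimizer update step
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Random integer sequences standing in for the tokenized text corpus.
inputs = tf.random.uniform((4 * BATCH_SIZE, SEQ_LEN), maxval=VOCAB_SIZE, dtype=tf.int64)
targets = tf.random.uniform((4 * BATCH_SIZE, SEQ_LEN), maxval=VOCAB_SIZE, dtype=tf.int64)
dataset = tf.data.Dataset.from_tensor_slices((inputs, targets)).batch(BATCH_SIZE)

history = model.fit(dataset, epochs=1)
```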

Relevant log output

No response

Epliz commented 1 year ago

Same symptoms as in https://github.com/RadeonOpenCompute/ROCm/issues/1796

Epliz commented 1 year ago

Using the solution from there to set ROCM_PATH worked. Please set the environment variable automatically instead of making users figure it out by themselves.
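For reference, the workaround from the linked issue is to point ROCM_PATH at the ROCm installation before XLA compiles anything, e.g. `export ROCM_PATH=/opt/rocm-5.4.3` before launching Python (the exact path depends on the install and is an assumption here). A minimal sketch doing the same thing from inside the script:

```python
import os

# Workaround sketch: tell XLA where the ROCm bitcode libraries (opencl.bc etc.)
# live instead of letting it fall back to the current directory.
# "/opt/rocm-5.4.3" is an assumed default install location; adjust as needed.
os.environ.setdefault("ROCM_PATH", "/opt/rocm-5.4.3")

import tensorflow as tf  # imported after ROCM_PATH is set, before any XLA compilation
```

Exporting the variable in the shell achieves the same thing; the request above is for the package to take care of this automatically.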