ROCm / tensorflow-upstream

TensorFlow ROCm port
https://tensorflow.org
Apache License 2.0

Tensorflow 2.0 AMD support #362

Closed Cvikli closed 5 years ago

Cvikli commented 5 years ago

I am curious whether TensorFlow 2.0 works with the AMD Radeon VII.

Also, if it is available, is there any benchmark comparison with the 2080 Ti on a standard network, so we can see whether we should invest in Radeon VII clusters?

sunway513 commented 5 years ago

Hi @Cvikli , we are finalizing the 2.0-alpha docker image and it will be available soon, please stay tuned.

sunway513 commented 5 years ago

Hi @Cvikli , we've pushed out the preview build docker image for TF2.0-alpha0: rocm/tensorflow:tf2.0-alpha0-preview Please help review it and let us know your feedback :-) Here's the link to our dockerhub repo: https://cloud.docker.com/u/rocm/repository/docker/rocm/tensorflow/general

Cvikli commented 5 years ago

Great! Just ordered our first card for testing. :) If the delivery and tests go well, then I will be back with results by April 2.

Thank you for the fast work! I am really excited about it!

dagamayank commented 5 years ago

Please open a new issue if bugs are found with the 2.0 docker.

Cvikli commented 5 years ago

Sorry for reopening the thread, but I owe you guys a lot!

The Radeon VII's performance is crazy with TensorFlow 2.0a. In our tests, we reached nearly the same speed as our 2080 Ti (about 10-15% less)! But the Radeon VII has more memory, and memory was a bottleneck in our case. At this price, we think this video card is the best value for machine learning in our company!

We are glad we opened our eyes to AMD products. We are buying our first configuration, which is 40% cheaper and, as we measured, capable of performing better in our scenario than our well-optimised server configuration.

Thank you for all the work!

briansp2020 commented 5 years ago

@Cvikli

We are glad we opened our eyes to AMD products. We are buying our first configuration, which is 40% cheaper and, as we measured, capable of performing better in our scenario than our well-optimised server configuration.

Could you give a bit more detail? How much faster is Radeon VII for your application? What type of model are you running (CNN/RNN/GAN/etc.)? What processor are you running?

Just curious.

sunway513 commented 5 years ago

Thank you @Cvikli , great to hear that your experiment went well and you are going to invest more in ROCm and AMD GPUs!

Cvikli commented 5 years ago

The system is something like this:

The results with RNN networks on one Radeon VII and one 1080 Ti were nearly the same.

Now that we have switched over to 4 Radeon VII cards, we face two big scaling issues on convolutional networks.

  1. One of our computers has 4 AMD Radeon VII cards, but we can't run more than one calculation at a time on it without the error below, even when the two calculations use separate GPU cards. The second calculation, running on the other GPU, writes this:
    2019-05-12 15:28:04.632396: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 14.95G (16049923584 bytes) from device: hipError_t(1002)
    2019-05-12 15:28:04.632456: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 13.45G (14444931072 bytes) from device: hipError_t(1002)
    2019-05-12 15:28:04.632475: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 12.11G (13000437760 bytes) from device: hipError_t(1002)
    ... many lines like this
    2019-05-12 15:36:58.756188: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 310.35M (325421568 bytes) from device: hipError_t(1002)
    2019-05-12 15:36:58.756226: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 279.31M (292879616 bytes) from device: hipError_t(1002)
    2019-05-12 15:36:58.756252: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 251.38M (263591680 bytes) from device: hipError_t(1002)
    2019-05-12 15:36:58.756279: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 226.24M (237232640 bytes) from device: hipError_t(1002)
    2019-05-12 15:36:58.756304: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 203.62M (213509376 bytes) from device: hipError_t(1002)
    2019-05-12 15:36:58.756323: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 183.26M (192158464 bytes) from device: hipError_t(1002)
    2019-05-12 15:36:58.756343: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 164.93M (172942848 bytes) from device: hipError_t(1002)
    2019-05-12 15:37:01.337949: E tensorflow/stream_executor/rocm/rocm_driver.cc:493] failed to memset memory: HIP_ERROR_InvalidValue
    Segmentation fault (core dumped)

    We are pretty sure this should work, because it worked with the Nvidia 1080 Ti. However, even though it writes that it failed to allocate the memory, the program still starts and, I think, somehow runs normally.

Could this be caused by the docker image, so that we can't use separate GPUs for different runs?

  2. Comparing convolutional performance between the 4 AMD and 4 Nvidia cards, the difference is really huge, presumably because of cuDNN on the Nvidia cards. We get more than 10x the performance from the 1080 Ti compared with the Radeon VII. We find this speed difference in image recognition a little too big; I can't believe the hardware shouldn't be able to achieve about the same.

What do you guys think about this? Is it normal that we are 10x slower where cuDNN comes into play? (To me cuDNN sounds like pure software with better-optimised arithmetic operations, so is it possible to improve on this?)

sunway513 commented 5 years ago

Hi @Cvikli , let's step back a bit and look at your system configuration:

  • 4x SAPPHIRE Radeon VII
  • 2x G.SKILL FlareX 64GB
  • 1x Thermaltake Toughpower 1500W Gold

A typical gold workstation power supply runs at 87% efficiency at full load, so it can supposedly deliver up to about 1305W.
The TR 2950X TDP is 180W and the Radeon VII TDP is 300W, but its peak power consumption can go up to 321.8W (according to third-party measurement here). Considering the other components in your workstation, the current 1500W PSU is not sufficient for your system at full load. We'd recommend going for an 1800W PSU or dual 1000W PSUs to provide sufficient power for 4 Radeon VII GPUs.
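
The estimate above can be reproduced in a few lines of Python. This is only a sketch using the figures quoted in this comment (87% efficiency on a 1500W supply, 180W for the CPU, 300W TDP and 321.8W peak per GPU). The function names are made up for illustration, and `other_watts` is left at zero because the thread gives no figure for the remaining components.

```python
def usable_watts(psu_rating_watts, efficiency):
    """Usable power under the comment's assumption: rating times efficiency."""
    return psu_rating_watts * efficiency


def system_draw_watts(gpu_count, gpu_watts, cpu_watts, other_watts=0.0):
    """Sum of GPU, CPU and (optionally) other component power draw."""
    return gpu_count * gpu_watts + cpu_watts + other_watts


if __name__ == "__main__":
    budget = usable_watts(1500, 0.87)
    nominal = system_draw_watts(4, 300.0, 180.0)   # TDP figures
    peak = system_draw_watts(4, 321.8, 180.0)      # quoted GPU peak
    print(f"budget {budget:.1f} W, nominal {nominal:.1f} W, peak {peak:.1f} W")
    print(f"headroom at peak: {budget - peak:.1f} W")
```

A negative headroom means the GPUs and CPU alone already exceed the budget, before drives, fans and memory are counted.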

2019-05-12 15:28:04.632396: E tensorflow/stream_executor/rocm/rocm_driver.cc:629] failed to allocate 14.95G (16049923584 bytes) from device: hipError_t(1002)

The above error message indicates the target GPU device memory has already been allocated by other processes. There are a few ways to expose only selected GPUs to the user process:

  1. Use the HIP_VISIBLE_DEVICES environment variable to select the target GPUs for the process at the HIP level, e.g. use the following to select the first GPU:
    • export HIP_VISIBLE_DEVICES=0
  2. Use the ROCR_VISIBLE_DEVICES environment variable to select the target GPUs at the ROCr (ROCm user-bit driver) level, e.g. use the following to select the first GPU:
    • export ROCR_VISIBLE_DEVICES=0
  3. Pass the selected GPU driver interfaces (/dev/dri/render#) to the Docker container, e.g. use the following docker run command option to select the first GPU:
    • sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri/renderD128 --group-add video
    Note: you should see the following four interfaces on your 4x Radeon VII system:
    $ ls /dev/dri/render*
    /dev/dri/renderD128 /dev/dri/renderD129 /dev/dri/renderD130 /dev/dri/renderD131

We recommend approach #3, as that would isolate the GPUs at a relatively lower level of the ROCm stack.
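
The environment-variable approaches can also be driven from a small Python launcher, one child process per card. A minimal sketch, assuming a training script called train.py (a placeholder name) and using the two variable names mentioned above; `env_for_gpu` and `launch_on_gpu` are illustrative helpers, not part of ROCm or TensorFlow. The variable is set for a child process so that it is in place before the ROCm runtime initializes.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_index, variable="ROCR_VISIBLE_DEVICES", base_env=None):
    """Return a copy of the environment that exposes only one GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env[variable] = str(gpu_index)
    return env


def launch_on_gpu(script, gpu_index):
    """Start `script` in a child process that can see only `gpu_index`."""
    return subprocess.Popen([sys.executable, script], env=env_for_gpu(gpu_index))


# Usage (train.py is a placeholder): one independent run per card.
# procs = [launch_on_gpu("train.py", i) for i in range(4)]
# for proc in procs:
#     proc.wait()
```

Passing `variable="HIP_VISIBLE_DEVICES"` selects at the HIP level instead.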

For your concern on mGPU performance, could you provide the exact commands to reproduce your observations?

Just FYI, we have been actively running regression tests for single-node multi-GPU performance, and no mGPU performance regression has been reported for TF1.13 on the ROCm2.4 release. Once you have resolved the power supply concern, you should be able to see near-linear scalability on FP32 with 4 GPUs, taking tf_cnn_benchmarks resnet50 as an example, using the following command:
TF_ROCM_FUSION_ENABLE=1 python3 tf_cnn_benchmarks.py --data_format=NCHW --batch_size=128 --model=resnet50 --optimizer=sgd --num_batches=100 --variable_update=replicated --nodistortions --gpu_thread_mode=gpu_shared --num_gpus=4 --all_reduce_spec=pscpu --print_training_accuracy=True --display_every=10
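
"Near-linear scalability" can be checked by comparing the multi-GPU throughput against the single-GPU figure from the same benchmark. A small sketch; the function name is made up here, and the inputs would be the images/sec totals you measure yourself, not values from this thread.

```python
def scaling_efficiency(single_gpu_throughput, multi_gpu_throughput, num_gpus):
    """Fraction of ideal linear speedup achieved (1.0 means perfectly linear)."""
    if single_gpu_throughput <= 0 or num_gpus <= 0:
        raise ValueError("throughput and GPU count must be positive")
    return multi_gpu_throughput / (single_gpu_throughput * num_gpus)
```

A result close to 1.0 for `num_gpus=4` is what near-linear scaling would look like; a much lower value points at a bottleneck outside the GPUs.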

Cvikli commented 5 years ago

Thank you for the 3 different ways to manage visible devices. The second solution (with export ROCR_VISIBLE_DEVICES=0) WORKED like a charm for us! Interestingly, the third solution didn't restrict the available GPU devices in the docker container.

We ran some tests with TF2.0 on ROCm2.4, and performance benchmarking MobileNetV2 is still a lot lower than what an Nvidia 1080 Ti can provide, which still bothers us a little. To give some direction for TF2.0 on ROCm2.4, I thought I would share these logs, printed before the calculations start for MobileNetV2:

2019-05-13 18:48:40.653042: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocblas.so
2019-05-13 18:48:40.683726: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libMIOpen.so
2019-05-13 18:48:44.998231: I tensorflow/core/kernels/conv_grad_input_ops.cc:997] running auto-tune for Backward-Data
2019-05-13 18:48:45.094061: I tensorflow/core/kernels/conv_grad_filter_ops.cc:886] running auto-tune for Backward-Filter
... 2x14 lines like this with Backward-Data and Backward-Filter
2019-05-13 18:48:48.854030: I tensorflow/core/kernels/conv_grad_input_ops.cc:997] running auto-tune for Backward-Data
2019-05-13 18:48:48.945517: I tensorflow/core/kernels/conv_grad_filter_ops.cc:886] running auto-tune for Backward-Filter
2019-05-13 18:48:49.207930: I tensorflow/core/kernels/conv_grad_input_ops.cc:997] running auto-tune for Backward-Data
2019-05-13 18:48:49.295100: I tensorflow/core/kernels/conv_grad_filter_ops.cc:886] running auto-tune for Backward-Filter
2019-05-13 18:48:50.639570: I tensorflow/core/kernels/conv_grad_filter_ops.cc:886] running auto-tune for Backward-Filter

So I pretty much feel like we are running some operations 19 times, which leads to 10-15x speed loss, but it is only a guess. If I can help in any other way let me know.

PS: on TF2.0 ROCm2.4, I couldn't run tf_cnn_benchmarks.py because tensorflow.contrib is missing.

sunway513 commented 5 years ago

Hi @Cvikli , glad the ROCr env var worked for you! For approach #3, if you run ROCr-level utilities (e.g. /opt/rocm/bin/rocminfo) you should see the restricted access; however, since rocm_smi uses a different approach to query the GPU status, you can still see all the GPUs using rocm_smi even if you pass only a limited set of GPU device interfaces to the docker container. Adding @jlgreathouse @y2kenny for awareness.

2019-05-13 18:48:44.998231: I tensorflow/core/kernels/conv_grad_input_ops.cc:997] running auto-tune for Backward-Data
2019-05-13 18:48:45.094061: I tensorflow/core/kernels/conv_grad_filter_ops.cc:886] running auto-tune for Backward-Filter

The above logs indicate that the time spent there was actually MIOpen compiling kernels; please refer to my previous comment here for reference. That is a one-time effort: on later runs MIOpen will just pick up the cached kernels under ~/.cache/miopen instead of compiling them again. If you have been using docker containers for the dev work, you can consider committing the docker container with the MIOpen cache populated so you can reuse it later.
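
Because the first pass includes this one-time compilation, a comparison is fairer if warm-up steps are timed separately from steady-state steps. A rough sketch, assuming `step_fn` is whatever callable runs one training step in your own code (a placeholder, not a TensorFlow API) and that it blocks until the step has finished, for example by returning the loss value:

```python
import time


def time_steps(step_fn, warmup_steps=5, measured_steps=20):
    """Return (warmup_seconds, mean_seconds_per_measured_step)."""
    start = time.perf_counter()
    for _ in range(warmup_steps):
        step_fn()  # may include one-time kernel compilation or auto-tuning
    warmup_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(measured_steps):
        step_fn()  # steady state: kernels should already be cached
    mean_step_seconds = (time.perf_counter() - start) / measured_steps
    return warmup_seconds, mean_step_seconds
```

Comparing only the second number between the Radeon VII and the 1080 Ti keeps the compile time out of the speed ratio.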

sunway513 commented 5 years ago

Besides, if your application is built on TF1.x api, you might use the following TF1.13 release instead of using TF2.0 branch built with --config=v1: rocm/tensorflow:rocm2.4-tf1.13-python3

Cvikli commented 5 years ago

We ported our code from tf2.0 to tf1.13 and ran the MobileNetV2 implementation from tf.keras.applications on the configuration you suggested (TF1.13 on ROCm2.4 release), and we still see NO improvement in speed. The Nvidia 1080 Ti still performs 5-10x faster. I don't know if it is because cuDNN or CUDA is not available for Radeon cards, but this performance difference is pretty high.

sunway513 commented 5 years ago

Hi @Cvikli , could you provide the exact steps to repro your observation? FYI, Tensorflow-ROCm deploys the ROCm MIOpen library to accelerate the DL workloads, the repo is here: https://github.com/ROCmSoftwarePlatform/MIOpen

quantuminformation commented 5 years ago

Anyone tested with the latest Macbook pros?

quocdat32461997 commented 5 years ago

I run into the error "failed to allocate 14.95G (16049923584 bytes) from device: hipError_t(1002)" as above. System info: Intel® Xeon(R) CPU E5-2630 v2 @ 2.60GHz × 12, Radeon VII, 1500 W PSU, ROCm installed with Tensorflow-rocm 1.13.1 (through pip3).

I have not tried installing tensorflow-rocm through docker.

Any help?

sunway513 commented 5 years ago

Hi @quocdat32461997 , can you try setting the following environment variable:
export HIP_HIDDEN_FREE_MEM=500
If it still fails, please create a new issue and provide more complete logs.

quocdat32461997 commented 5 years ago

Problem solved by re-installing ROCm and Tensorflow-rocm. I probably did not install ROCm properly. Thanks a lot.

Cvikli commented 5 years ago

Hey there! I would like to know if there will be a new docker image with tensorflow==2.0.0b installed, because so far only the alpha version is available for tf2.0. By the way, we ran the https://github.com/lambdal/lambda-tensorflow-benchmark tests, and the difference between the Nvidia and Radeon cards is less than stated above. If you are interested, I can share the test results here.

sunway513 commented 5 years ago

Hi @Cvikli , we are preparing the TF2.0 beta release, it's currently under QA test coverage. We'll update here after the new docker image is available.

Cvikli commented 5 years ago

You guys, you are crazy! I love it! :) Thank you for this speed!

satvikpendem commented 5 years ago

Looks like the link at the beginning of the thread redirects to https://hub.docker.com, here's the link I'm using to track releases: https://hub.docker.com/r/rocm/tensorflow/tags

sunway513 commented 5 years ago

Hi @Cvikli , we have published the docker container for TF-ROCm 2.0 Beta1. Please kindly check it and let us know if you have any questions: rocm/tensorflow:rocm2.5-tf2.0-beta1-config-v2

ghost commented 5 years ago

Hi everyone, when I run the rocm/tensorflow:rocm2.5-tf2.0-beta1-config-v2 docker container, or any other container with tensorflow 2.0, trying to import tensorflow results in the following error:
>>> import tensorflow as tf
Illegal instruction (core dumped)

I am using an RX 480 with rocm 2.5, and rocm with tensorflow 1.13 works fine.

sunway513 commented 5 years ago

Hi @moonshine502 , I've tried a couple of samples using the rocm2.5-tf2.0-beta1-config-v2 docker image on my GFX803 node, those are working fine. Could you provide the steps to reproduce your issue?

ghost commented 5 years ago

Hi @sunway513, thank you for your response.

Hardware: Intel Celeron G3900 (Skylake), AMD Radeon RX 480 (gfx803) Software:

Issue: Executing python3 -c "import tensorflow as tf" inside the docker container results in:
python3 -c "import tensorflow as tf"
Illegal instruction (core dumped)

I am guessing that this error is caused by the cpu not being compatible with the new tensorflow version. Could this be the case?

dundir commented 5 years ago

@moonshine502 I'm running almost the exact same system setup and it's able to load and train for me.

The only difference appears to be the CPU, or possibly the card. I'm using a Ryzen 5 2400G; everything else looks near the same. I'm using a RX560 14cu, which registers in linux as an RX480 (gfx803), ROCM 2.5.27.

I ran through all the steps for training on the MNIST dataset at the link below to confirm tf2.0 was actually working. The evaluation accuracy wasn't the best (~87.7% vs. 98%), but it was able to compute.

https://www.tensorflow.org/beta/tutorials/quickstart/beginner

Edit: included more info.

ghost commented 5 years ago

Hi @dundir, @sunway513,

I am now pretty sure that the cause of the problem is my cpu which does not support avx instructions. It seems that previous versions of tensorflow with rocm were compiled without avx, because they work on my machine. So I may try to build tensorflow 2.0 without avx or get a new cpu.

Thank you for your help.
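
On Linux, whether the CPU advertises AVX can be checked from /proc/cpuinfo before pulling a new image. A minimal sketch; the helper names are illustrative, and the parsing assumes the usual `flags : ...` line format found on x86 Linux.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """True if the exact flag 'avx' is present (not just 'avx2' etc.)."""
    return "avx" in cpu_flags(cpuinfo_text)


# Usage on a Linux host:
# with open("/proc/cpuinfo") as handle:
#     print("AVX supported:", has_avx(handle.read()))
```

If this reports no AVX, an "Illegal instruction" crash on import is consistent with a binary built with AVX enabled, as guessed above.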

dundir commented 5 years ago

@sunway513 It looks like there may be a ROCm-related issue with the accuracy when training a basic MNIST model.

Running this code: here GPU passthru stdout: here

The docker container was set up with the same passthru options as for 1.13. With GPU passthru, the resulting accuracy dropped to 87% from the baseline of 97%, and training for 5 epochs took 44s compared with the baseline of 20s (no passthru).

No dev passthru stdout: here

dundir commented 5 years ago

@sunway513 Looks like the accuracy issue I previously mentioned regarding mnist was resolved with the latest tf2.0 docker image (rocm/tensorflow:rocm2.6-tf2.0-config-v2-dev).

Thanks, and much appreciated. You guys are doing an awesome job.

bionicles commented 5 years ago

Memory being the bottleneck, can we do bfloat16 and int8, float8, float16? Just curious

salmanulhaq commented 4 years ago

We ported our code from tf2.0 to tf1.13 and ran the MobileNetV2 implementation from tf.keras.applications on the configuration you suggested (TF1.13 on ROCm2.4 release), and we still see NO improvement in speed. The Nvidia 1080 Ti still performs 5-10x faster. I don't know if it is because cuDNN or CUDA is not available for Radeon cards, but this performance difference is pretty high.

cuDNN is not purely software play and is backed by actual silicon (dedicated tensor cores for MAD ops) which boosts half-precision performance. I'll need to check if Radeon VII has dedicated tensor cores as well. Also, nvidia won't automatically optimize code to make use of tensor cores, that has to be done w/ using cuDNN extensions

michaelklachko commented 4 years ago

@salmanulhaq 1080Ti has no tensor cores.

raxbits commented 4 years ago

We ported our code from tf2.0 to tf1.13 and ran the MobileNetV2 implementation from tf.keras.applications on the configuration you suggested (TF1.13 on ROCm2.4 release), and we still see NO improvement in speed. The Nvidia 1080 Ti still performs 5-10x faster. I don't know if it is because cuDNN or CUDA is not available for Radeon cards, but this performance difference is pretty high.

cuDNN is not purely software play and is backed by actual silicon (dedicated tensor cores for MAD ops) which boosts half-precision performance. I'll need to check if Radeon VII has dedicated tensor cores as well. Also, nvidia won't automatically optimize code to make use of tensor cores, that has to be done w/ using cuDNN extensions

do u have a reference for hardware being involved in CUDNN?

CUDNN afaik is a pure software play with optimization and whatnot; what u may be referring to is TENSOR cores, which were added on Volta and carried over to Turing silicon.

roschler commented 4 years ago

Anybody tried TF 2.0 with a Radeon RX 580, with 8GB RAM? Does it work? If it does, has anybody tried running multiple cards in parallel?

I have one of the first generation Nvidia Titan X cards (pre-pascal). I'm finally giving up on it. It can only run CUDA drivers from a long time ago, from the year the card first was produced. Anything newer, I've tried them all, and the card won't initialize (i.e. - O/S rejects it at the device level). Very sad about this since I paid a ton for it, but it's time to move on.

himanshugoel2797 commented 4 years ago

It ought to work but I'm not convinced that there's a point in running multiple 580s on a single training task. I don't think they'd be fast enough to gain a meaningful speedup (I didn't test rocm, but in a rendering task between a VII and a 580, it was faster to just use the VII than to have them both work together).

kuabhish commented 4 years ago

Anyone tested with the latest Macbook pros?

Can anyone reply to @QuantumInformation question please?

quantuminformation commented 4 years ago

I've now upgraded to the new MBP 16, but not used TFJS for a while, might get into py soon.

sunway513 commented 4 years ago

Hi @QuantumInformation @kuabhish , please refer to the following doc for ROCm support coverage over OSes: https://github.com/RadeonOpenCompute/ROCm#deploying-rocm There's another thread discussing the Mac support on main ROCm repo: https://github.com/RadeonOpenCompute/ROCm/issues/262

sumannelli commented 4 years ago

Hi Cvikli,

I have a Radeon VII but am not able to configure it with TensorFlow. Please guide me. I have been struggling to configure this for more than 15 days. Can I use my GPU without docker? Can I use TensorFlow 1.x with the GPU? I have installed ROCm, but the GPU is still not responding while training my model.

My system config: OS: Ubuntu 18.04
Thanks, Suman

sunway513 commented 4 years ago

Hi @sumannelli , did you follow the following instructions to install TF? https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-install-basic.md

And certainly, you can use your GPU without docker, that's just a matter of deployment approach -- using docker would likely help you save some time configuring the user bit environment with ROCm.

sumannelli-Ib commented 4 years ago

Hi Sunway513, thanks for the reply. I am able to use the AMD Radeon VII with TensorFlow 2.1, but while my model is training it is using only 3% of memory. OS: Ubuntu 18.04, kernel: 5.3, rocm: 3.1.3, tensorflow: 2.1. If I am using any incompatible version, please let me know. Once again, thanks for the quick reply. Thanks, Suman Nelli

Sifatul22 commented 4 years ago

Hi guys, my system has a Ryzen 5 3600 CPU and an AMD Radeon RX 5500 XT GPU. Is there any way I could enable TensorFlow on the GPU using ROCm or other platforms? Please help me out.

sunway513 commented 4 years ago

HI @Sifatul22 , your configuration should work. Please follow the document here to install ROCm and Tensorflow-rocm: https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-install-basic.md Let us know if you have questions, thanks.

briansp2020 commented 4 years ago

@sunway513 Is Navi now supported? Radeon RX 5500 XT is Navi, isn't it?

sunway513 commented 4 years ago

Hi @briansp2020 , Navi is not supported by ROCm yet, please refer to the following document for the list of GPUs supported by ROCm: https://github.com/RadeonOpenCompute/ROCm#supported-gpus

sumannelli commented 4 years ago

Hi sunway513, I followed the link you provided to install ROCm, but it installs with Python 2.7, and I want to install with Python 3.6. https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-install-basic.md Please advise. Thanks

sunway513 commented 4 years ago

Hi @sumannelli , in the same document, if you follow the steps to install the python3 dependencies, then depending on the default python3 version in your environment, you should be able to configure it correctly.

sumannelli commented 4 years ago

Hi @sunway513,

Thanks for the reply. Now I can run TensorFlow 2 on the AMD Radeon VII. But I am now using the object detection API, which supports tensorflow 1.15.0, and when I installed tensorflow-rocm==1.15.0 I got this error:

Traceback (most recent call last):
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 2453, in <module>
    from tensorflow.python.util import deprecation
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 25, in <module>
    from tensorflow.python.platform import tf_logging as logging
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/platform/tf_logging.py", line 38, in <module>
    from tensorflow.python.util.tf_export import tf_export
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/tf_export.py", line 48, in <module>
    from tensorflow.python.util import tf_decorator
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/tf_decorator.py", line 64, in <module>
    from tensorflow.python.util import tf_stack
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/tf_stack.py", line 29, in <module>
    from tensorflow.python import _tf_stack
ImportError: /home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/_tf_stack.so: undefined symbol: PySlice_AdjustIndices

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "a.ipynb", line 1, in <module>
    from tensorflow.keras.datasets import mnist
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/__init__.py", line 99, in <module>
    from tensorflow_core import *
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/__init__.py", line 28, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 2453, in <module>
    from tensorflow.python.util import deprecation
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 25, in <module>
    from tensorflow.python.platform import tf_logging as logging
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/platform/tf_logging.py", line 38, in <module>
    from tensorflow.python.util.tf_export import tf_export
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/tf_export.py", line 48, in <module>
    from tensorflow.python.util import tf_decorator
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/tf_decorator.py", line 64, in <module>
    from tensorflow.python.util import tf_stack
  File "/home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/util/tf_stack.py", line 29, in <module>
    from tensorflow.python import _tf_stack
ImportError: /home/ideabytes/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow_core/python/_tf_stack.so: undefined symbol: PySlice_AdjustIndices

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions. Include the entire stack trace above this error message when asking for help.

Thanks Suman Nelli
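
One thing worth ruling out with this particular message: `PySlice_AdjustIndices` was added to the CPython C API in Python 3.6.1, so an extension module built against a newer 3.6.x can fail to load on a 3.6.0 interpreter with this kind of undefined-symbol error. Whether that is the cause here is an assumption, since the exact interpreter version is not given in the post. A quick check (the helper name is illustrative):

```python
import sys


def at_least_python_3_6_1(version_info=None):
    """True if the interpreter is 3.6.1 or newer, the 3.6 release that added
    PySlice_AdjustIndices."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:3]) >= (3, 6, 1)


if __name__ == "__main__":
    print(sys.version.split()[0], "->", at_least_python_3_6_1())
```

If this prints False inside the conda environment, moving that environment to a later 3.6.x release is a reasonable first thing to try.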

sumannelli-Ib commented 4 years ago

Hi sunway513, ROCm 3.1 is not working with tensorflow-rocm==1.15.0. Please provide a link or reference for downloading ROCm 2.10. Note: the command below downloads ROCm 3.1, but I need 2.10:

sudo apt install rocm-dkms

My work has stopped because of this. Kindly reply.