Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

GPU tensorflow error: TypeError: bases must be types #824

Open haipnh opened 2 years ago

haipnh commented 2 years ago

Hi,

I'm trying to do a clean install of Vitis-AI on my machine.

Before that, I executed:

# Remove and reinstall the NVIDIA driver and container runtime
sudo apt-get purge --remove nvidia-driver-*  
sudo apt-get autoremove 
sudo apt-get update 
sudo apt-get install nvidia-driver-470  
sudo apt-get install -y nvidia-docker2 nvidia-container-toolkit 

# Remove docker
sudo apt-get purge docker-ce docker-ce-cli containerd.io docker-compose-plugin 
sudo rm -rf /var/lib/docker 
sudo rm -rf /var/lib/containerd 
sudo reboot now 
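(After the reboot, a quick host-level check that the driver reinstall took effect; this is my own sanity step, not part of the steps above:)

# the driver version reported here should match the freshly installed nvidia-driver-470
nvidia-smi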

Then I re-installed Docker and performed the post-install steps (https://docs.docker.com/engine/install/ubuntu/).
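For completeness, the post-install steps I mean are essentially the standard ones from that page, plus a sanity check that the NVIDIA container runtime is wired up (the CUDA image tag below is only an example, any CUDA base image should do):

# let the current user run docker without sudo
sudo groupadd docker          # group may already exist
sudo usermod -aG docker $USER
newgrp docker                 # or log out and back in

# check that containers can see the GPU
docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi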

I checked out commit ba890549 on the master branch, built the GPU image, and tried to test the vitis-ai-tensorflow2 environment, but it failed with this error:

Vitis-AI /workspace > conda activate vitis-ai-tensorflow2
(vitis-ai-tensorflow2) Vitis-AI /workspace > python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/tensorflow/python/__init__.py", line 40, in <module>
    from tensorflow.python.eager import context
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 32, in <module>
    from tensorflow.core.framework import function_pb2
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/tensorflow/core/framework/function_pb2.py", line 7, in <module>
    from google.protobuf import descriptor as _descriptor
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/google/protobuf/descriptor.py", line 40, in <module>
    from google.protobuf.internal import api_implementation
  File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/google/protobuf/internal/api_implementation.py", line 104, in <module>
    from google.protobuf.pyext import _message
TypeError: bases must be types

On the other hand, the GPU-enabled PyTorch environment works:

Vitis-AI /workspace > conda activate vitis-ai-optimizer_pytorch
(vitis-ai-optimizer_pytorch) Vitis-AI /workspace > python -c "import torch; print(torch.version.cuda); print(torch.cuda.is_available()); print(torch.cuda.device_count()); print(torch.cuda.get_device_name(0))"
11.0
True
1
NVIDIA GeForce GTX 1080 Ti
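Since the traceback dies inside google.protobuf, here is how I would check which protobuf version the tensorflow2 environment actually resolved to (just a generic diagnostic, I have not confirmed it explains anything yet):

(vitis-ai-tensorflow2) Vitis-AI /workspace > pip show protobuf
(vitis-ai-tensorflow2) Vitis-AI /workspace > conda list protobuf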
JaviMota commented 2 years ago

Hi

Same problem here. Downgrading the protobuf library solved it for me. However, the container ships TensorFlow 2.6 with cudatoolkit 11.5 and cuDNN 8.2, while according to the TensorFlow documentation (https://www.tensorflow.org/install/source#gpu), TensorFlow 2.6 is supposed to run with CUDA 11.2 and cuDNN 8.1. Downgrading to those versions raises a lot of conflicts. How can we solve this issue?
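A minimal sketch of the downgrade that worked for me (the exact pin is a guess, I did not note the version I installed; any protobuf 3.x release from before the 3.20/4.x rework of the C++ extension should behave the same):

(vitis-ai-tensorflow2) Vitis-AI /workspace > pip install "protobuf==3.19.*"
(vitis-ai-tensorflow2) Vitis-AI /workspace > python -c "import tensorflow as tf; print(tf.__version__)"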

P.S.: There are also incompatibility issues between some library versions and TensorFlow 2.6 that need to be fixed by installing the appropriate versions.

[screenshot: version-conflict messages from the TensorFlow 2.6 environment]
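(The conflicts shown above can be re-listed at any time inside the environment and then resolved by pinning each offending package to the version pip complains about; the exact pins depend on what the image shipped.)

(vitis-ai-tensorflow2) Vitis-AI /workspace > pip check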

Thanks, Javi