Xilinx / DPU-PYNQ

DPU on PYNQ

Compilation Error and other issues on running train_mnist_model.ipynb #82

Closed axt7568 closed 1 year ago

axt7568 commented 2 years ago

A quantized MNIST CNN .h5 model was successfully generated using the code provided in train_mnist_model.ipynb. I then used the Vitis AI compiler (vai_c_tensorflow2) to compile the model, but compilation stopped with the error shown in the image below.

[Screenshot: compilation_error]

I should add that I used the Vitis AI Docker container (I tried both latest and 1.4.916) to compile the saved quantized model (tf2_mnist_classifier_qunatized.h5). However, the .h5 model itself was generated in my own work environment rather than in the Vitis AI Docker container, because I could not import TensorFlow 2.x in the vitis-ai-tensorflow2 conda environment, as shown below.

[Screenshot: tensorflow_import_error]

Some sources online suggest that, because I am running Docker inside a virtual machine, importing TensorFlow 2.x is likely to fail when the guest does not expose Advanced Vector Extensions (AVX) instructions.

TensorFlow 1.x, on the other hand, works perfectly. However, that version does not include the TensorFlow Model Optimization package, without which the Vitis quantization tool cannot be imported. This error is shown in the image below.

[Screenshot: quantization_error]

I would be thankful if you could help me with two main questions. First, why does the Vitis AI compiler error out? Is it because I’m using the latest versions of Keras and TensorFlow? Second, is it not possible to use the TensorFlow 2 conda environment of the Docker container when Docker runs inside a virtual machine?

Ideally, we would like to quantize and compile the model in order to target it on our ZCU104 Board for Edge Inference. I can also email the quantized .h5 file if it is required for the issue to be reproduced.

Thank you very much for the help.

skalade commented 2 years ago

Hi there,

If you are OK with using TensorFlow 1, you could check out the notebook in the 1.3.2 tag of this repo. The host notebook used to use TensorFlow 1, and requires you to install Keras separately (see the readme).

On your questions: you might get a better answer on the Vitis AI issue tracker. It's possible that the issue is due to a version mismatch, so you could try making sure your local TensorFlow install matches the one in the Vitis AI Docker image.
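One way to check for such a mismatch is to compare installed package versions against the ones reported inside the container. The following is only a minimal sketch: the helper names `installed_version` and `version_mismatches` are hypothetical, the expected version is a placeholder that has to be filled in by hand, and it assumes the package is installed under the name `tensorflow`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs from `expected`
    to a (found, wanted) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (found, wanted)
    return mismatches


if __name__ == "__main__":
    # Placeholder: replace with the version printed inside the container.
    expected = {"tensorflow": "<version inside the container>"}
    for package, (found, wanted) in version_mismatches(expected).items():
        print(f"{package}: found {found}, container has {wanted}")
```

Running the same script in the local environment and in the container's conda environment should print nothing once the pinned versions agree.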

Thanks, Shawn

axt7568 commented 2 years ago

Hello Shawn,

Thanks for the help.

Looks like AVX and FMA were disabled since I was running Ubuntu using VirtualBox. The empty output resulting from the grep command is shown below.

[Screenshot 1]

I was able to resolve this by switching to VMware, which passes the CPU through directly to the guest OS instead of emulating its features. The updated output of the grep commands is shown below and confirms that both AVX and FMA are now enabled.

grep avx2 /proc/cpuinfo

[Screenshot 2]

grep fma /proc/cpuinfo

[Screenshot 3]
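The same check can be scripted. This is a rough sketch assuming an x86 Linux guest, where /proc/cpuinfo lists CPU features on lines whose key is `flags`; the function name `missing_cpu_flags` is hypothetical.

```python
def missing_cpu_flags(cpuinfo_text, required=("avx", "avx2", "fma")):
    """Return the required flags that no 'flags' line in cpuinfo_text lists."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return [flag for flag in required if flag not in flags]


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        missing = missing_cpu_flags(handle.read())
    print("missing:", ", ".join(missing) if missing else "none")
```

A non-empty result corresponds to the empty grep output seen under VirtualBox.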

Following your suggestion, I used tag 1.3.2 with Docker image version 1.3.411, which has TensorFlow 1.15.2 installed. I also installed Keras version 2.2.5 separately. I was able to train, test, and save the model, but I had trouble freezing the model (the TensorFlow graph). I have included screenshots below and would appreciate it if you could take a look at them.

Docker v1.3.411 Results

[Screenshots 4, 5, 7, and 6]

I also tried using Docker version 1.4.916 which is being used in the master branch. Here, I was using TensorFlow version 2.3.0 and was able to successfully train, test, save and even quantize the model. However, I had an error at the last step when compiling the quantized model for my board. I have also included screenshots of the same below.

Docker v1.4.916 Results

[Screenshots 8, 9, and 10]

Lastly, I was wondering what the Vitis-AI issue tracker is and how I can use it. Also, I would be thankful if you could provide me with any suggestions/advice on how I can resolve the issue along with the next steps to take.

Thanks, Arjun

skalade commented 2 years ago

Hi Arjun,

Could you clarify whether you are compiling the exact MNIST model from the example notebook or your own model? If you can successfully compile simpler models but not your specific one, there might be some layer incompatibility (you can look up supported layers and parameters in the Vitis AI user guide). If it happens with all models, then there might be some other issue with your setup.

The issue tracker I was referring to is the one on the Vitis AI GitHub repo, for more general Vitis AI API and Docker issues.
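A layer-compatibility check of that kind could be sketched as below. The function name `unsupported_layers` is hypothetical, and the `supported` set in the example is an illustrative placeholder, not the list from the Vitis AI user guide, which would need to be copied in by hand.

```python
def unsupported_layers(layer_types, supported):
    """Return layer type names missing from `supported`, in first-seen order."""
    seen = []
    for name in layer_types:
        if name not in supported and name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Illustrative placeholders only; take the real list from the user guide.
    supported = {"Conv2D", "MaxPooling2D", "Flatten", "Dense"}
    model_layers = ["Conv2D", "MaxPooling2D", "Lambda", "Dense"]
    print(unsupported_layers(model_layers, supported))
```

For a loaded Keras model, the input list can be built with `[type(layer).__name__ for layer in model.layers]`.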

Thanks, Shawn

skalade commented 1 year ago

Closing this issue due to lack of activity.