Xilinx / CHaiDNN

HLS-based Deep Neural Network Accelerator Library for Xilinx Ultrascale+ MPSoCs

Network output is NaN. Using the XportDNN tool for quantization produced NaN scores. #96

Open bennihoffmann opened 6 years ago

bennihoffmann commented 6 years ago

Hello,

I'm having trouble using the XportDNN tool to quantize a custom VGG-16 model. We retrained the original VGG-16 network with our own dataset. The prototxt file looks very similar to the ones in the supplied ModelZoo (https://github.com/Xilinx/CHaiDNN/blob/master/docs/MODELZOO.md).

Executing the model in Caffe on our host machine works well. On the board, all outputs are NaN. We tried different hardware builds on our ZCU104 board; GoogleNet and VGG-16 (both from the ModelZoo) work correctly with our setup as well as with the prebuilt binaries from the SD_Card/zcu104 folder.

We also tried using the quantization parameters supplied in the ModelZoo prototxt file by manually pasting them into our own prototxt, but got the same result. We have finally run out of ideas.

Has anyone had similar problems porting their own model to CHaiDNN?

bennihoffmann commented 6 years ago

Here is a link to my prototxt file: https://www.dropbox.com/s/53ni0bmrv95k8cz/vgg16fc2xilinxQuand8.prototxt?dl=0

anilmartha commented 6 years ago

Hi @bennihoffmann,

It looks like XportDNN executed successfully, since it produced a CHaiDNN-compatible prototxt. Could you please share the command you used when running XportDNN, so we can check whether all the required arguments were passed? Are there any particular preprocessing steps for your dataset?

Also, the quantization parameters recommended for the ModelZoo VGG-16 would not be optimal for your custom network, so we recommend using the parameters suggested by XportDNN.

bennihoffmann commented 6 years ago

Hi @anilmartha,

Thank you very much for your response. The command we used was:

```
python XportDNN.pyc --quant_type "Xilinx" \
    --deploy_model vgg16fc2xilinx.prototxt \
    --weights vgg16fc2xilinx.caffemodel \
    --quantized_deploy_model vgg16fc2xilinxQuand8.prototxt \
    --calibration_directory /home/img \
    --calibration_size 114 \
    --bitwidths 8,8,8 \
    --dims 3,224,224 \
    --transpose 2,0,1 \
    --channel_swap 2,1,0 \
    --raw_scale 255.0 \
    --mean_value 128,128,128 \
    --input_scale 1.0
```
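
For reference, this is how we believe those flags map onto Caffe's Python preprocessing; the following is only a sketch based on our own reading (the test image name is a placeholder), and it is also how we run the float model on the host:

```python
import numpy as np
import caffe

# Load the float model with the same files passed to XportDNN.
net = caffe.Net('vgg16fc2xilinx.prototxt', 'vgg16fc2xilinx.caffemodel', caffe.TEST)

# Mirror the XportDNN arguments in Caffe's Transformer.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))       # --transpose: HWC -> CHW
transformer.set_channel_swap('data', (2, 1, 0))    # --channel_swap: RGB -> BGR
transformer.set_raw_scale('data', 255.0)           # --raw_scale: [0,1] -> [0,255]
transformer.set_mean('data', np.array([128.0, 128.0, 128.0]))  # --mean_value
transformer.set_input_scale('data', 1.0)           # --input_scale

img = caffe.io.load_image('test.jpg')              # placeholder; loads RGB in [0,1]
net.blobs['data'].data[...] = transformer.preprocess('data', img)
out = net.forward()
```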

We tested two kinds of calibration images.

Like you said, the quantized_deploy_model looks proper, and it is pretty close to the one from the ModelZoo. But the calculated output is NaN for every picture we tested. If we use the precision parameters from the ModelZoo example, we do receive an output, but its accuracy is very poor compared to the original Caffe model we trained and ran in Python/Caffe.

At this point, we are running out of ideas for how to solve this issue. These are some of our considerations so far:

  1. One question we came up with was: how does CHaiDNN deal with the color channels? The model we trained expects RGB data, while CHaiDNN uses OpenCV's imread, which presumably produces BGR data. For this reason, we tested the quantized_deploy_model with test images where we permuted the color channels (BGR, BRG, GBR; see the first sketch after this list). But the accuracy went from bad to worse.
  2. In the source we stumbled over the ENABLE_ERROR_CHECKS and LAYERWISE_OUTPUT_WRITE macros. We think these enable layer-wise output comparison against reference outputs read in from *.txt files. If someone could show us how to generate these output files in Python/Caffe (our attempt is in the second sketch after this list), we would continue investigating this issue.
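
For point 1, this is the kind of channel-permutation test we ran (a sketch; the image name is a placeholder, and each permutation is labeled with the resulting channel order):

```python
import cv2

# cv2.imread returns channels in BGR order, which is presumably what
# CHaiDNN sees on the board; our model was trained on RGB input.
img_bgr = cv2.imread('test.jpg')

# Channel orders we fed to the quantized model:
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  # R, G, B
img_brg = img_bgr[:, :, [0, 2, 1]]                  # B, R, G
img_gbr = img_bgr[:, :, [1, 0, 2]]                  # G, B, R
```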
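
For point 2, this is roughly what we would try in Python/Caffe to dump per-layer outputs; it is a sketch under our assumption about the expected file format (one value per line), which may not match what CHaiDNN actually reads:

```python
import numpy as np
import caffe

net = caffe.Net('vgg16fc2xilinx.prototxt', 'vgg16fc2xilinx.caffemodel', caffe.TEST)

# Preprocess one test image exactly as in the Transformer sketch above.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_channel_swap('data', (2, 1, 0))
transformer.set_raw_scale('data', 255.0)
transformer.set_mean('data', np.array([128.0, 128.0, 128.0]))

net.blobs['data'].data[...] = transformer.preprocess('data', caffe.io.load_image('test.jpg'))
net.forward()

# Write every intermediate blob to its own .txt file so it can be diffed
# against what LAYERWISE_OUTPUT_WRITE produces on the board.
for name, blob in net.blobs.items():
    np.savetxt(name.replace('/', '_') + '.txt', blob.data.flatten(), fmt='%.6f')
```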

Many thanks, Ben