Hi @lishen565,
Try adding "quantization_param" for pool10 manually. It should be the same as the previous layer.
Hi, I've tried that, but camel.jpg was classified as No. 477 (the line number in out.txt), which is "carousel, carrousel, merry-go-round, roundabout, whirligig", with a probability of 0.272858. Then I tried other classes such as apple, but the probability stayed the same and the result was still No. 477. I checked the other probabilities and found that no matter which object I test, the result is always the same, so I think something is wrong with my quantization. My quantized prototxt created by 00_quantize_squeezenet.sh is attached (quantized.txt); could you please give me some advice on how to fix this? Thanks very much. I also found many negative numbers in my quantization_param, for example:
layer {
  name: "conv1"
  type: "ConvolutionRistretto"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
    }
  }
  quantization_param {
    bw_layer_in: 8
    bw_layer_out: 8
    bw_params: 8
    fl_layer_in: 0
    fl_layer_out: -3
    fl_params: 7
  }
}
Is this right?
Hi,
If the quantization works when you benchmark your model using caffe-ristretto, then it should work on the board.
You should check the output of some layers of your network. Below is an example of how you can get the output of convolutional layer j:
// raw 16-bit fixed-point outputs of layer j (split across two output pointers)
short *output0 = (short *)jobQueue[0][j].out_ptrs[0];
short *output1 = (short *)jobQueue[0][j].out_ptrs[1];
// number of fractional bits of this layer's output
int out_fbits0 = jobQueue[0][j].qf_format.op_fbits;
int out_fbits1 = jobQueue[0][j].qf_format.op_fbits;
for(int i = 0; i < length; i++){
    // convert four consecutive fixed-point values to float and print them
    float val0 = ((float)output0[4*i])/(1 << out_fbits0);
    float val1 = ((float)output0[4*i+1])/(1 << out_fbits0);
    float val2 = ((float)output0[4*i+2])/(1 << out_fbits0);
    float val3 = ((float)output0[4*i+3])/(1 << out_fbits0);
    cout << val0 << " " << val1 << " " << val2 << " " << val3 << endl;
}
for(int i = 0; i < length; i++){
    float val0 = ((float)output1[4*i])/(1 << out_fbits1);
    float val1 = ((float)output1[4*i+1])/(1 << out_fbits1);
    float val2 = ((float)output1[4*i+2])/(1 << out_fbits1);
    float val3 = ((float)output1[4*i+3])/(1 << out_fbits1);
    cout << val0 << " " << val1 << " " << val2 << " " << val3 << endl;
}
@lishen565 : Regarding the poor accuracy:
@NeutrinoXY: The statement "If the quantization works when you benchmark your model using caffe-ristretto, then it should work on the board." need not always be true, mainly because the quantization_param requirements for CHaiDNN are different from what Caffe-Ristretto adds. The CHai requirements are defined here: https://github.com/Xilinx/CHaiDNN/blob/master/QUANTIZATION_PARAMETERS_UG.md
CHai requires more quantization params than Caffe-Ristretto supplies by itself. This git issue started with the error that the pool layer has no quantization_param, and Caffe-Ristretto does not add a quantization_param block for pool layers, which CHai requires.
Hi, thank you for your answer. In fact, I've tried many ways to make a new model work on my zcu102. I also tried changing bw_layer_in from 8 to 16 and adding 8 to fl_layer_out to make it a positive number. Then I used 01_finetune_squeezenet.sh to retrain the model and put the intermediate caffemodels (e.g. iter100, iter200) on the board, but the result was always wrong even though I tried JPEGs from several different classes. Now I'm trying to take the example googlenet deploy.prototxt, modify it into a train_val.prototxt, and run 00_quantize_squeezenet.sh to produce a new quantized.prototxt. Then I'll compare the new quantized.prototxt with the working deploy.prototxt from the googlenet example. Hopefully I'll find a way to make my new model work.
Hi, I've modified the example googlenet deploy.prototxt into a train_val.prototxt. Running 00_quantize_squeezenet.sh with my train_val.prototxt gives the error below:
Aborted at 1528874721 (unix time) try "date -d @1528874721" if you are using GNU date
PC: @ 0x7f5d9d979774 Quantization::GetIntegerLengthParams()
SIGSEGV (@0x0) received by PID 23705 (TID 0x7f5d9ed40200) from PID 0; stack trace:
    @ 0x7f5d9c6caf20 (unknown)
    @ 0x7f5d9d979774 Quantization::GetIntegerLengthParams()
    @ 0x7f5d9d979926 Quantization::EditNetDescriptionDynamicFixedPoint()
    @ 0x7f5d9d97b221 Quantization::Quantize2DynamicFixedPoint()
    @ 0x7f5d9d97fc08 Quantization::QuantizeNet()
    @ 0x562d0b99075f quantize()
    @ 0x562d0b98fada main
    @ 0x7f5d9c6adb97 __libc_start_main
    @ 0x562d0b99033a _start
Segmentation fault (core dumped)
@lishen565 Regarding the error during the quantize stage: are you using the caffe-default train_val prototxt (derived from deploy.prototxt) or the provided CHai example prototxt when running?
I used the deploy.prototxt from the googlenet example, but modified its first and last layers to match caffe-master/models/bvlc_googlenet/train_val.prototxt. After that I deleted the quantization_param{} blocks from the deploy.prototxt. That gave me a train_val.prototxt, and I updated the path in 00_quantize_squeezenet.sh.
@smenon009 CHaiDNN doesn't need the quantization parameters for the pool-layers, because it uses the ones from the previous layer.
If the requirements for quantization are met, there is no reason you should get an output like that. There is definitely an issue with one or more layers. The best way to debug is to check the output of each layer.
@lishen565 Please upload your "train_val.prototxt".
@NeutrinoXY As far as I can tell, CHai does require quantization_param for average pooling but not for max pooling. And the pool10 mentioned by @lishen565 seems to be a global average pool. Sure, debugging the layer outputs seems like a reasonable approach.
This is good to know. I didn't use average pooling yet.
The squeezenet train_val.prototxt is below, and it does have a global average pooling layer called "pool10". GitHub doesn't accept the .prototxt extension, so I renamed it to .txt. train_val.txt
One more question: the CHaiDNN documentation says the variable inBytes in example.cpp represents the input data width, and that only 1 and 2 are valid values for it (https://github.com/Xilinx/CHaiDNN/blob/master/RUN_NEW_NETWORK.md). But I don't know how to determine the width of my input jpg. Could you give me some advice? Thanks.
The input width doesn't come from your image, but from how you load your pixel values. If you decide to use 16 bits for your input width, then for each pixel you put the value of each channel (BGR format) into a variable of type short. If you use 8 bits for your input width, you use the unsigned char type. So just read your "loadImagetoBuffPtr" function and you'll have your answer.
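As a rough illustration only (a minimal sketch, not the actual loadImagetoBuffPtr; the OpenCV loading, the omitted mean subtraction, and the left-shift by the input's fractional bits are assumptions to check against the real code):
#include <opencv2/opencv.hpp>

// Sketch: fill a raw input buffer from an interleaved BGR image (assumes a continuous cv::Mat).
// inBytes == 1 -> 8-bit input width (unsigned char per channel value)
// inBytes == 2 -> 16-bit input width (short per channel value, treated as fixed point)
void fillInputBuffer(const cv::Mat &bgr, int inBytes, int in_fbits, void *buf)
{
    const int n = bgr.rows * bgr.cols * bgr.channels();
    const unsigned char *src = bgr.data;               // OpenCV already stores BGR interleaved
    if (inBytes == 1) {
        unsigned char *dst = (unsigned char *)buf;
        for (int i = 0; i < n; ++i)
            dst[i] = src[i];                           // raw 8-bit pixel values
    } else {                                           // inBytes == 2
        short *dst = (short *)buf;
        for (int i = 0; i < n; ++i)
            dst[i] = (short)(src[i] << in_fbits);      // fixed point with in_fbits fractional bits
    }
}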
Hi @NeutrinoXY, I'm trying to check the outputs of the SqueezeNet layers, and I have a few questions about the example code you gave. First, I can't determine the length of the for loop. I've tried another approach, like this:
SHORT_TYPE *out0 = (SHORT_TYPE *)hwQueue[ImgId][whichConv].out_ptrs[0];
SHORT_TYPE *out1 = (SHORT_TYPE *)hwQueue[ImgId][whichConv].out_ptrs[1];
int length = ((int *)hwQueue[ImgId][whichConv].params)[0];
string out_dir = "/mnt/models/debug/conv_" + to_string(whichConv);
FILE *out = fopen(out_dir.c_str(), "w");
// dump the raw fixed-point values from both output pointers
for(int i = 0; i < length; ++i)
{
    fprintf(out, "%d %d\n", out0[i], out1[i]);
}
fclose(out);
I put this code in the while(1){} loop of xiExec, but it stops partway through with a memory access error, which makes me think the length is not right (I chose the length according to software/scheduler/xi_scheduler.cpp, line 1001). Could you tell me how to get the right length? Second, I don't understand the purpose of out_fbits0 and out_fbits1, or why output0 and output1 are broken up into 4 parts separately. Third, I think that after calling ConvolutionForward we do not get the real output from hwQueue[ImgId][whichConv].out_ptrs immediately. Is that right? Should I read the layer output after calling sds_wait() or sds_try_wait()? Thank you very much.
Hi,
You can choose the length arbitrarily; you don't need to print all the values anyway. Your length is not right because you have more than one output channel.
You have to use out_fbits0 and out_fbits1 because you read the data as short, an integer type that uses 16 bits. You then have to convert each value to float and divide by 2^out_fbits, i.e. shift right by the number of fractional bits. I broke the output up into groups of 4 just for nicer printing.
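As a tiny standalone illustration of that conversion (nothing CHaiDNN-specific is assumed here):
#include <cstdio>

// Convert a 16-bit fixed-point value with 'fbits' fractional bits to float.
float fixedToFloat(short v, int fbits)
{
    return ((float)v) / (1 << fbits);
}

int main()
{
    // 384 with 7 fractional bits represents 384 / 128 = 3.0
    printf("%f\n", fixedToFloat(384, 7));
    return 0;
}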
You can put this code in the xiExec loop (while(1)), or in your main function after the xiExec call. If you put it in the xiExec loop, then yes you're right, you have to print the output after the sds_wait (when convInUse becomes false).
Also, you got a memory access error because you didn't declare out0 as a pointer.
Best regards.
Hi, have you solved your problem? I have a similar problem to yours. I modified the "data" layer of the deploy.prototxt of the VGG16 network in CHaiDNN and changed the number of outputs of the "FcRistretto" layer to 4, then used the modified file as my quantized.prototxt (attached below). I used 01_finetune_squeezenet.sh to start the fine-tuning, but in the end the probability that the image belongs to any class is 0.25, and when I try other images I get the same result. quantized.txt
Sorry, I've not solved the problem yet @daichong123. But I've tried to retrain the googlenet from the CHaiDNN example; the resulting quantized.prototxt is quite different from the one provided with the example. It is below:
@NeutrinoXY I've fixed the bug in my application code with your help, but the classification result is still wrong; for example, it classifies the camel as a Weimaraner with a probability of 0.99+. Do you have any advice or experience that could help me get a working quantized.prototxt? Thank you very much.
Could you share the output of your "conv1/7x7_s2" layer? Both the one from your zcu102 and the one from caffe on your computer. That way you can start to isolate where the "bug" is happening.
There is no indication (yet) of a problem with your quantized.prototxt.
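If it helps, here is a rough sketch of how the host-side reference could be dumped with the Caffe C++ API (the file names are placeholders, and the image preprocessing is left out; it must match exactly what the board code does):
#include <caffe/caffe.hpp>
#include <cstdio>

int main()
{
    // Placeholder file names: substitute your own prototxt / caffemodel.
    caffe::Net<float> net("deploy.prototxt", caffe::TEST);
    net.CopyTrainedLayersFrom("bvlc_googlenet.caffemodel");

    // TODO: fill net.input_blobs()[0]->mutable_cpu_data() with the preprocessed camel.jpg,
    // using the same resize / mean subtraction / BGR order as on the zcu102.

    net.Forward();

    // Dump the reference activations of conv1/7x7_s2 to a text file.
    const boost::shared_ptr<caffe::Blob<float> > blob = net.blob_by_name("conv1/7x7_s2");
    FILE *f = fopen("conv1_7x7_s2_ref.txt", "w");
    const float *data = blob->cpu_data();
    for (int i = 0; i < blob->count(); ++i)
        fprintf(f, "%f\n", data[i]);
    fclose(f);
    return 0;
}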
Hi, my generated deploy.prototxt for VGG16 is attached (deploy.prototxt.txt), and my generated deploy.prototxt for GoogleNet is attached as well (deploy.prototxt.txt). The result is below. So I have not gotten the output of the "conv1/7x7_s2" layer yet. I think there is still a lot of work to get a working deploy.prototxt even once we have a quantized.prototxt with quantization_param. But I do have the outputs of the SqueezeNet layers, which I can share. Do you need any SqueezeNet outputs to help isolate where the "bug" is happening?
Hi,
Sure, you can share the output of the first convolutional layer of your squeezenet.
In your googlenet deploy.prototxt, just modify your quantization parameters by subtracting 16 bits for the I/O and 24 bits for the parameters, then fine-tune your model with it.
Hi, the output of the first convolutional layer of my squeezenet is attached (conv1_out.txt), and the input is camel.jpg. Thank you for your help.
Hi,
Can you share your squeezenet.prototxt and caffemodel as well?
Hello,
I don't get the correct output when I try to classify your camel.jpg with your network model using caffe. I probably made a mistake. Did you succeed in classifying it?
I still can't classify it correctly using my quantized.prototxt either. I think something is going wrong between the pool10 output and the softmax. pool10out.txt shows that class 189 has the highest score and class 355 (which is camel) the second highest, but after the softmax class 179 comes out highest. I think the quantization of the pool10 output is not appropriate. Do you have any method or experience for getting a usable quantized.prototxt? @VishalX. Thanks.
Hi @lishen565, check out CHai v2 and the XportDNN tool: https://github.com/Xilinx/CHaiDNN/blob/master/docs/QUANTIZATION.md Setup: https://github.com/Xilinx/CHaiDNN/blob/master/tools/SETUP_TOOLS.md
Let us know how that goes.
@NeutrinoXY Can you explain how you get all of the output from the convolution layer? For some reason, I want to print all the values. Thanks.
Inactive. Closing.
@lishen565 In the output file you provided, how do you determine which output data belongs to which image, since CHaiDNN takes two images as inputs? Looking at the total length of the output file, it seems the outputs for both images are there, but how do you tell which output index belongs to which image? @NeutrinoXY @VishalX
Hi, I've tried to run SqueezeNet following the "Ristretto Quantization User Guide" and got my quantized.prototxt. When I ran my squeezenet.elf on the board, something went wrong while parsing pool10, whose global_pooling is true and whose type is AVE. The detailed result is below. I checked my quantized.prototxt and found that there is no quantization_param in the pool10 layer. Can you help me solve this problem?