dicecco1 / fpga_caffe


The images predicted by 8pe and 16pe are wrong #9

Closed YaoYuxin closed 6 years ago

YaoYuxin commented 6 years ago

I want to use the kernel crp_layer_hwcn_cpfp_16pegrp to predict the output. I have already changed the deploy.txt, but the predictions are the same for all 256 images. What is wrong? The output:

ILSVRC2012_val_00000001,847,76,103,50,51,65
ILSVRC2012_val_00000002,847,76,103,50,51,970
ILSVRC2012_val_00000003,847,76,103,50,51,230
ILSVRC2012_val_00000004,847,76,103,50,51,809
ILSVRC2012_val_00000005,847,76,103,50,51,516
ILSVRC2012_val_00000006,847,76,103,50,51,57
ILSVRC2012_val_00000007,847,76,103,50,51,334
ILSVRC2012_val_00000008,847,76,103,50,51,415
ILSVRC2012_val_00000009,847,76,103,50,51,674
ILSVRC2012_val_00000010,847,76,103,50,51,332
ILSVRC2012_val_00000011,847,76,103,50,51,109
ILSVRC2012_val_00000012,847,76,103,50,51,286
ILSVRC2012_val_00000013,847,76,103,50,51,370
ILSVRC2012_val_00000014,847,76,103,50,51,757
ILSVRC2012_val_00000015,847,76,103,50,51,595
ILSVRC2012_val_00000016,847,76,103,50,51,147
ILSVRC2012_val_00000017,847,76,103,50,51,108
ILSVRC2012_val_00000018,847,76,103,50,51,23
ILSVRC2012_val_00000019,847,76,103,50,51,478
ILSVRC2012_val_00000020,847,76,103,50,51,517
ILSVRC2012_val_00000021,847,76,103,50,51,334
ILSVRC2012_val_00000022,847,76,103,50,51,173
ILSVRC2012_val_00000023,847,76,103,50,51,948

The last column is the label data.

dicecco1 commented 6 years ago

Which model are you using and what batch size?

dicecco1 commented 6 years ago

You may need to change the padding in the first pad layer, e.g. go from 4->8 for the 8 PE group and from 4->16 for the 16 PE group.

YaoYuxin commented 6 years ago

I am using AlexNet and the batch size is 256.

dicecco1 commented 6 years ago

Yeah, try setting the pad layer to pad to 8 for the 8 PE group and to 16 for the 16 PE group. This pads the depth of the first convolution to be a multiple of the # of PE groups.
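
As a rough illustration of that rounding (not code from the repository): the sketch below rounds a channel count up to the next multiple of the group size. The helper name `padded_depth` is hypothetical, and the example values assume the 3-channel RGB input of AlexNet's first convolution.

```cpp
#include <cassert>

// Hypothetical helper for illustration only; not part of fpga_caffe.
// Rounds `channels` up to the nearest multiple of `pes_per_group`.
int padded_depth(int channels, int pes_per_group) {
  assert(channels > 0 && pes_per_group > 0);
  // Integer ceiling division, then scale back up to a multiple.
  return ((channels + pes_per_group - 1) / pes_per_group) * pes_per_group;
}
```

Under these assumptions, `padded_depth(3, 4)` is 4, `padded_depth(3, 8)` is 8, and `padded_depth(3, 16)` is 16, which matches the 4, 8, and 16 pad settings mentioned in this thread.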

dicecco1 commented 6 years ago

You also need to set the variable num_pe to 8 or 16 in the prototxt.

YaoYuxin commented 6 years ago

Thank you for your help! Yes, I have already set num_pe. I just tried pad=8 and pad=16, and the accuracy is correct now. But I don't see any reduction in inference time when I use 16PE. If I want to reduce the inference time, what other parameters could I try?

dicecco1 commented 6 years ago

It should improve it a little bit, but you're better off changing #define OCFACT 1 to #define OCFACT 16 in crp_layer_hwcn_cpfp.cpp (or to 8 for 8pe, or 4 for 16pe). The number of PEs is OCFACT * (# of PEs/group), so 4pe, which is the default, with OCFACT set to 16 gives 64 PEs.
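
The arithmetic can be checked with a small sketch. The function `total_pes` is a hypothetical name used only for illustration; it simply evaluates the OCFACT * (# of PEs/group) formula from the comment above and is not taken from the kernel source.

```cpp
#include <cassert>

// Hypothetical helper for illustration only; not part of fpga_caffe.
// Total PEs = OCFACT * (PEs per group), as stated in the thread.
int total_pes(int ocfact, int pes_per_group) {
  assert(ocfact > 0 && pes_per_group > 0);
  return ocfact * pes_per_group;
}
```

With the suggested settings, `total_pes(16, 4)`, `total_pes(8, 8)`, and `total_pes(4, 16)` all evaluate to 64, so each of the three kernels ends up with the same total PE count.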

YaoYuxin commented 6 years ago

Thank you so much! I will try it today. One last question: I am using a Xilinx KCU1500 with SDAccel 2017.2. You mentioned "Later versions of SDAccel should work too, though low precision multipliers don't seem to map well to DSPs in 2017.1. To overcome this use the 3 input multiplier implementation of the crp layer." Could you kindly tell me how to use the 3 input multiplier?

dicecco1 commented 6 years ago

It might even be fixed in that version. I would try the regular kernel first, because I haven't tested the other one much. The kernel that uses the other multiplier is in the directory; just make sure OCFACT is set to a multiple of 2.