Xilinx / BNN-PYNQ

Quantized Neural Networks (QNNs) on PYNQ
https://xilinx.github.io/finn/
BSD 3-Clause "New" or "Revised" License

Question regarding modification of number of output classes #33

Closed vonhachtaugust closed 5 years ago

vonhachtaugust commented 6 years ago

Hi,

I am trying out the HLS source code together with the SW host code to run a binary neural network on a Zynq via SDSoC. So far everything has worked out well with MNIST and CIFAR-10 (running both in SW and HW without a Python overlay). Now, however, I would like to try a dataset with more than 10 classes.

In short, this means I have to align the Lasagne network output with the FPGA implementation output, but I cannot find where in the FINN synthesis flow this alignment is made.

Could you briefly explain how I should modify the CNV-PYNQ network to get, e.g., 20 16-bit slices instead of the current 10 16-bit slices? And how do these 10 16-bit values correspond to the 10 float values you get from the Lasagne network output?

Finally, in the CNV-PYNQ example you read out 16 64-bit values, each containing 4 16-bit output results, of which only 10 are used, so a lot of the data goes unused. It seems like there are still weights connected to the unused outputs, or am I wrong about this? Does the FINN synthesizer pad the weights somewhere?

giuliogamba commented 6 years ago

Hi,

the overlay used for CIFAR-10 is the same one used for the other datasets (such as German Road Signs and SVHN), namely the CNV topology. The hardware therefore has built-in support for multiple numbers of output classes, up to a maximum of 64 (the Matrix Height in the hardware config file here). The number of classes of your dataset is a runtime parameter of the inference call (number_class). That value is inferred automatically in the Python class when reading the classes.txt file included in each dataset's parameter folder. As an example, the road-signs dataset has 44 classes and is supported by the current overlay.
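
For illustration, here is a minimal host-side sketch of that idea: read classes.txt to get number_class, then pick the winner among the first number_class of the 64 16-bit result slots the hardware always produces. The helper names and the flat result buffer are hypothetical, not code from the repo.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: one class label per line in classes.txt.
std::vector<std::string> read_classes(const std::string &classesPath) {
  std::vector<std::string> classes;
  std::ifstream f(classesPath);
  for (std::string line; std::getline(f, line);)
    if (!line.empty()) classes.push_back(line);
  return classes;  // classes.size() gives number_class at runtime
}

// results points at the 64 16-bit slots read back from the accelerator;
// only the first number_class entries carry meaningful scores.
std::size_t argmax_class(const int16_t *results, std::size_t number_class) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < number_class; ++i)
    if (results[i] > results[best]) best = i;
  return best;
}
```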

Hope this helps

vonhachtaugust commented 6 years ago

Hi,

Yes, I realised that the hardware supports a 64 x 16-bit output width, so a maximum of 64 classes is supported. Great!

Now I would also like to work with grayscale images, but there is a stream-width conversion mechanism at the beginning of the conv pipeline to support RGB input. I find this 64 -> 192 -> 24 conversion difficult to understand and thus to modify for grayscale (64 -> 8?).

How would you modify the beginning of the cnv-pynq pipeline so that it operates on grayscale instead of RGB?

Thank you for helping out!

giuliogamba commented 6 years ago

Hi,

the stream conversion adapts the AXI width (64 bits) to the input of the first layer (3 channels of 8 bits each -> 24 bits). The data-width converters support only up-scaling (down-scaling) to multiples (sub-multiples) of the input width. Thus, to convert 64 to 24, we have to upscale to 192 before down-scaling.
If you have grayscale images (1 channel), you are correct that you only need the downscale from 64 to 8 bits. You can simply remove one of the DataWidthConverters (line 168) and change line 167 as follows: StreamingDataWidthConverter_Batch<64, 8, (32*32*8) / 64>(inter0, inter0_1, numReps);
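
For reference, a sketch of how the two RGB stages compare with the single grayscale stage. The template arguments follow the <InWidth, OutWidth, NumInWords> pattern of the call above; the stream names and exact constants are assumed to match the original top function and may differ slightly in your copy of the code.

```cpp
// RGB (original pipeline): 24 is not a sub-multiple of 64, so first go up to a
// common multiple (192 = 3*64 = 8*24), then down to 24 bits (3 x 8-bit channels).
StreamingDataWidthConverter_Batch<64, 192, (32*32*3*8) / 64>(inter0, inter0_1, numReps);
StreamingDataWidthConverter_Batch<192, 24, (32*32*3*8) / 192>(inter0_1, inter0_2, numReps);

// Grayscale: 8 divides 64, so a single converter is enough.
// The downstream stream must then be ap_uint<8>, and the first conv layer's
// input width has to be adjusted accordingly.
StreamingDataWidthConverter_Batch<64, 8, (32*32*8) / 64>(inter0, inter0_1, numReps);
```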

Hope this helps

vonhachtaugust commented 6 years ago

Hello,

It seems I am not quite getting it right. What should be done about the fixed-point representation in thresh0? I set it to ap_fixed<8,4> since I am using 8 bits.

Also, in finnthesizer.py, how should I set numThresBits and numThresIntBits? I have set them to 8 and 4, respectively.

I have been trying to make the conv-net work on a grayscale version of CIFAR-10: I use the finnthesizer to generate the weights and then feed in grayscale inputs randomly selected from CIFAR-10, but the result is not right. I get about 85% accuracy in SW but only 25% in HW.

Do you have any other tips on why this might be the case?

giuliogamba commented 6 years ago

The threshold size depends on the number of accumulations you perform. If you accumulate 3 results of 8x1-bit multiplications, you are going to need at least 10 bits. In your case, with a single channel (thus no accumulation, only a multiplication), the thresholds should be 8 bits, but with the same precision as the input, so ap_fixed<8,1>. What do you mean by SW accuracy? Do you mean in Theano or in Csim using HLS?
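
As a rough way to check the needed width (a sketch of the arithmetic, not code from the repo): summing N terms that each fit in W bits needs about W + ceil(log2(N)) bits to avoid overflow, which reproduces the two cases above.

```cpp
#include <cmath>
#include <cstdio>

// Rule of thumb: accumulating n values that each fit in w bits needs
// roughly w + ceil(log2(n)) bits.
unsigned required_bits(unsigned w, unsigned n) {
  return w + static_cast<unsigned>(std::ceil(std::log2(static_cast<double>(n))));
}

int main() {
  std::printf("%u\n", required_bits(8, 3)); // 3 channels of 8x1-bit products -> 10 bits
  std::printf("%u\n", required_bits(8, 1)); // single channel, no accumulation -> 8 bits
  return 0;
}
```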

vonhachtaugust commented 6 years ago

Okay, I will try with ap_fixed<8,1>. How do I tell the finnthesizer this when it prepares the threshold format?

By SW accuracy I mean when testing in Theano using Lasagne. HW accuracy is testing with a generated bitstream based on the BNN CNV HLS code.