cornell-zhang / bnn-fpga

Binarized Convolutional Neural Networks on Software-Programmable FPGAs
BSD 3-Clause "New" or "Revised" License

Is it possible to run on microzed? #10

Closed manymuch closed 6 years ago

manymuch commented 6 years ago

I tried to compile with the platform set to microzed, but several errors came up, such as:

This design requires 28666 of such cell types but only 17600 compatible sites are available in the target device.

This design requires more RAMB18 and RAMB36/FIFO cells than are available in the target device. This design requires 154 of such cell types but only 120 compatible sites are available in the target device.

How can I modify this project to work on the microzed? Could you give me any instructions? Thanks!

rzhao01 commented 6 years ago

It looks like the default design is too large to run on the microzed. I recommend setting CONVOLVERS to 1 in Accel.h and using the optimized branch. See if that makes a difference.

manymuch commented 6 years ago

Thanks for your reply. I tried your suggestions, and they helped a little, but the design still doesn't fit on the microzed. Here are the error messages:

ERROR: [Place 30-640] Place Check : This design requires more Slice LUTs cells than are available in the target device. This design requires 23490 of such cell types but only 17600 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device. Please set tcl parameter "drc.disableLUTOverUtilError" to 1 to change this error to warning.

INFO: [SDSoC 0-0] See /mnt/c/Users/zjx21/Downloads/bnn-fpga-optimized/cpp/accel/sdsoc_build/_sds/p0/ipi/vivado.log for the context of the Vivado message above.

ERROR: [Place 30-640] Place Check : This design requires more LUT as Logic cells than are available in the target device. This design requires 22947 of such cell types but only 17600 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device. Please set tcl parameter "drc.disableLUTOverUtilError" to 1 to change this error to warning.

INFO: [SDSoC 0-0] See /mnt/c/Users/zjx21/Downloads/bnn-fpga-optimized/cpp/accel/sdsoc_build/_sds/p0/ipi/vivado.log for the context of the Vivado message above.

ERROR: [Place 30-640] Place Check : This design requires more RAMB18 and RAMB36/FIFO cells than are available in the target device. This design requires 147 of such cell types but only 120 compatible sites are available in the target device. Please analyze your synthesis results and constraints to ensure the design is mapped to Xilinx primitives as expected. If so, please consider targeting a larger device.
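To see how far the design is from fitting, the required and available counts in these errors can be turned into an excess and a utilization percentage. The sketch below is only arithmetic on the numbers quoted in the log; the helper name `overage` is made up for illustration and is not part of the bnn-fpga project or of Vivado.

```python
# Figures copied from the Place 30-640 errors quoted in this comment.
REQUIRED = {"Slice LUTs": 23490, "LUT as Logic": 22947, "RAMB18 and RAMB36/FIFO": 147}
AVAILABLE = {"Slice LUTs": 17600, "LUT as Logic": 17600, "RAMB18 and RAMB36/FIFO": 120}


def overage(required, available):
    """Return (excess cells, utilization in percent) for one resource type.

    Hypothetical helper, for illustration only.
    """
    return required - available, 100.0 * required / available


for name in REQUIRED:
    excess, pct = overage(REQUIRED[name], AVAILABLE[name])
    print(f"{name}: {excess} cells over budget, {pct:.1f}% utilization")
```

By these numbers the design would need roughly a quarter fewer Slice LUTs (5890 of 23490) and about 18% fewer block RAM cells (27 of 147) before placement could succeed.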

Any further ideas? Thank you!

rzhao01 commented 6 years ago

You can try relaxing the clock period. The clock period depends on the clkid and dmclkid arguments on this line: https://github.com/cornell-zhang/bnn-fpga/blob/master/cpp/accel/sdsoc_build/Makefile#L6. The microzed docs should contain info on which clock frequencies are available.

manymuch commented 6 years ago

Thanks for your reply! I changed clkid and dmclkid from 1 (142 MHz) to 2 (100 MHz), but a lower clock frequency seems to have little influence on the hardware requirements. I also tried other clock settings; the design still required around 23000 LUTs and 147 BRAMs.
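For reference, the two clock settings mentioned here correspond to the periods computed below. This is a minimal sketch that only converts the frequencies quoted in this comment; the mapping of clkid values to frequencies is taken from the comment and may differ on other platforms.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


fast = period_ns(142)  # clkid 1, as reported in this thread
slow = period_ns(100)  # clkid 2, as reported in this thread
print(f"142 MHz -> {fast:.2f} ns, 100 MHz -> {slow:.2f} ns")
print(f"extra time per cycle at 100 MHz: {slow - fast:.2f} ns")
```

A longer period loosens the timing constraint on each cycle, but as observed in this comment it did not noticeably change the LUT or BRAM counts.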

rzhao01 commented 6 years ago

Oh, I didn't notice the BRAM. BRAM won't be affected by anything we discussed, but you can try reducing WT_L by a factor of 2 or 4, which should help. Make sure to run the software tests to check that the lower WT_L still works.

I think you can also reduce PIX_PER_PHASE to just 32*32.
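One way to reason about the WT_L trade-off before rebuilding is sketched below. It assumes WT_L is the capacity of the on-chip weight buffer in words and that a layer whose weights exceed that capacity is loaded in several passes; both are assumptions made for illustration, not confirmed details of the design. The baseline value, layer names, and word counts are hypothetical and are not taken from Accel.h.

```python
import math


def weight_batches(layer_words, wt_l):
    """Number of buffer refills needed to stream one layer's weights.

    Assumes (hypothetically) that wt_l is the buffer capacity in words.
    """
    if wt_l <= 0:
        raise ValueError("wt_l must be positive")
    return math.ceil(layer_words / wt_l)


# Hypothetical numbers, NOT taken from Accel.h or the real network.
BASELINE_WT_L = 32768
LAYER_WORDS = {"conv2": 2048, "conv6": 36864, "dense1": 131072}

for factor in (1, 2, 4):
    wt_l = BASELINE_WT_L // factor
    plan = {name: weight_batches(words, wt_l) for name, words in LAYER_WORDS.items()}
    print(f"WT_L / {factor} = {wt_l}: {plan}")
```

Under these assumptions a smaller buffer trades block RAM for more weight-loading passes per layer, which is why the software tests are worth rerunning after each change.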

hex0102 commented 6 years ago

@manymuch hi, have you been able to run the design on the microzed board? Thanks

manymuch commented 6 years ago

@hex0102 not yet. I have tried the methods rzhao01 mentioned, but the design still doesn't fit on the microzed. I am now running this code on a zedboard and trying some different models.

As far as I know, reducing WT_L uses fewer BRAMs, but the other methods, such as relaxing the clock period or reducing PIX_PER_PHASE, don't reduce LUT usage.

If you make any progress, please let me know.

rzhao01 commented 6 years ago

Another method would be to remove the first layer or the dense layer from the accelerator and simply run them on the CPU, leaving just the binary conv layers for the FPGA. If there are no alternatives you might have to make do with this.
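The split suggested here can be pictured as a simple partition of the network by layer type. The sketch below is illustrative only: the `Layer` class, the `partition` helper, the kind labels, and the layer list are all hypothetical and do not correspond to code or layer names in the bnn-fpga repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    kind: str  # "fp_conv" (first layer), "bin_conv", or "bin_dense"


def partition(layers, fpga_kinds=frozenset({"bin_conv"})):
    """Split layers into (cpu, fpga) name lists, preserving network order."""
    cpu = [layer.name for layer in layers if layer.kind not in fpga_kinds]
    fpga = [layer.name for layer in layers if layer.kind in fpga_kinds]
    return cpu, fpga


# Hypothetical network; layer names and counts are illustrative only.
NETWORK = [
    Layer("conv1", "fp_conv"),
    Layer("conv2", "bin_conv"),
    Layer("conv3", "bin_conv"),
    Layer("dense1", "bin_dense"),
]

cpu, fpga = partition(NETWORK)
print("CPU: ", cpu)
print("FPGA:", fpga)
```

With only the binary conv layers left on the FPGA, the logic for the other layer types would no longer need to be synthesized, which is presumably where the resource savings would come from.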

rzhao01 commented 6 years ago

Closing due to lack of activity.