Xilinx / finn

Dataflow compiler for QNN inference on FPGAs
https://xilinx.github.io/finn
BSD 3-Clause "New" or "Revised" License

Im2Col_0: Dilation value != 1 is not supported for square convolutions #860

Open sansi-zhang opened 1 year ago

sansi-zhang commented 1 year ago

Quick summary

Sorry to bother you. I am using FINN with a custom network architecture, and I have run into the restriction that dilation must be 1.

Details

I hit the following problem when running the InferConvInpGen() transformation in FINN:

[screenshot of the assertion error: "Im2Col_0: Dilation value != 1 is not supported for square convolutions"]

That is, InferConvInpGen() does not support dilation greater than 1, but my network requires dilation greater than 1.

I then tried commenting out the assertion in the function. The transformation itself then ran successfully, but I got another error when executing the following call:

model = model.transform(ZynqBuild(platform=test_pynq_board, period_ns=target_clk_ns))

The following error occurs:

[screenshot of the ZynqBuild error]

I checked previous issues and believe this is caused by incorrect HLS generation earlier in the flow, so I would like to ask whether there is any way to handle the case where dilation is greater than 1.

fpjentzsch commented 1 year ago

Hi, can you try to use the RTL implementation of the ConvolutionInputGenerator by calling the transformation with InferConvInpGen(use_rtl_variant=True)? It should support dilation > 1.
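
For reference, a minimal sketch of that call (the module path follows FINN releases from around this time and may differ in newer versions):

```python
# Sketch only; check the import path against your FINN version.
from finn.transformation.fpgadataflow.convert_to_hls_layers import InferConvInpGen

# use_rtl_variant=True selects the RTL ConvolutionInputGenerator,
# which supports dilation > 1 (unlike the HLS variant).
model = model.transform(InferConvInpGen(use_rtl_variant=True))
```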

sansi-zhang commented 1 year ago

OK, thank you very much. I'll try it tomorrow and ask if I run into any questions.

sansi-zhang commented 1 year ago

Sorry to bother you again. Your suggestion solved the dilation problem, but I hit an error when setting the PE and SIMD values and running the ZynqBuild transformation.

First

Since I wasn't sure how to configure PE and SIMD properly, I used the automatic transformations SetFolding() and InsertAndSetFIFODepths("ZCU104") (I am targeting the ZCU104). But InsertAndSetFIFODepths failed:

[screenshot of the InsertAndSetFIFODepths error]

It seems some IP information is missing.
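
For reference, a minimal sketch of how this flow is typically invoked (module paths follow FINN releases from around this time). Two likely pitfalls, stated as assumptions rather than a confirmed diagnosis: InsertAndSetFIFODepths expects an FPGA part string (the ZCU104's is xczu7ev-ffvc1156-2-e) rather than a board name, and it simulates each node's generated IP, so PrepareIP and HLSSynthIP must have been run first.

```python
# Sketch only; module paths and arguments should be checked against your FINN version.
from finn.transformation.fpgadataflow.set_folding import SetFolding
from finn.transformation.fpgadataflow.prepare_ip import PrepareIP
from finn.transformation.fpgadataflow.hlssynth_ip import HLSSynthIP
from finn.transformation.fpgadataflow.set_fifo_depths import InsertAndSetFIFODepths

fpgapart = "xczu7ev-ffvc1156-2-e"  # the ZCU104's FPGA part, not the board name
model = model.transform(SetFolding(target_cycles_per_frame=1000))
model = model.transform(PrepareIP(fpgapart, 10))   # generate HLS code per node
model = model.transform(HLSSynthIP())              # synthesize each node's IP
model = model.transform(InsertAndSetFIFODepths(fpgapart))  # rtlsim-based sizing
```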

Second

I then tried using the default FIFOs instead, skipping InsertAndSetFIFODepths and running model = model.transform(ZynqBuild(platform=test_pynq_board, period_ns=target_clk_ns)), but hit another problem:

[screenshot of the ZynqBuild error]

To check whether the problem was specific to my network, I ran the CNV example from FINN's repository, but its ZynqBuild produced a similar error:

[screenshot of the similar ZynqBuild error for the CNV example]

I also verified with the TFC example, and its ZynqBuild worked fine. So now I am quite confused and hope you can help me resolve this.

Finally

I would like to try configuring appropriate PE and SIMD values myself. I hope you can give me some suggestions and reference configuration methods.

fpjentzsch commented 1 year ago

To get to the bottom of these failures you will need to dig deeper into the Vivado/Vitis HLS logs. See the path given by the message ("Check logs under ...") and individual run directories that might be located within them.

To get an idea about PE & SIMD, you can check out our tutorial notebook on folding: https://github.com/Xilinx/finn/blob/dev/notebooks/advanced/3_folding.ipynb
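
For illustration, a minimal sketch of manual folding in the spirit of that notebook. The op type and attribute names (MatrixVectorActivation, PE, SIMD, MW, MH) match FINN releases from around this time and are assumptions for other versions:

```python
# Sketch only: set PE/SIMD on each MVAU, respecting the divisibility constraints.
from qonnx.custom_op.registry import getCustomOp

for node in model.graph.node:
    if node.op_type == "MatrixVectorActivation":
        inst = getCustomOp(node)
        mw = inst.get_nodeattr("MW")  # fan-in (input) dimension
        mh = inst.get_nodeattr("MH")  # fan-out (neuron) dimension
        # SIMD must divide MW and PE must divide MH; larger values mean
        # more parallelism, more resources, and fewer cycles per layer.
        inst.set_nodeattr("SIMD", 8 if mw % 8 == 0 else 1)
        inst.set_nodeattr("PE", 4 if mh % 4 == 0 else 1)
```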

sansi-zhang commented 1 year ago

Hi, I'm terribly sorry to bother you again.

Question

With your generous help, I successfully completed HLS generation for my custom network model. I also ran the InsertAndSetFIFODepths('xczu7ev-ffvc1156-2-e') transformation successfully, but its execution time was far too long (it still had not finished after more than 4 hours), so I probably won't use this approach to set FIFO depths again.

I looked through FINN's notebooks and readthedocs documentation, as well as issues and code related to FIFO depth design, but I still don't understand how to choose the depth of the FIFO after each MVAU.

Could you share the relevant methods and design ideas?
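
For reference, one way to avoid the long rtlsim-based sizing is to set conservative FIFO depths by hand before inserting FIFOs. A minimal sketch, with the caveat that the attribute names are assumptions to check against your FINN version (newer releases use list-valued inFIFODepths/outFIFODepths, older ones scalar inFIFODepth/outFIFODepth):

```python
# Sketch only: uniform hand-picked depths instead of simulation-derived ones.
from qonnx.custom_op.registry import getCustomOp

for node in model.graph.node:
    if node.op_type == "MatrixVectorActivation":
        inst = getCustomOp(node)
        # A conservative uniform depth; real designs tune this per node,
        # trading BRAM/LUTRAM usage against the risk of stream stalls.
        inst.set_nodeattr("inFIFODepths", [128])
        inst.set_nodeattr("outFIFODepths", [128])
```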

Extend

In the future, I want to try quantizing more complex networks. The network will have fewer than 50 convolutional layers in total; its middle part contains multiple branches (similar to the image below), there are no other convolutional or fully connected layers, and the quantization is mainly 2-bit and 4-bit.

I plan to run the special preprocessing and the post-processing softmax on the ARM cores, so that only the series of convolution operations (2D convolution, ReLU, BN, and concatenation) runs on the FPGA fabric.
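
For illustration, a minimal sketch of the ARM-side post-processing idea (hypothetical driver code, assuming the accelerator returns raw logits as a NumPy array):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax, run on the ARM cores after the FPGA
    # accelerator has produced the final feature map.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```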

Could you comment on the feasibility of this branch design, and whether the architecture can be streamlined and deployed on the ZCU104? Also, should the PE and SIMD settings be kept small for such a large network? Finally, how should I set the FIFO depths in this network?

[diagram of the proposed branched network architecture]

Thanks again

sansi-zhang commented 1 year ago

Sorry to bother you again.

Below are the problems I encountered while carrying out the tasks above; I hope you can help me with them.

First

Today I tried the deeper network mentioned above, but I had issues running the GiveReadableTensorNames() function.

[screenshot of the GiveReadableTensorNames() error]

After some debugging, I noticed that the problem appears to be related to weight sharing in my code (I call the same block of code multiple times).

When I removed the weight sharing, GiveReadableTensorNames() worked.

I then tried to confirm whether the problem was caused by reusing the same code block directly, but no matter how I changed the sharing approach, it still failed.

So I would like to ask whether there is any way to keep weight sharing and still make this work.
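
For illustration, a minimal sketch of the difference between the shared and unshared forms (hypothetical PyTorch-style modules; the layer shapes are made up):

```python
import torch.nn as nn

# Shared form: one module object reused at several call sites, so the
# exported ONNX graph has a single weight initializer feeding many nodes.
shared_conv = nn.Conv2d(64, 64, 3, padding=1)

# Unshared form: a separate instance per call site; the export then gets
# its own uniquely named weight tensor for each branch.
branch_convs = nn.ModuleList(nn.Conv2d(64, 64, 3, padding=1) for _ in range(4))
```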

Second

I continued the work using the non-weight-sharing version, but ran into another problem.

I ran CreateDataflowPartition() and got an error.

[screenshot of the CreateDataflowPartition() error]

Judging from the error message, a cycle in the graph is causing the problem. I checked my code and the corresponding ONNX file, and I cannot find any loop.

I then did some testing: I moved all the convolutional submodules from my source code into the main class and fully unrolled all the convolution loops, but CreateDataflowPartition() still failed with the same error message.
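
As a debugging aid, a minimal sketch that checks the exported graph for cycles independently of FINN (assuming onnx and networkx are installed; "model.onnx" is a placeholder path):

```python
import onnx
import networkx as nx

proto = onnx.load("model.onnx")
g = nx.DiGraph()
for node in proto.graph.node:
    # Link every input tensor of a node to every output tensor it produces.
    for inp in node.input:
        for out in node.output:
            g.add_edge(inp, out)

if nx.is_directed_acyclic_graph(g):
    print("no cycle found at the tensor level")
else:
    print("cycle:", nx.find_cycle(g))
```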

Below is a snippet of the ONNX graph before these tests:

[screenshot of the ONNX graph before the tests]

Below is a snippet of the ONNX graph after the tests:

[screenshot of the ONNX graph after the tests]

Thanks again