bkarl / conv2d-vhdl

2D convolution implemented in VHDL
MIT License

4x4 Kernel Size #1

Open HeatPhoenix opened 2 years ago

HeatPhoenix commented 2 years ago

Hi bkarl,

I've been working with your conv2d VHDL engine and was looking to implement a 4x4 kernel (and a stride of 2). The stride is the easier part, as one can just discard the results that aren't needed and get the same output (though wastefully). The 4x4 kernel is a bit harder: how do I make sure the pipeline is filled properly when calculating the "y" functions?

Any other help with building a 4x4 implementation of this module is appreciated.

bkarl commented 2 years ago

Hi HeatPhoenix. If I remember correctly from when I developed this code, achieving a stride of 2 was far more challenging than getting a 4 x 4 kernel, because pipelining is a pain.

To get the 4 x 4 convolution we need to add a new set of y-variables (y03, y13, y23 and y30, y31, y32, y33) to store the intermediate results of the kernel. The number of registers that store coefficients needs to be increased to 16. The pixbuf module also needs one additional line buffer, so that 3 lines can be stored (needed to do the 4x4 convolution).

Finally, the calculation logic needs to be adjusted so that intermediate results are stored after the fourth calculation (rather than after the third, as they are now), and another set of parallel computations needs to be added to calculate y3x. I think all of that could be done by carefully extending the existing code and simulating as you go, but stride 2 needs a lot more work, especially if data is not entering the module continuously.

Greetings
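As a way to check a 4x4 extension in simulation, the changes described above (three line buffers, a 4x4 window, 16 coefficients) can be sketched as a small behavioral model. This is not the repository's code: the function names, the rule that row y is written to line buffer y mod 3, and the choice of a plain sum of products without kernel flipping are all assumptions made for illustration.

```python
from collections import deque

K = 4  # kernel size; K - 1 = 3 line buffers are needed


def stream_conv4x4(image, coeffs):
    """Consume pixels in raster order and emit one sum per complete 4x4 window.

    Hypothetical model: row y is written into line buffer (y mod 3),
    overwriting row y - 3 right after it has been read.
    """
    width = len(image[0])
    line_buffers = [[0] * width for _ in range(K - 1)]
    out = []
    for y, row in enumerate(image):
        # One 4-deep shift register per window row, cleared at each line start.
        window = [deque([0] * K, maxlen=K) for _ in range(K)]
        out_row = []
        for x, pixel in enumerate(row):
            # Vertical column at position x: rows y-3, y-2, y-1, then row y.
            column = [line_buffers[(y + i) % (K - 1)][x] for i in range(K - 1)]
            column.append(pixel)
            line_buffers[y % (K - 1)][x] = pixel  # replace the oldest row
            for r in range(K):
                window[r].append(column[r])
            # The window is only valid once 4 rows and 4 columns have been seen.
            if y >= K - 1 and x >= K - 1:
                out_row.append(sum(coeffs[r][c] * window[r][c]
                                   for r in range(K) for c in range(K)))
        if out_row:
            out.append(out_row)
    return out


def direct_conv4x4(image, coeffs):
    """Reference: the same sum of products computed directly from the image."""
    h, w = len(image), len(image[0])
    return [[sum(coeffs[r][c] * image[y + r][x + c]
                 for r in range(K) for c in range(K))
             for x in range(w - K + 1)]
            for y in range(h - K + 1)]
```

Comparing `stream_conv4x4` against `direct_conv4x4` on a small test image is one way to see when the pipeline counts as "filled": no output is produced until three full lines plus four pixels of the fourth line have arrived.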

HeatPhoenix commented 2 years ago

Regarding 4 x 4 kernel: Thank you for your insight, I'll try to implement it and report back.

As for Stride=2, the (wasteful) way I did this was by simply tossing the unnecessary results as calculations proceed, to put it in a Matlab way:

c = conv2(b, a, 'same');
d = c(1:2:end, 1:2:end)

I then compared the result of the .bin (from VHDL) to my Matlab result and it matched. This is a wasteful way of doing things, but anything else would take more time than my current timescale allows.

(For context, I am using your Conv2d module -- with credit, of course -- in building a Spiking Neural Network accelerator for FPGAs for my master's thesis)
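The discard-based stride described above can be mirrored in a short reference model, which may be handy when comparing against the .bin output. The sketch below is a pure-Python stand-in for the two Matlab lines, assuming zero padding and an odd-sized kernel; `conv_at`, `conv2_same`, `subsample` and `conv2_same_strided` are hypothetical names, not part of the conv2d module or of Matlab.

```python
def conv_at(image, kernel, y, x):
    """One output sample of a zero-padded 2D convolution centred on (y, x)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    acc = 0
    for i in range(kh):
        for j in range(kw):
            yy, xx = y + kh // 2 - i, x + kw // 2 - j  # flipped kernel
            if 0 <= yy < h and 0 <= xx < w:
                acc += kernel[i][j] * image[yy][xx]
    return acc


def conv2_same(image, kernel):
    """Full-resolution result with the same size as the input image."""
    return [[conv_at(image, kernel, y, x) for x in range(len(image[0]))]
            for y in range(len(image))]


def subsample(matrix, stride=2):
    """Keep every stride-th row and column, like c(1:2:end, 1:2:end)."""
    return [row[::stride] for row in matrix[::stride]]


def conv2_same_strided(image, kernel, stride=2):
    """Compute only the samples that survive the subsampling step."""
    return [[conv_at(image, kernel, y, x)
             for x in range(0, len(image[0]), stride)]
            for y in range(0, len(image), stride)]
```

Both paths give the same values; the strided version simply skips roughly three quarters of the multiply-accumulate work for a stride of 2, which is the saving a native stride-2 pipeline would aim for.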

HeatPhoenix commented 2 years ago

Dear @bkarl,

There's one thing I'm unsure about: in the pixbuf module, how do we handle adding the extra linebuffer? I understand the need for the extra signals (lb2, data_from_linebuffer2, etc.), but I'm not sure how to modify the logic there.

For

if (unsigned(pix_y_in) mod 2 = 1) then
    output_map <= '0';
else
    output_map <= '1';
end if;

if (output_map = '0') then
    y2_out <= data_from_linebuffer1;
    y1_out <= data_from_linebuffer0;
else
    y2_out <= data_from_linebuffer0;
    y1_out <= data_from_linebuffer1;
end if;

Would output_map become a 2-bit signal? How does this output mapping work, especially when expanding to 3 linebuffers? I have the same question about the rest of the process and how the input/output would change. Any insight is highly appreciated.

Many thanks, Zack
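One possible reading of the output mapping, sketched as a behavioral model rather than as the module's actual logic: if row y is written into linebuffer (y mod N), then the pixel from k rows back sits in linebuffer ((y - k) mod N). With N = 2 this rule reproduces the swap in the snippet quoted above (though not the particular '0'/'1' encoding of output_map); with N = 3 the selector takes the values 0, 1 and 2, so it would need two bits. The write rule, the `y3_out` output and all function names below are assumptions.

```python
NUM_LINEBUFFERS = 3


def output_map_for_row(pix_y):
    """Hypothetical 3-way output_map: 0, 1 or 2, so two bits wide."""
    return pix_y % NUM_LINEBUFFERS


def select_outputs(output_map, data_from_linebuffer):
    """Return (y1_out, y2_out, y3_out), i.e. the pixels from rows y-1, y-2, y-3.

    data_from_linebuffer[i] is the value read from linebuffer i at the
    current column, before the current pixel is written.
    """
    n = NUM_LINEBUFFERS
    return tuple(data_from_linebuffer[(output_map - k) % n] for k in (1, 2, 3))


def simulate(image):
    """Stream an image and record (pix_y, pix_x, y1_out, y2_out, y3_out)."""
    width = len(image[0])
    linebuffers = [[None] * width for _ in range(NUM_LINEBUFFERS)]
    trace = []
    for pix_y, row in enumerate(image):
        output_map = output_map_for_row(pix_y)
        for pix_x, pixel in enumerate(row):
            reads = [lb[pix_x] for lb in linebuffers]
            trace.append((pix_y, pix_x) + select_outputs(output_map, reads))
            # The buffer holding row y-3 is the one that gets overwritten.
            linebuffers[output_map][pix_x] = pixel
    return trace
```

In hardware, a small row counter that wraps 0, 1, 2 at each new line would likely be cheaper than computing a modulo 3 on pix_y; that is a design suggestion, not something confirmed by the repository.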