Question about the SGBM implementation

zjru commented 5 years ago

Hi,

I have some questions about the implementation of the SGBM algorithm in Xilinx/xfopencv/include/imgproc/xf_sgbm.hpp.

The main steps of SGBM are as follows: (1) compute the matching cost for each pixel as initialization; (2) for each pixel, compute the aggregated costs along different directions; (3) for each pixel, sum the aggregated costs over all directions; (4) choose the disparity that minimizes the summed cost.
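For reference, step (2) is usually written as the SGM recurrence Lr(p,d) = C(p,d) + min(Lr(p-r,d), Lr(p-r,d-1)+P1, Lr(p-r,d+1)+P1, min_k Lr(p-r,k)+P2) - min_k Lr(p-r,k). The following is a minimal C++ sketch of that update and of steps (3) and (4). The names `aggregateStep` and `selectDisparity` and the penalties `P1` and `P2` passed as plain integers are assumptions for illustration; they are not taken from xf_sgbm.hpp or xf_sgbm_tb.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Step (2) for one direction r (hypothetical helper, not library code).
// `prev` holds Lr(p-r, .) of the predecessor pixel on the path,
// `cost` holds the matching cost C(p, .) of the current pixel.
std::vector<int> aggregateStep(const std::vector<int>& prev,
                               const std::vector<int>& cost,
                               int P1, int P2) {
    const std::size_t D = cost.size();
    const int prevMin = *std::min_element(prev.begin(), prev.end());
    std::vector<int> out(D);
    for (std::size_t d = 0; d < D; ++d) {
        // Same disparity, or any disparity with the large penalty P2.
        int best = std::min(prev[d], prevMin + P2);
        // Neighbouring disparities with the small penalty P1.
        if (d > 0)     best = std::min(best, prev[d - 1] + P1);
        if (d + 1 < D) best = std::min(best, prev[d + 1] + P1);
        // Subtracting prevMin keeps the values bounded along the path.
        out[d] = cost[d] + best - prevMin;
    }
    return out;
}

// Steps (3) and (4): sum the per-direction costs of one pixel and
// return the disparity with the smallest sum (first one on ties).
int selectDisparity(const std::vector<std::vector<int>>& perDirection) {
    const std::size_t D = perDirection.front().size();
    int bestD = 0;
    long bestSum = 0;
    for (std::size_t d = 0; d < D; ++d) {
        long sum = 0;
        for (const auto& Lr : perDirection) sum += Lr[d];
        if (d == 0 || sum < bestSum) {
            bestSum = sum;
            bestD = static_cast<int>(d);
        }
    }
    return bestD;
}
```

Each call to `aggregateStep` only needs the predecessor's Lr vector for that direction, which is why the order in which pixels become available matters for a streaming design.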

The software implementation of SGBM in Xilinx/xfopencv/examples/sgbm/xf_sgbm_tb.cpp follows the steps above, but the hardware implementation provided in the library ("xf_sgbm.hpp") seems to differ.

After the first step computes the cost for each pixel, the array that stores the aggregated cost "Lr" in the second step should be initialized, and then the aggregation can be done along different directions. Is there any initialization in the accelerator implementation in xf_sgbm.hpp? I can only see the declaration of Lr.
Could you please tell me which directions are considered when aggregating the cost in the accelerator implementation? The software implementation uses 5 directions, but the accelerator implementation seems to use 4.
In the SGBM accelerator implementation, it seems that for each pixel, the aggregated costs along different directions are computed in parallel and the result is sent to the next function after computation. I don't understand how the aggregation along different directions for each pixel can be computed simultaneously. As I understand it, the aggregation paths that reach one pixel from different directions can have different lengths. Also, in a streaming flow, a pixel can only see the costs of the pixels that precede it in raster-scan order; it cannot see the pixels to its right or below it. The computation in the right-to-left and bottom-to-top directions therefore cannot be done. Is that right?

Thanks a lot!

akashsun commented 5 years ago

Hi Zjru,

In the hardware implementation, processing happens in raster-scan order, so the Lr computation is sequential. To compute Lr for the second row, the Lr data for r1, r2 and r3 of the first row's pixels are needed. We use the Lr[] array as a BRAM that stores a complete row of Lr data, and this array is overwritten for each consecutive row. I assume your question is about the first row's computation: as it is the border case, it would need to be initialized. However, the array is so large that initializing every element would take a significant number of clock cycles. Instead, the border case is handled inside the Lr computation function; notice the 'if' condition for (r==0 && c==0).
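The row-buffer idea described above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not the code in xf_sgbm.hpp: the disparity count `D = 4`, the names `step`, `RowEntry` and `aggregateRaster`, and the choice to start a path with the raw cost when a pixel has no predecessor are all hypothetical. The sketch uses the left neighbour plus three neighbours in the previous row, and it never initializes the row buffer, because the border checks keep row 0 from reading it.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

constexpr int D = 4;                  // hypothetical disparity count
using Costs = std::array<int, D>;     // one cost per disparity

// Standard SGM update for one direction, given the predecessor's Lr.
Costs step(const Costs& prev, const Costs& cost, int P1, int P2) {
    const int prevMin = *std::min_element(prev.begin(), prev.end());
    Costs out{};
    for (int d = 0; d < D; ++d) {
        int best = std::min(prev[d], prevMin + P2);
        if (d > 0)     best = std::min(best, prev[d - 1] + P1);
        if (d + 1 < D) best = std::min(best, prev[d + 1] + P1);
        out[d] = cost[d] + best - prevMin;
    }
    return out;
}

// Per-column Lr of the three paths that arrive from the previous row:
// from the top-left, top and top-right neighbours.
struct RowEntry { Costs tl, t, tr; };

// Single raster-order pass with one row buffer that is overwritten in
// place. Returns, per pixel, the sum of the four path costs.
std::vector<Costs> aggregateRaster(const std::vector<Costs>& cost,
                                   int rows, int cols, int P1, int P2) {
    std::vector<RowEntry> rowBuf(cols);   // never read while r == 0
    std::vector<Costs> summed(cost.size());
    for (int r = 0; r < rows; ++r) {
        Costs left{};      // left-path Lr of pixel (r, c-1)
        Costs diagPrev{};  // tl-path Lr of pixel (r-1, c-1), saved before overwrite
        for (int c = 0; c < cols; ++c) {
            const Costs& C = cost[r * cols + c];
            // Border pixels have no predecessor: the path starts at C (assumption).
            const Costs lLeft = (c > 0) ? step(left, C, P1, P2) : C;
            const Costs lTL = (r > 0 && c > 0) ? step(diagPrev, C, P1, P2) : C;
            const Costs lT  = (r > 0) ? step(rowBuf[c].t, C, P1, P2) : C;
            const Costs lTR = (r > 0 && c + 1 < cols)
                                  ? step(rowBuf[c + 1].tr, C, P1, P2) : C;
            for (int d = 0; d < D; ++d)
                summed[r * cols + c][d] = lLeft[d] + lTL[d] + lT[d] + lTR[d];
            diagPrev = rowBuf[c].tl;            // keep the old value for column c+1
            rowBuf[c] = RowEntry{lTL, lT, lTR}; // overwrite with the current row
            left = lLeft;
        }
    }
    return summed;
}
```

Because every predecessor used here has already been produced earlier in the scan, all four path costs of a pixel are available at the same time and can be summed immediately.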
The directions considered in the hardware implementation are 45, 90, 135 and 180. Adding the fifth direction, 0*, would require a second pass, and the gain in accuracy would be outweighed by the loss of performance.
Yes, you are right. Adding the fifth direction means you need to store a complete row, and all 8 directions would mean you need to store the complete image's data. From a hardware point of view, it is not straightforward.
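A small sketch of why only some directions fit a single streaming pass: a path can be updated on the fly only if the predecessor pixel has already arrived in raster order. The helper names below are hypothetical, and a direction is described by the offset (dr, dc) from the current pixel to its predecessor rather than by an angle label.

```cpp
#include <cassert>

// True if the predecessor at offset (dr, dc) comes earlier in raster
// order than the current pixel: an earlier row, or the same row and
// an earlier column.
constexpr bool predecessorAlreadySeen(int dr, int dc) {
    return dr < 0 || (dr == 0 && dc < 0);
}

// Count how many of the 8 neighbour directions can be aggregated
// in one raster-order pass.
int countSinglePassDirections() {
    int count = 0;
    for (int dr = -1; dr <= 1; ++dr) {
        for (int dc = -1; dc <= 1; ++dc) {
            if (dr == 0 && dc == 0) continue;  // not a direction
            if (predecessorAlreadySeen(dr, dc)) ++count;
        }
    }
    return count;
}
```

Under this model, four of the eight directions are available in one pass. The right-to-left path has its predecessor at (0, +1), which arrives later in the same row, and the remaining three have predecessors in the next row, which matches the storage argument above.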

Regards, Akash

zjru commented 5 years ago

Hi Akash @akashsun ,

Thank you for your reply! I really appreciate it.

One more question: Does the fifth direction 0* mean that the aggregation path is from right to left?

akashsun commented 5 years ago

Hi Zjru,

Yes, you are right about that.

Regards, Akash

zjru commented 5 years ago

Thanks! @akashsun

Question about the SGBM implementation #28