doonny / PipeCNN

An OpenCL-based FPGA Accelerator for Convolutional Neural Networks
Apache License 2.0

Why can a delayed buffer reduce the initiation interval to 1 cycle? #58

Closed: laski007 closed this issue 6 years ago

laski007 commented 6 years ago

Dear Prof. Wang, I read your paper and code. You used the delayed buffer accu_piped[6] to store lane_accum for 6 cycles, and you mentioned that it improves the utilization of the multipliers. But I still don't understand why the delayed buffer can reduce the initiation interval (II) to 1 cycle. Thank you so much!
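For readers following along, here is a minimal C sketch of the delayed-buffer idea, emulated in software rather than taken from PipeCNN. It is modeled on the shift-register accumulation pattern that Intel/Altera document for removing loop-carried dependencies. The names `mac_direct`, `mac_delayed` and `DELAY` are hypothetical; `DELAY = 6` is only chosen to mirror `accu_piped[6]`. Integers are used so the two results can be compared exactly.

```c
/* Hypothetical delay depth, chosen to mirror accu_piped[6]. On hardware it
 * should be at least the latency, in cycles, of one accumulate step. */
#define DELAY 6

/* Single accumulator: each iteration needs the previous iteration's sum.
 * This loop-carried dependency is what forces II > 1 on an FPGA. */
long mac_direct(const int *a, const int *b, int n) {
    long acc = 0;
    for (int i = 0; i < n; i++)
        acc += (long)a[i] * b[i];
    return acc;
}

/* Delayed-buffer (shift-register) accumulation. Iteration i adds its product
 * to the partial sum written DELAY iterations earlier, so consecutive
 * iterations never wait on each other's result. */
long mac_delayed(const int *a, const int *b, int n) {
    long shift_reg[DELAY + 1];
    for (int j = 0; j <= DELAY; j++)
        shift_reg[j] = 0;

    for (int i = 0; i < n; i++) {
        /* shift_reg[0] holds the partial sum written DELAY iterations ago */
        shift_reg[DELAY] = shift_reg[0] + (long)a[i] * b[i];
        /* Shift toward index 0. In an OpenCL kernel this loop would be fully
         * unrolled (#pragma unroll), so it maps to a chain of registers. */
        for (int j = 0; j < DELAY; j++)
            shift_reg[j] = shift_reg[j + 1];
    }

    /* Combine the DELAY independent partial sums once, after the main loop. */
    long total = 0;
    for (int j = 0; j < DELAY; j++)
        total += shift_reg[j];
    return total;
}
```

Both functions return 10100 for `a[i] = i + 1` and `b[i] = 2` over 100 elements. With floating-point data the reordered additions can round slightly differently, which is one reason the sketch sticks to integers.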

laski007 commented 6 years ago

Dear @aazz44ss, do you know why?

aazz44ss commented 6 years ago

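It takes 4 clock cycles to read global memory and do the MAC, so with a single accumulator each iteration has to wait for the previous accumulation to finish.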

For more details, please refer to section 5.1.1, "Removing Loop-Carried Dependency", of the Altera SDK for OpenCL Best Practices Guide: https://www.altera.com/en_US/pdfs/literature/hb/opencl-sdk/aocl-best-practices-guide.pdf
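As a rough model (assuming the accumulate step has a fixed latency of $L$ cycles and ignoring memory stalls), the recurrence bound used in modulo scheduling explains the effect. A loop-carried dependency that spans $D$ iterations and has latency $L$ requires $D \cdot \mathrm{II} \ge L$. With one accumulator $D = 1$; with a delayed buffer of depth $D$, iteration $i$ only depends on iteration $i - D$, and the final result is the sum of $D$ independent partial sums:

$$
\mathrm{II}_{\min} = \left\lceil \frac{L}{D} \right\rceil,
\qquad
s_k = \sum_{i \equiv k \ (\mathrm{mod}\ D)} a_i b_i,
\qquad
\sum_{i=0}^{N-1} a_i b_i = \sum_{k=0}^{D-1} s_k .
$$

Taking the 4 cycles quoted above as $L$: a single accumulator gives $\mathrm{II}_{\min} = \lceil 4/1 \rceil = 4$, while a depth of $D = 6$ gives $\lceil 4/6 \rceil = 1$, leaving some slack in case the real latency is a little longer.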

laski007 commented 6 years ago

Dear 浩一, thank you so much for your help, @aazz44ss. Best regards!