doonny / PipeCNN

An OpenCL-based FPGA Accelerator for Convolutional Neural Networks
Apache License 2.0

Pipe Depth #62

Closed. zhao-lun closed this issue 6 years ago

zhao-lun commented 6 years ago

Hi Prof. @doonny, what happens when I increase or decrease the pipe depth?

zhao-lun commented 6 years ago

Hi @aazz44ss, do you have any idea what happens when the pipe depth is decreased or increased?

aazz44ss commented 6 years ago

The pipe depth is a buffer for accumulation. It takes 4 clock cycles to perform one multiply-accumulate (MAC), so ii = 4. If you set the pipe depth to 4, you are buffering 4 MAC results over 4 clock cycles. That makes ii = 1, which is the best optimization you can get. A deeper depth, in my opinion, will only waste resources.
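The idea in the comment above can be sketched in software. This is a minimal Python model, not PipeCNN's kernel code: the function names `mac_naive`, `mac_interleaved`, and `effective_ii` are hypothetical, the adder latency of 4 cycles is taken from the comment, and the II formula is the usual recurrence bound (latency divided by dependency distance, rounded up), assumed here to be the only constraint.

```python
def mac_naive(a, b):
    """Single accumulator: every add depends on the previous one."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def mac_interleaved(a, b, depth=4):
    """`depth` partial accumulators, visited round-robin.

    Each partial sum is only touched once every `depth` iterations,
    so an add has `depth` cycles to finish before it is needed again.
    """
    partial = [0] * depth
    for i, (x, y) in enumerate(zip(a, b)):
        partial[i % depth] += x * y
    return sum(partial)  # final reduction of the buffered partial sums


def effective_ii(latency=4, depth=1):
    """Assumed model: II = ceil(latency / depth), never below 1."""
    return max(1, -(-latency // depth))


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
print(mac_naive(a, b), mac_interleaved(a, b, depth=4))
for d in (1, 2, 4, 7):
    print("depth", d, "-> ii", effective_ii(latency=4, depth=d))
```

In this model, depth 4 already reaches ii = 1 for a latency of 4, and depth 7 gives the same ii, which is consistent with the remark that a deeper buffer only costs resources. With floating-point data the interleaved order of additions can change rounding slightly; the integer inputs here avoid that.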

zhao-lun commented 6 years ago

Thanks @aazz44ss. Sorry, but I don't understand why the design takes 4 clock cycles to perform one MAC. Does it vary when I add more lanes? The thing is, I see a large increase in logic resources when I set the depth to 7. Even stranger, I tried reducing the depth to 4, which saved enough logic for me to go to lane 8. However, the result is the same.

aazz44ss commented 6 years ago

It depends on the design of the DSP; it won't vary when you increase the lane number. When you set the depth to 7, it costs ((7-4) x Lane) more resources than a depth of 4. The result shouldn't change when you increase the number of lanes; however, the performance will vary, owing to the increased parallelism.
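The ((7-4) x Lane) estimate above can be written out as a small helper. This is only a sketch of that arithmetic: the name `extra_buffer_entries` is hypothetical, the baseline depth of 4 comes from the comment, and the result counts extra accumulation-buffer entries, since the thread does not say how those map to LUTs or registers.

```python
def extra_buffer_entries(depth, lanes, baseline_depth=4):
    """Extra accumulation-buffer entries relative to the baseline depth.

    Follows the (depth - 4) x lanes estimate from the thread; depths at
    or below the baseline are treated as costing nothing extra.
    """
    return max(0, depth - baseline_depth) * lanes


for lanes in (4, 8):
    print("lanes", lanes, "depth 7 ->", extra_buffer_entries(7, lanes), "extra entries")
```

Under this estimate the overhead of a deeper buffer grows linearly with the lane count, so it matters more as lanes are added.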

zhao-lun commented 6 years ago

Yeah @aazz44ss, by result I mean that lane 8 depth 4 and lane 4 depth 6 have the same performance. Weird.

aazz44ss commented 6 years ago

You also have to check the Fmax you got.

zhao-lun commented 6 years ago

I got 135 MHz for lane 4 depth 6 and 124 MHz for lane 8 depth 4.

aazz44ss commented 6 years ago

Maybe you should check the ii in report.html.
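A rough way to combine the Fmax figures and the ii suggestion above is a back-of-the-envelope peak-rate estimate. This is a sketch under strong assumptions: the function name `peak_mac_rate` is hypothetical, the design is treated as purely compute-bound, and vector width and memory bandwidth are ignored. The ii values passed in are placeholders, not numbers reported in the thread.

```python
def peak_mac_rate(lanes, fmax_mhz, ii=1):
    """Idealized peak rate in millions of lane-level MACs per second.

    Assumes each lane issues one MAC every `ii` cycles at `fmax_mhz`.
    """
    return lanes * fmax_mhz / ii


lane4 = peak_mac_rate(lanes=4, fmax_mhz=135, ii=1)
lane8 = peak_mac_rate(lanes=8, fmax_mhz=124, ii=1)
print("lane 4 @ 135 MHz:", lane4)
print("lane 8 @ 124 MHz:", lane8)
print("ratio:", lane8 / lane4)

# Placeholder what-if: the same lane-8 build if its ii were 2 instead of 1.
print("lane 8 @ 124 MHz, ii=2:", peak_mac_rate(lanes=8, fmax_mhz=124, ii=2))
```

In this idealized model the drop from 135 MHz to 124 MHz is far too small to cancel a doubling of the lane count when ii = 1 in both builds, which is why the ii in the compiler report is worth checking next. Other limits that the model leaves out, such as memory bandwidth, could also explain equal measured performance.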