Previously, the `test_fifosizing_linear` testcase only covered the `tfc-w2a2` topology. This PR extends it to include the `cnv-w2a2` topology from FINN examples. Because this network is significantly larger than `tfc-w2a2`, a lower FPS target and a smaller batch size for throughput testing are used to make the test run more quickly.
To enable testing the stable-state throughput after FIFO sizing with a small batch size of 2, `step_measure_rtlsim_performance` is enhanced to produce a new metric called `stable_throughput[images/s]` in the rtlsim performance report. With a batch size of 2, a heavily folded network spends a large share of its total cycle count on pipeline latency, so the plain throughput figure underestimates what the design sustains. The new metric subtracts the number of cycles spent on the first inference from the total number of cycles, thus excluding the pipeline latency and giving a more accurate estimate of the stable-state throughput.
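As a rough illustration of the idea, the calculation can be sketched as below. This is not the code from the PR: the function and parameter names are hypothetical, and dividing by `batch_size - 1` (the images completed after the first inference) is an assumption about how the remaining cycles are normalized.

```python
def naive_throughput(total_cycles: int, batch_size: int, clk_ns: float) -> float:
    """Images per second over the whole run, pipeline latency included."""
    runtime_s = total_cycles * clk_ns * 1e-9
    return batch_size / runtime_s


def stable_throughput(
    total_cycles: int,
    first_inference_cycles: int,
    batch_size: int,
    clk_ns: float,
) -> float:
    """Estimate stable-state images per second (hypothetical helper).

    Cycles spent on the first inference (the pipeline latency) are
    subtracted from the total before computing the rate.
    """
    if batch_size < 2:
        # With a single image there is no stable-state portion to measure.
        return naive_throughput(total_cycles, batch_size, clk_ns)
    stable_cycles = total_cycles - first_inference_cycles
    if stable_cycles <= 0:
        raise ValueError("first inference must take fewer cycles than the total")
    runtime_s = stable_cycles * clk_ns * 1e-9
    # Assumption: batch_size - 1 images finish during the stable-state cycles.
    return (batch_size - 1) / runtime_s


if __name__ == "__main__":
    # Made-up numbers: 1200 cycles total, 1000 of them for the first image.
    print(naive_throughput(1200, 2, clk_ns=10.0))
    print(stable_throughput(1200, 1000, 2, clk_ns=10.0))
```

With these made-up numbers the latency dominates the run, so the stable-state estimate comes out several times higher than the naive figure, which is the effect the new metric is meant to capture.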
This PR depends on/incorporates #749