Add cnv-w2a2 to FIFO sizing test

This PR depends on/incorporates #749

Previously, the test_fifosizing_linear testcase only covered the tfc-w2a2 topology. This PR extends it to include the cnv-w2a2 topology from FINN examples. Because this network is significantly larger than tfc-w2a2, a lower FPS target and a smaller batch size for throughput testing are used so that the test runs more quickly.

To enable testing the stable-state throughput after FIFO sizing with a small batch size of 2, step_measure_rtlsim_performance now produces a new metric called stable_throughput[images/s] in the rtlsim performance report. With a batch size of 2, the total number of cycles is significantly affected by the pipeline latency when there is a lot of folding in the network, so a throughput figure computed from the total cycle count underestimates the stable-state rate. The new metric subtracts the number of cycles spent on the first inference from the total number of cycles, which excludes the pipeline latency and gives a more accurate estimate of the stable-state throughput.
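The arithmetic behind the metric can be illustrated with a minimal sketch. This is not the code from step_measure_rtlsim_performance: the function names, the parameter names, and the example numbers are hypothetical, and the sketch assumes the images remaining after the first inference (batch_size - 1) are counted against the remaining cycles, which may differ from how the actual step counts them.

```python
def naive_throughput(total_cycles, batch_size, clk_ns):
    """Images/s computed from the total cycle count, pipeline latency included."""
    runtime_s = total_cycles * clk_ns * 1e-9
    return batch_size / runtime_s


def stable_throughput(total_cycles, first_inference_cycles, batch_size, clk_ns):
    """Images/s after excluding the cycles spent on the first inference.

    Assumption (not confirmed by the PR): the remaining batch_size - 1
    images are attributed to the remaining cycles.
    """
    if batch_size < 2:
        raise ValueError("need at least 2 images to exclude the first inference")
    stable_cycles = total_cycles - first_inference_cycles
    if stable_cycles <= 0:
        raise ValueError("first inference cycles must be below total cycles")
    runtime_s = stable_cycles * clk_ns * 1e-9
    return (batch_size - 1) / runtime_s


# Hypothetical numbers: a heavily folded network where the first inference
# takes 10000 of the 12000 total cycles, at a 10 ns clock, batch size 2.
naive = naive_throughput(total_cycles=12000, batch_size=2, clk_ns=10.0)
stable = stable_throughput(
    total_cycles=12000, first_inference_cycles=10000, batch_size=2, clk_ns=10.0
)
print(f"naive:  {naive:.1f} images/s")
print(f"stable: {stable:.1f} images/s")
```

With these made-up numbers the latency-inclusive figure is roughly a third of the stable-state figure, which is why the test compares against the latter when the batch size is small.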
