VITA-Group / FasterSeg

[ICLR 2020] "FasterSeg: Searching for Faster Real-time Semantic Segmentation" by Wuyang Chen, Xinyu Gong, Xianming Liu, Qian Zhang, Yuan Li, Zhangyang Wang
MIT License

How do you handle fluctuations in latency measurement? #21

Closed maaft closed 4 years ago

maaft commented 4 years ago

I noticed that the measured latencies differ substantially from run to run. For example, I get the following results in different program runs:

  1. BasicResidual_downup_2x_H24_W24_Cin80_Cout80_stride1: 0.0549 ms
     BasicResidual_downup_2x_H12_W12_Cin80_Cout80_stride1: 0.0511 ms
  2. BasicResidual_downup_2x_H24_W24_Cin80_Cout80_stride1: 0.0517 ms
     BasicResidual_downup_2x_H12_W12_Cin80_Cout80_stride1: 0.0502 ms
  3. BasicResidual_downup_2x_H24_W24_Cin80_Cout80_stride1: 0.0505 ms
     BasicResidual_downup_2x_H12_W12_Cin80_Cout80_stride1: 0.0513 ms

Note that in the last run the operation on quarter resolution (12x12) took longer than the operation on the base resolution (24x24).
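
To make the run-to-run spread concrete, here is a minimal sketch that summarizes the three runs listed above. The numbers are copied from the list; the `summarize` helper and the use of the median as the aggregate are my own assumptions, not something the FasterSeg code is known to do.

```python
from statistics import median

# Latencies in ms, copied from the three runs listed above.
runs_ms = {
    "H24_W24": [0.0549, 0.0517, 0.0505],
    "H12_W12": [0.0511, 0.0502, 0.0513],
}


def summarize(samples):
    """Return (median, max - min) for a list of latency samples."""
    return median(samples), max(samples) - min(samples)


summary = {name: summarize(vals) for name, vals in runs_ms.items()}
for name, (med, spread) in summary.items():
    print(f"{name}: median={med:.4f} ms, spread={spread:.4f} ms")
```

With these samples the medians are 0.0517 ms (24x24) and 0.0511 ms (12x12), so the expected ordering reappears once runs are aggregated. Three runs are far too few to draw a firm conclusion, though.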

This will substantially influence the network architecture search, as the optimizer will prefer paths at higher resolutions.

How did you make sure that your measured latencies "make sense"? Did you set static GPU clocks or similar to get stable results?

chenwydj commented 4 years ago

Hi @maaft !

Feature maps of 12x12 or 24x24 are very small. In Cityscapes, a feature map at 1/32 scale has a size of 32x64. Fluctuations on feature maps this small have little impact on the overall latency of the whole network (1000/163.9 = 6.1 ms for our network vs. ~0.05 ms in your case).
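
The comparison above can be worked through as a short calculation. This is only a sketch: the 163.9 FPS figure and the ~0.05 ms operator latency come from this thread, the jitter value is the largest gap between the runs reported in the question, and the two sets of numbers were presumably measured on different hardware, so the ratios are rough orders of magnitude.

```python
fps = 163.9                  # frame rate quoted above
network_ms = 1000.0 / fps    # about 6.1 ms per frame
op_ms = 0.05                 # approximate latency of one small operator
jitter_ms = 0.0549 - 0.0505  # largest run-to-run gap reported for the 24x24 operator

op_share = op_ms / network_ms          # fraction of frame time taken by one operator
jitter_share = jitter_ms / network_ms  # fraction of frame time taken by the jitter

print(f"network latency: {network_ms:.2f} ms")
print(f"one operator:    {op_share:.2%} of the frame time")
print(f"jitter:          {jitter_share:.3%} of the frame time")
```

Under these assumptions a single operator of this size accounts for under 1% of the quoted frame time, and its fluctuation for under 0.1%.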

As demonstrated in our Figure 3, our latency estimation achieves a very high correlation with real latency. This correlation is verified by sampling 1,000 subnetworks.

maaft commented 4 years ago

Okay, this of course makes latency-based NAS very hard for applications with smaller images.

Until I find a better solution, I will multiply the height and width by a constant factor (8 in my case) during latency measurement. This gives stable results but does not reflect the real latency. At least the relative latencies between different operations are then estimated correctly.
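
A minimal sketch of that workaround, combined with warm-up calls and a median over repeated timings, might look like this. All names here (`scaled_shape`, `measure_latency_ms`, `dummy_op`) are hypothetical and not part of FasterSeg; `dummy_op` is a CPU stand-in for a real operator, and the warm-up and repeat counts are arbitrary.

```python
import time
from statistics import median


def scaled_shape(height, width, factor=8):
    """Enlarge the spatial size used only while timing an operator."""
    return height * factor, width * factor


def measure_latency_ms(fn, warmup=10, repeats=50):
    """Median wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        # For GPU operators, a device synchronization would be needed here
        # before reading the clock (e.g. torch.cuda.synchronize() in PyTorch).
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)


def dummy_op(height, width):
    """Hypothetical stand-in workload whose cost grows with height * width."""
    return sum(i * i for i in range(height * width))


h, w = scaled_shape(24, 24)  # 24x24 is timed as 192x192
latency = measure_latency_ms(lambda: dummy_op(h, w))
print(f"H{h}_W{w}: {latency:.4f} ms (median of 50 runs)")
```

The median is used instead of the mean so that a few slow outliers do not dominate the result. Whether the scaled measurements preserve the ordering of the real small-input latencies would still need to be checked on the target device.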