blackfeather-wang / AdaFocus

Reducing spatial redundancy in video recognition. SOTA computational efficiency.
123 stars 16 forks

Hi, segment_indices_glancer is different from segment_indices_focuser on Something-Something. What is the purpose? #8

Open xusong-20 opened 3 years ago

xusong-20 commented 3 years ago

Also, num_segments_glancer is different from num_segments_focuser. However, in Figure 2 of the paper (Overview of AdaFocus), the frames fed into the fG network and the fL network correspond one to one. Do the frames fed into fG and fL need to correspond one to one?

jianghaojun commented 3 years ago

On the Something-Something dataset, we found that the frames fed into fG and fL do not have to correspond strictly. In Table 3, TSM+ uses 8 frames for both the fG and fL networks. In contrast, AdaFocus-TSM reduces the input size of the relatively expensive fL network to 144x144/160x160/176x176, which lets fL process more frames in the task-relevant region of each video at the same computational cost and significantly improves efficiency.
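To make the trade-off concrete, here is a minimal sketch, not the repository's actual sampling code. It assumes center-of-segment sampling, a hypothetical 48-frame video, a hypothetical focuser count of 12 segments, a 224x224 baseline input, and per-frame cost that scales roughly with input area. The helper names `center_segment_indices` and `frames_at_equal_cost` are made up for illustration.

```python
def center_segment_indices(num_frames, num_segments):
    """Pick the center frame of each of `num_segments` equal-length segments."""
    seg_len = num_frames / num_segments
    return [int(seg_len * (i + 0.5)) for i in range(num_segments)]


def frames_at_equal_cost(base_frames, base_size, new_size):
    """Frames affordable at `new_size` for the cost of `base_frames` at `base_size`.

    Assumption: per-frame cost is proportional to input area, which is only
    a rough approximation for convolutional backbones.
    """
    return base_frames * (base_size / new_size) ** 2


num_frames = 48  # hypothetical video length

# Different segment counts give different index sets, so the glancer and
# focuser frames need not line up one to one.
glancer_idx = center_segment_indices(num_frames, 8)
focuser_idx = center_segment_indices(num_frames, 12)  # 12 is an illustrative value
print("glancer:", glancer_idx)
print("focuser:", focuser_idx)

# Rough frame budget for the focuser at the patch sizes mentioned above,
# relative to 8 frames at an assumed 224x224 input.
for size in (144, 160, 176):
    print(size, round(frames_at_equal_cost(8, 224, size), 2))
```

Under these assumptions a smaller patch buys noticeably more focuser frames for the same budget, which is why the two index lists can differ in both length and content.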