Scalsol / mega.pytorch

Memory Enhanced Global-Local Aggregation for Video Object Detection, CVPR2020

A question about your implementation of the FGFA paper #30

Closed: jcliu0428 closed this issue 4 years ago

jcliu0428 commented 4 years ago

Hi, thank you for your excellent codebase! I have a small question about your implementation of the feature aggregation from the FGFA paper. In the original paper, the feature of the current frame is also included when aggregating features across nearby frames. But in this line: https://github.com/Scalsol/mega.pytorch/blob/e9d7d4fa434c84bec98e3171e783dd0c720c3fb4/mega_core/modeling/detector/generalized_rcnn_fgfa.py#L131 it seems that only the weights of the nearby frames are computed. Is that correct? I have not read through the whole framework yet, so could you take a look and let me know?
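
For context, here is a minimal sketch of FGFA-style adaptive weighting as I understand it; the function and tensor names (`aggregate_nearby`, `cur_embed`, `nearby_embeds`, `nearby_feats`) are illustrative, not the actual variables in mega.pytorch:

```python
import torch
import torch.nn.functional as F

def aggregate_nearby(cur_embed, nearby_embeds, nearby_feats):
    """Weight nearby-frame features by cosine similarity to the current frame.

    cur_embed:     [C, H, W]    embedding of the current frame
    nearby_embeds: [T, C, H, W] embeddings of the T nearby (support) frames
    nearby_feats:  [T, C, H, W] (flow-warped) features of the nearby frames
    """
    # Per-pixel cosine similarity between the current frame and each nearby frame.
    cur_norm = F.normalize(cur_embed, dim=0)                      # [C, H, W]
    nearby_norm = F.normalize(nearby_embeds, dim=1)               # [T, C, H, W]
    weights = (cur_norm.unsqueeze(0) * nearby_norm).sum(dim=1)    # [T, H, W]

    # Softmax over the T nearby frames; the current frame itself only enters
    # this set if it happens to be sampled as one of the nearby frames.
    weights = F.softmax(weights, dim=0).unsqueeze(1)              # [T, 1, H, W]

    # Weighted sum of the nearby features.
    return (weights * nearby_feats).sum(dim=0)                    # [C, H, W]
```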

Thanks

Scalsol commented 4 years ago

Hi, thank you for your interest! This implementation is the same as in the original FGFA repo. The paper also says, "By default, we sample 2 frames in training and aggregate over 21 frames in inference." So during training the current frame is not always accumulated, but it can still be included when it happens to be drawn by the random sampling.
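
A minimal sketch of what that training-time sampling could look like (the helper name and offset range here are hypothetical, not the actual config of mega.pytorch):

```python
import random

def sample_support_offsets(num_support=2, max_offset=9):
    """Randomly pick temporal offsets for the support frames used in training.

    Offset 0 (the current frame itself) is allowed, so the current frame may
    occasionally be aggregated, but it is not included by construction.
    """
    return [random.randint(-max_offset, max_offset) for _ in range(num_support)]
```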

jcliu0428 commented 4 years ago

Hi, thank you for your answer! I also noticed this in the original paper. By the way, I have another question. I notice that both your reimplementation and the official MXNet code multiply the FlowNet output by 2.5, but I have not seen this factor in the original FlowNet code. Could you tell me why the output flow needs to be multiplied by 2.5?

Scalsol commented 4 years ago

I just followed the implementation of the original repo, so I don't know why either :) Maybe you could ask the authors of the FGFA paper, or try removing the 2.5 factor and see whether the performance drops.
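
For anyone trying that experiment, the factor shows up roughly like this. This is only a sketch assuming standard flow-guided warping with `torch.nn.functional.grid_sample`; the function and variable names are illustrative:

```python
import torch
import torch.nn.functional as F

def warp_with_flow(feat, flow, scale=2.5):
    """Bilinearly warp `feat` [N, C, H, W] by `flow` [N, 2, H, W] in pixel units.

    `scale` is the 2.5 factor applied to the FlowNet output; setting it to 1.0
    reproduces the "remove the factor" experiment.
    """
    n, _, h, w = feat.shape
    flow = flow * scale

    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)      # [2, H, W]

    # Displace the grid by the flow and normalize to [-1, 1] for grid_sample.
    coords = grid.unsqueeze(0) + flow                                # [N, 2, H, W]
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)            # [N, H, W, 2]

    return F.grid_sample(feat, grid_norm, mode="bilinear", align_corners=True)
```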