Are concatenated features (RGB and flow) strictly required? (#18)

dingfengshi / TriDet

[CVPR2023] Code for the paper, TriDet: Temporal Action Detection with Relative Boundary Modeling

MIT License

Closed: Ouya-Bytes closed this issue 1 year ago

Ouya-Bytes commented 1 year ago

I get 68.8 mAP when I train on the THUMOS dataset with the default configuration file. When I slice the input features to [:,:1024] or [:,1024:] in the dataset loader, i.e. use only the RGB features or only the flow features as model input, I get 58.96 mAP and 63.69 mAP, respectively. How necessary is it to concatenate the RGB and flow features, and why?
I look forward to hearing from you. Thanks.
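
For readers who want to reproduce the ablation, here is a minimal sketch of the slicing described above. It assumes the features for one video are a NumPy array of shape (T, 2048) with the 1024 RGB channels first and the 1024 flow channels second, as the slices in the question imply. The helper `select_modality` and the constant `RGB_DIM` are hypothetical names for illustration and are not part of the TriDet code.

```python
import numpy as np

RGB_DIM = 1024  # assumption: RGB channels come first, flow channels follow


def select_modality(feats, modality="both"):
    """Return RGB-only, flow-only, or the full concatenated features.

    feats: array of shape (T, 2048), one row per time step (assumed layout).
    """
    if modality == "rgb":
        return feats[:, :RGB_DIM]
    if modality == "flow":
        return feats[:, RGB_DIM:]
    if modality == "both":
        return feats
    raise ValueError(f"unknown modality: {modality!r}")


# Toy stand-in for one video's features: 8 time steps, 2048 channels.
feats = np.arange(8 * 2048, dtype=np.float32).reshape(8, 2048)
rgb = select_modality(feats, "rgb")
flow = select_modality(feats, "flow")
```

With a single modality the channel count drops from 2048 to 1024, so the model's input dimension setting would presumably need to be changed to match.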

dingfengshi commented 1 year ago

Hi, on THUMOS14 the model with pure optical flow outperforms the model with pure RGB, and combining both modalities is expected to yield better performance than either one alone. Concatenation is just one simple fusion method; other works fuse the modalities differently, for example through distillation [1] or by fusing model outputs [2].

[1] Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection.
[2] AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization.
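
To make the difference between concatenation and output fusion concrete, here is a toy sketch. It is not the method of [1] or [2] and not the TriDet model: `toy_head` is a hypothetical linear-plus-sigmoid scorer with random weights, the sizes are illustrative, and equal-weight averaging is only one possible way to combine outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, C = 8, 1024, 20  # time steps, channels per modality, classes (toy values)

# Random stand-ins for per-modality features of one video.
rgb = rng.standard_normal((T, D))
flow = rng.standard_normal((T, D))


def toy_head(feats, weights):
    """Hypothetical scorer: linear projection followed by a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(feats @ weights)))


# Early fusion: concatenate along the channel axis, then score once.
w_joint = rng.standard_normal((2 * D, C)) * 0.01
early_scores = toy_head(np.concatenate([rgb, flow], axis=1), w_joint)

# Late fusion: score each modality separately, then average the outputs.
w_rgb = rng.standard_normal((D, C)) * 0.01
w_flow = rng.standard_normal((D, C)) * 0.01
late_scores = 0.5 * (toy_head(rgb, w_rgb) + toy_head(flow, w_flow))
```

Both variants produce a (T, C) score map. The difference is whether the modalities interact inside one model (concatenation) or only at the output stage.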

Ouya-Bytes commented 1 year ago

Thank you for your reply!