Ouya-Bytes closed this issue 1 year ago
Hi, on THUMOS14 the model trained on optical flow alone outperforms the model trained on RGB alone, and combining both modalities is expected to do better than either one. Concatenation is just a simple fusion method; other works fuse the modalities differently, for example through distillation [1] or by fusing model outputs [2].
[1] Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection.
[2] AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization.
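
As a rough illustration of what "fusing model outputs" can mean, here is a minimal sketch of score-level (late) fusion by weighted averaging. It is a generic example, not the procedure used in [2]; the function name `fuse_scores` and the `flow_weight` parameter are hypothetical.

```python
import numpy as np

def fuse_scores(rgb_scores, flow_scores, flow_weight=0.5):
    """Weighted average of class scores from two single-modality models.

    Both inputs are assumed to share one shape, e.g. (num_snippets, num_classes).
    flow_weight is a hypothetical knob: 0.0 keeps only RGB, 1.0 keeps only flow.
    """
    rgb_scores = np.asarray(rgb_scores, dtype=np.float64)
    flow_scores = np.asarray(flow_scores, dtype=np.float64)
    if rgb_scores.shape != flow_scores.shape:
        raise ValueError("RGB and flow scores must have the same shape")
    if not 0.0 <= flow_weight <= 1.0:
        raise ValueError("flow_weight must lie in [0, 1]")
    return (1.0 - flow_weight) * rgb_scores + flow_weight * flow_scores

# Toy scores for one snippet and two classes.
fused = fuse_scores([[0.2, 0.8]], [[0.6, 0.4]])
```

Unlike concatenation, this keeps two separate models and only combines their predictions, so each modality can be trained and evaluated on its own.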
Thank you for your reply!
I get mAP 68.8 when I train on the THUMOS dataset with the default configuration file. When I change the input features to `[:, :1024]` or `[:, 1024:]` in the dataset loader, i.e. when I use only the RGB features or only the flow features as model input, I get mAP 58.96 and mAP 63.69, respectively. How necessary is it to concatenate the RGB and flow features, and why?
I look forward to hearing from you. Thanks!
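
For anyone reproducing the single-modality runs described above, the slicing can be sketched as below. This assumes the features are stored as a `(T, 2048)` array with RGB in the first 1024 channels and flow in the last 1024, as the `[:, :1024]` and `[:, 1024:]` slices imply; `select_modality` and `RGB_DIM` are hypothetical names, not part of the repository.

```python
import numpy as np

RGB_DIM = 1024  # assumption: channels [0, 1024) are RGB, [1024, 2048) are flow

def select_modality(feats, mode="both"):
    """Return the RGB slice, the flow slice, or the full concatenated features."""
    if mode == "rgb":
        return feats[:, :RGB_DIM]
    if mode == "flow":
        return feats[:, RGB_DIM:]
    if mode == "both":
        return feats
    raise ValueError(f"unknown mode: {mode!r}")

# Dummy features standing in for one video: 8 time steps, 2048 channels.
feats = np.random.rand(8, 2048).astype(np.float32)
rgb = select_modality(feats, "rgb")
flow = select_modality(feats, "flow")

# Concatenating the two slices along the channel axis restores the original input.
assert np.array_equal(np.concatenate([rgb, flow], axis=1), feats)
```

With a single modality the model's input dimension presumably has to drop from 2048 to 1024 as well, wherever the configuration defines it.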