Roc-Ng / XDVioDet

Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" (ECCV 2020)

Why is concatenating RGB, flow, and audio worse than RGB and audio alone? Has anyone else encountered this problem? #14

Open Xpamile opened 2 years ago

Xpamile commented 2 years ago

For RGB and audio fusion, the AP reported in your paper is 78.64%. But when I fuse all three modalities (RGB, flow, and audio), the AP is worse than the paper's result. In principle, fusing more modalities should improve performance, but in practice it does not. I am curious why.
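For readers following along, here is a minimal sketch of the kind of concatenation fusion being discussed. It is not taken from the repository: the helper name `concat_fusion`, the feature dimensions (1024 for RGB and flow, 128 for audio), the snippet counts, and the random stand-in features are all assumptions for illustration only.

```python
import numpy as np

def concat_fusion(*modalities):
    """Concatenate per-snippet features along the feature axis (hypothetical helper)."""
    # Modalities may have slightly different snippet counts; truncate to the shortest.
    t = min(m.shape[0] for m in modalities)
    return np.concatenate([m[:t] for m in modalities], axis=1)

# Random stand-ins for pre-extracted features, shaped (snippets, feature_dim).
# The dimensions below are assumed, not read from the repository.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((200, 1024)).astype(np.float32)
flow = rng.standard_normal((200, 1024)).astype(np.float32)
audio = rng.standard_normal((198, 128)).astype(np.float32)

rgb_audio = concat_fusion(rgb, audio)        # two-modality input
all_three = concat_fusion(rgb, flow, audio)  # three-modality input

print(rgb_audio.shape)  # (198, 1152)
print(all_three.shape)  # (198, 2176)
```

Note that the input width differs between the two settings, so the first layer of any model consuming these features has to be sized accordingly.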

yangmf2 commented 1 year ago

Dude, what parameters did you use for your runs? Can you share them?

yangmf2 commented 1 year ago

@Roc-Ng