SydCaption / SAAT

MIT License

Some questions about your work #37

Closed MarcusNerva closed 3 years ago

MarcusNerva commented 3 years ago

Thank you for your amazing work! I still have some questions about your motivation, though.

As you mention in the abstract of your paper: actions generated by existing methods may depend heavily on the co-occurrence of objects, e.g. ‘driving’ is predicted with high confidence whenever both man and car are detected.

I was wondering how you noticed this phenomenon. Did you reach this conclusion by computing statistics on the MSRVTT or MSVD dataset? If so, how did you compute them? Looking forward to your reply!

SydCaption commented 3 years ago

Hi, I observed this phenomenon frequently and found that it has been summarized as a "shortcut". I think this paper can be helpful.