TXH-mercury / VALOR

Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
https://arxiv.org/abs/2304.08345
MIT License

Comparison between SoTA methods #2

Closed MAGAer13 closed 5 months ago

MAGAer13 commented 1 year ago

Hi, I have read your paper — nice work on a variety of video downstream tasks. However, some major competitive methods are not compared for VideoQA (such as MulTI, mPLUG-2, and UMT-L) and video captioning (such as HiTeA and mPLUG-2). These methods are also SoTA and worth comparing against.

I hope you can consider the above suggestions, thanks.

TXH-mercury commented 1 year ago


Thanks for the advice. VALOR was completed quite a while ago. The latest methods you mentioned will be compared in our upcoming work, which will be released next month. Thanks for your attention.