TencentARC / ST-LLM

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
Apache License 2.0

question about training recipe #17

Open Nastu-Ho opened 5 months ago

Nastu-Ho commented 5 months ago

Which configuration file reproduces the 54.x result reported in the paper?

farewellthree commented 5 months ago

*qa.yaml

Backdrop9019 commented 4 months ago

I find it a bit odd to use different training datasets depending on the benchmark. For example, VideoChat2 (as far as I know) trains on all instruction datasets together and then evaluates on the various benchmarks. ST-LLM, however, trains on a different instruction mix for each benchmark and evaluates each separately. Doesn't this seem unfair? I'm curious about the rationale behind dividing the data this way.

farewellthree commented 4 months ago

Hello, thank you for pointing this out. The reason we did this is that instruction data in the form of multiple-choice questions, from datasets like K400, SSV2, and CLEVRER, is beneficial for MVBench but severely degrades the model's dialogue performance, leading to significant hallucinations. Our approach actually used less data to achieve better results.
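To make the trade-off above concrete, here is a minimal, purely illustrative sketch of building two instruction mixtures, one including the multiple-choice sources and one without them. The dataset names come from the discussion; the selection logic and the `build_mixture` helper are hypothetical and are not ST-LLM's actual code or config schema.

```python
# Hypothetical illustration of benchmark-specific data mixtures (not ST-LLM's code).
# Per the discussion: multiple-choice sources such as K400, SSV2, and CLEVRER
# help MVBench scores but can hurt open-ended dialogue quality.

INSTRUCTION_SETS = {
    "videochat_conversation": "dialogue",       # open-ended dialogue data
    "webvid_caption": "caption",                # captioning data
    "k400_mcq": "multiple_choice",              # MCQ-style instruction data
    "ssv2_mcq": "multiple_choice",
    "clevrer_mcq": "multiple_choice",
}

def build_mixture(include_mcq: bool) -> list[str]:
    """Select instruction datasets; drop multiple-choice sets for dialogue training."""
    return [
        name for name, kind in INSTRUCTION_SETS.items()
        if include_mcq or kind != "multiple_choice"
    ]

mvbench_mix = build_mixture(include_mcq=True)    # all sources, for MVBench
dialogue_mix = build_mixture(include_mcq=False)  # dialogue/caption only
```

The point is only that the two mixtures differ by the MCQ subset; the actual dataset lists live in the repo's `*qa.yaml` configs.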