TencentARC / ST-LLM

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
Apache License 2.0

how to modify the code to support llama3? #16

Closed dragen1860 closed 5 months ago

dragen1860 commented 5 months ago

Hi, dear all: Though the paper achieved superior performance with only Vicuna-7B models, I want to explore the potential of stronger LLMs, such as LLaMA-3 or Yi. Could anyone give some tips on how to modify the code to support LLaMA-3 training with the ST-LLM datasets? Thank you ...

farewellthree commented 5 months ago

The good performance of video LLMs relies on image LLMs, and the role of the large language model itself is not that significant. Currently, the best image LLM is LLaVA1.6. We are now conducting new experiments based on LLaVA1.6 and will update the code trained on LLaVA1.6 soon.
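That said, for anyone who still wants to attempt the port, swapping the backbone typically comes down to loading the new weights and, importantly, switching the conversation template, since LLaMA-3's instruct format differs from the Vicuna-style "USER:/ASSISTANT:" prompts used by many LLaVA-era models. Below is a minimal sketch of the template change only; the function names are illustrative and are not actual ST-LLM identifiers:

```python
# Hypothetical sketch of the prompt-template difference involved in
# porting a Vicuna-based video LLM to LLaMA-3. Function names are
# illustrative, not taken from the ST-LLM codebase.

def vicuna_prompt(system: str, user: str) -> str:
    # Vicuna v1-style template common in LLaVA-era models
    return f"{system} USER: {user} ASSISTANT:"

def llama3_prompt(system: str, user: str) -> str:
    # LLaMA-3 Instruct template, which uses header/eot special tokens
    # instead of role keywords
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```

Beyond the template, the stop token also changes (LLaMA-3 ends turns with `<|eot_id|>` rather than `</s>`), so the training-time target masking and inference-time stopping criteria need to match the new tokens as well.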