PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
https://arxiv.org/pdf/2311.10122.pdf
Apache License 2.0

Parameter Explanations #115

Open Davidyao99 opened 9 months ago

Davidyao99 commented 9 months ago

Great work! May I clarify what the different parameters in cli.py do?

Specifically, what do load-4bit, load-8bit, conv-mode, and max-new-tokens do?

Thank you!! Trying to understand the parameters better so that I can tune them for my specific task! ;)

LinB203 commented 9 months ago

The model is trained in 16-bit floating-point precision, so loading it in 4-bit or 8-bit quantized precision reduces memory usage and can make inference faster. conv-mode selects the conversation (prompt) template. max-new-tokens is the maximum number of tokens the model will generate.
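
For reference, here is a minimal sketch of what these flags roughly correspond to, using the generic Hugging Face transformers API rather than the repo's actual cli.py wiring; the checkpoint name, prompt, and loading details below are illustrative assumptions, not the project's exact implementation.

```python
# Illustrative sketch only: shows what --load-4bit/--load-8bit, --conv-mode,
# and --max-new-tokens roughly map to when loading a causal LM with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "LanguageBind/Video-LLaVA-7B"  # assumed checkpoint name for illustration

# --load-4bit / --load-8bit: quantize the 16-bit weights at load time via bitsandbytes,
# trading a small amount of accuracy for much lower GPU memory use.
quant_config = BitsAndBytesConfig(load_in_4bit=True)  # or load_in_8bit=True

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quant_config,
    torch_dtype=torch.float16,   # the default 16-bit precision when no quantization is used
    device_map="auto",
)

# --conv-mode: picks the conversation template the checkpoint was trained with;
# here a plain prompt string stands in for the real template.
prompt = "USER: Describe the video. ASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# --max-new-tokens: upper bound on how many tokens generate() may produce.
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```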

onlyonewater commented 7 months ago

Would loading the model in 8-bit precision give lower performance than 16-bit on downstream tasks? @LinB203