Question about training Vary-tiny

Ucas-HaoranWei / Vary

[ECCV2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.


Open · limitedfxw opened this issue 1 month ago

limitedfxw commented 1 month ago

Hi, when training tiny, have you tried freezing the LLM, and how well did it work? Also, why doesn't the code's LLM-freezing logic include get_input_embeddings?

Ucas-HaoranWei commented 1 month ago

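Freezing the LLM definitely won't work. With the LLM frozen, the 256 image tokens get mapped to 256 text-like tokens, and 256 text tokens can encode very little text. A page would have to be very sparse, at the least, for that to be enough.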

limitedfxw commented 1 month ago

Why doesn't the LLM-freezing logic in the code include get_input_embeddings? Was there a particular reason for that?

Ucas-HaoranWei commented 1 month ago

No particular reason~ The freeze-LLM option ended up not being used.
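To make the get_input_embeddings question concrete, here is a minimal PyTorch sketch of what it means for an LLM-freezing routine to leave the input embeddings out. It does not reproduce Vary's code: `ToyLLM`, `freeze_llm`, and `trainable_names` are hypothetical names for illustration. Only the method name `get_input_embeddings` mirrors the Hugging Face `PreTrainedModel` convention. If the freeze routine only touches the transformer layers and LM head, the token embedding table stays trainable.

```python
import torch.nn as nn


class ToyLLM(nn.Module):
    """Hypothetical stand-in for a decoder-only LLM (NOT Vary's actual model)."""

    def __init__(self, vocab_size: int = 100, hidden: int = 16, n_layers: int = 2):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden)
        self.layers = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(n_layers)])
        # Untied LM head; in real HF models this weight may be tied to embed_tokens.
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)

    # Same method name as Hugging Face's PreTrainedModel.get_input_embeddings()
    def get_input_embeddings(self) -> nn.Module:
        return self.embed_tokens


def freeze_llm(model: ToyLLM, include_input_embeddings: bool) -> None:
    """Hypothetical helper: freeze layers and LM head, optionally input embeddings too."""
    for module in (model.layers, model.lm_head):
        for p in module.parameters():
            p.requires_grad_(False)
    if include_input_embeddings:
        # nn.Module.requires_grad_ toggles every parameter of the embedding module
        model.get_input_embeddings().requires_grad_(False)


def trainable_names(model: nn.Module) -> list:
    """Names of parameters that would still receive gradients."""
    return [name for name, p in model.named_parameters() if p.requires_grad]


partial = ToyLLM()
freeze_llm(partial, include_input_embeddings=False)
print(trainable_names(partial))  # ['embed_tokens.weight']

full = ToyLLM()
freeze_llm(full, include_input_embeddings=True)
print(trainable_names(full))  # []
```

In the partial case the embedding table still gets updated, so the LLM is not fully frozen. In real Hugging Face models the input embeddings are sometimes tied to the LM head. In that case, freezing the LM head also freezes the shared weight, so checking `requires_grad` on the named parameters, as above, is a quick way to confirm what is actually trainable.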