Is a custom visual encoder supported (llava-llama3)?

InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

https://xtuner.readthedocs.io/zh-cn/latest/

Apache License 2.0

3.92k stars 305 forks

Open Yanllan opened 5 months ago

Yanllan commented 5 months ago

Is a custom visual encoder supported (llava-llama3)? For example, replacing CLIP with SigLIP? How can this be done, and which code needs to be modified?

hhaAndroid commented 5 months ago

The vision part is already being refactored. It will be ready soon.

ztfmars commented 5 months ago

Is a custom visual encoder supported (llava-llama3)? For example, replacing CLIP with SigLIP? How can this be done, and which code needs to be modified?

Wow, bro, have you been reading about Google's PaliGemma too? SigLIP really does work better than the ViT-based CLIP.

yuzhms commented 3 months ago

Is there any progress on this?
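
On the original question of what would need to change: this thread gives no official answer, so the following is only a rough sketch under stated assumptions. Hugging Face transformers provides SiglipVisionModel and SiglipImageProcessor as counterparts to CLIPVisionModel and CLIPImageProcessor, so the encoder and image processor entries in a LLaVA config are the obvious things to swap. One difference that likely needs a code change is token handling: a CLIP vision tower puts a class token in front of the patch tokens, while SigLIP's hidden states contain patch tokens only, so any LLaVA-style code that drops the first token would throw away a real patch. The helper name select_patch_tokens and the has_cls_token flag below are hypothetical and not part of xtuner; whether xtuner drops the first token in this way should be checked against its source. Plain nested lists stand in for tensors so the sketch runs without torch.

```python
# Hypothetical helper, NOT part of xtuner: choose the patch tokens that
# would be passed on to the projector.
# hidden_states holds one entry per layer, each shaped [batch][tokens][dim].
def select_patch_tokens(hidden_states, select_layer=-2, has_cls_token=True):
    layer = hidden_states[select_layer]
    if has_cls_token:
        # CLIP-style encoder: token 0 is the class token, so skip it.
        return [sample[1:] for sample in layer]
    # SigLIP-style encoder: every token is a patch token, so keep them all.
    return [list(sample) for sample in layer]


# Toy data: 3 layers, batch size 1, one-dimensional "features".
clip_like = [[[["cls"], ["p0"], ["p1"]]] for _ in range(3)]
siglip_like = [[[["p0"], ["p1"]]] for _ in range(3)]

clip_tokens = select_patch_tokens(clip_like, has_cls_token=True)
siglip_tokens = select_patch_tokens(siglip_like, has_cls_token=False)

# With the flag set correctly, both encoders yield the same patch tokens.
print(clip_tokens == siglip_tokens)
```

If the flag were left at its CLIP default for a SigLIP encoder, the first patch would be silently dropped, which is the kind of mismatch to look for when swapping encoders.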