请问LLAVA-1.5-Llama-3-8B的训练主要改动点有哪些? - Githubissues

InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

https://xtuner.readthedocs.io/zh-cn/latest/

Apache License 2.0

3.93k stars 309 forks source link

请问LLAVA-1.5-Llama-3-8B的训练主要改动点有哪些? #631

Closed tian969 closed 6 months ago

tian969 commented 6 months ago

我自己尝试复现, 在英文领域表现更低了, 感觉不正常. 我在原版llava上进行的改动, 我只改了对输入进行preprocess的整体逻辑,包括mask targets 这部分和 conversation部分的内容. 请问其他还有啥需要改动的地方嘛?

tian969 commented 6 months ago

是使用 xtuner 复现？还是说自己开发了一套新代码准备复现？

在原版LLAVA上改代码进行复现的.

hhaAndroid commented 6 months ago

@tian969 https://github.com/InternLM/xtuner/tree/main/xtuner/configs/llava/llama3_8b_instruct_clip_vit_large_p14_336 这个结果，我们只是在指令微调对 vit 进行 lora 微调，其他和官方是一样的。

我推荐你用 xtuner 来训练 llava，因为

xtuner 训练速度更快，这个我们测过了
xtuner 即将会包括大量即插即用的模块，方便大家对各类 vl 模型进行微调，包括各类从数据层面，训练层面，架构层面的支持。方便你进行各类修改和定制
xtuner 采用了比较严格的版本管理，你比较容易复现我们发布的结果