https://github.com/Coobiw/MiniGPT4Qwen/assets/48615375/963416dd-fd97-4680-b7ac-fa4a14beaaae
https://github.com/Coobiw/MiniGPT4Qwen/assets/48615375/0e7c33f6-33d3-478a-ab0e-ecc116aeec78
Multi-image dialogue with the MPP-14B model without video sft (it appears to answer, but actually says nothing):
The MPPQwen-8B model after video sft (able to compare different images):
conda create -n minigpt4qwen python=3.8 && conda activate minigpt4qwen
pip install -e .
Please place them in the `cache` directory, with the following structure:
For the model weights, see: WEIGHT.md
For the training data, see: DATA.md
First configure the weights following WEIGHT.md,
then download the post-sft model weights (15GB) from one of the following links:
Single-GPU Inference
python cli_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth
Multi-GPU (the LLM is loaded with `device_map="auto"`, so the LLM part of the model can be spread across multiple GPUs):
python cli_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth --llm_device_map "auto"
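The `--llm_device_map "auto"` option hands layer placement to Hugging Face Accelerate. Conceptually, the model's transformer layers are split into contiguous blocks, one block per GPU. This is only a toy illustration of that idea; the key names and the even split are assumptions here, and Accelerate's real planner also accounts for per-layer memory:

```python
def make_device_map(num_layers: int, num_gpus: int) -> dict:
    """Toy illustration of device_map="auto": assign the model's layers
    to GPUs in contiguous blocks of roughly equal size.
    (Accelerate's real algorithm also weighs memory usage per layer.)"""
    per_gpu = -(-num_layers // num_gpus)  # ceiling division
    return {f"transformer.h.{i}": i // per_gpu for i in range(num_layers)}
```

For a 32-layer LLM on 2 GPUs, layers 0-15 land on GPU 0 and layers 16-31 on GPU 1.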
CPU (slow):
python cli_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth --cpu-only # omit --cpu-only if GPU memory is sufficient (>=20GB)
After launching, you will be prompted for image paths; multiple images can be entered. Type `:f`
to finish entering image paths and start the dialogue.
Common commands:
:help show help
:clear clear the current command line
:clh clear the dialogue history (image inputs are kept)
:his show the dialogue history
:img show the paths of the input images
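The command prefixes above could be dispatched roughly as follows. This is a hypothetical sketch of the pattern, not the actual `cli_demo.py` implementation:

```python
# Hypothetical sketch of dispatching the cli_demo ':' commands; the real
# cli_demo.py may differ in details.
def handle_command(cmd, history, image_paths):
    if cmd == ":help":
        return "commands: :help :clear :clh :his :img"
    if cmd == ":clh":
        history.clear()  # clear dialogue history; image_paths stay untouched
        return "history cleared"
    if cmd == ":his":
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    if cmd == ":img":
        return "\n".join(image_paths)
    return None  # not a known command: treat the input as a chat message
```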
Single-GPU Inference
python webui_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth
Multi-GPU (the LLM is loaded with `device_map="auto"`):
python webui_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth --llm_device_map "auto"
CPU:
python webui_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth --cpu-only # omit --cpu-only if GPU memory is sufficient (>=20GB)
The commands below are for running on 8x 3090 GPUs:
nproc_per_node: 8, dp: 4, pp: 2 (nproc_per_node = pp * dp)
python -m torch.distributed.run --nproc_per_node=8 train_pipeline.py --cfg-path lavis/projects/pp_qwen7b_video/pretrain.yaml --num-stages 2
nproc_per_node: 8, dp: 1, pp: 8 (nproc_per_node = pp * dp)
python -m torch.distributed.run --nproc_per_node=8 train_pipeline.py --cfg-path lavis/projects/pp_qwen7b_video/sft.yaml --num-stages 8
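The constraint `nproc_per_node = pp * dp` just says the per-node world size must factor exactly into pipeline stages times data-parallel replicas. A quick sanity check of the two layouts above:

```python
def data_parallel_degree(nproc_per_node: int, pp: int) -> int:
    """nproc_per_node must be divisible by the number of pipeline stages (pp);
    the data-parallel degree (dp) is the remaining factor."""
    assert nproc_per_node % pp == 0, "nproc_per_node must equal pp * dp"
    return nproc_per_node // pp

# pretrain: 8 GPUs, pp=2  -> dp=4
# sft:      8 GPUs, pp=8  -> dp=1
```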
(converts only the linear projection layer)
python pipe_proj2pth.py --ckpt-dir lavis/output/pp_7b_video/pretrain/global_step2181
After conversion, the model file will be saved under `ckpt_dir`, named `model.pth`.
(converts the projection layer and all LLM parameters)
python pipemodel2pth.py --ckpt-dir lavis/output/pp_7b_video/sft_video/global_step2005
After conversion, the model file will be saved under `ckpt_dir`, named `unfreeze_llm_model.pth`.
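Conceptually, both conversion scripts gather parameters that are scattered across pipeline-stage checkpoints into one flat state dict, either keeping everything (sft) or only the projection layer (pretrain). A simplified sketch with plain dicts standing in for tensor state dicts; the parameter names here are illustrative assumptions, and the real scripts additionally handle DeepSpeed's stage file layout:

```python
def merge_stage_checkpoints(stage_dicts, keep_prefixes=None):
    """Simplified sketch: merge per-pipeline-stage parameter dicts into one
    flat state dict. With keep_prefixes set, only matching keys survive
    (cf. converting just the projection layer after pretraining)."""
    merged = {}
    for sd in stage_dicts:
        for name, tensor in sd.items():
            if keep_prefixes is None or any(name.startswith(p) for p in keep_prefixes):
                merged[name] = tensor
    return merged
```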
pretrain:
sft:
For the processing scripts, refer to `analysis.py` in the `llava_instuct` and `videochatgpt` directories of https://github.com/Coobiw/MiniGPT4Qwen/releases/download/MPP-Qwen-Next_ckpt-and-data/ckpt-and-data.zip
P.S.: If you run into frequent path errors, change all paths to absolute paths (including those in the dataset configs).
Single-turn (instruction and output are `str`):
[
    {
        "image": "000000215677.jpg",
        "instruction": "<Img><ImageHere></Img> {question}",
        "output": "{answer}"
    }
]
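Building one such single-turn record can be sketched as a small helper (a hypothetical convenience function, not part of the repo):

```python
def make_single_turn(image: str, question: str, answer: str) -> dict:
    """Build one single-turn sft record in the format shown above; the
    <Img><ImageHere></Img> placeholder marks where image tokens are inserted."""
    return {
        "image": image,
        "instruction": f"<Img><ImageHere></Img> {question}",
        "output": answer,
    }
```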
Multi-turn (instruction and output are lists of equal length):
[
    {
        "image": "000000479443.jpg",
        "instruction": [
            "<Img><ImageHere></Img> {question1}",
            "{question2}",
            "..."
        ],
        "output": [
            "{answer1}",
            "{answer2}",
            "..."
        ]
    }
]
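A quick validity check for multi-turn records, matching the pattern in the example above (equal-length lists, image placeholder in the first turn). This is a hypothetical helper for illustration:

```python
def validate_multi_turn(record: dict) -> bool:
    """Check a multi-turn record: instruction and output must be lists of
    the same length, and, as in the example above, the image placeholder
    appears in the first turn only."""
    ins, out = record["instruction"], record["output"]
    if not (isinstance(ins, list) and isinstance(out, list) and len(ins) == len(out)):
        return False
    if "<ImageHere>" not in ins[0]:
        return False
    return all("<ImageHere>" not in turn for turn in ins[1:])
```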
[
    {
        "video": "v_k_ZXmr8pmrs.mkv",
        "instruction": "<Img><ImageHere></Img> {question}",
        "output": "{answer}"
    }
]