Open lucasjinreal opened 7 months ago
This is caused by a mismatch between the conv template and the process function. You can refer to custom.md, or directly use the Qwen code we provide: https://github.com/PKU-YuanGroup/MoE-LLaVA/issues/39
You should use the qwen conv template that we have provided.
I believe I have already changed the conv template to the mpt format.
Also, I am using the latest version of transformers, and these warnings appear:
WARNING: tokenization mismatch: 42 vs. 43. (ignored)
WARNING: tokenization mismatch: 44 vs. 45. (ignored)
WARNING: tokenization mismatch: 51 vs. 52. (ignored)
WARNING: tokenization mismatch: 45 vs. 46. (ignored)
WARNING: tokenization mismatch: 48 vs. 49. (ignored)
WARNING: tokenization mismatch: 43 vs. 44. (ignored)
BTW, I am using Qwen1.5. Its tokenizer config should already contain the special tokens:
{
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": ["<|im_start|>", "<|im_end|>"],
"bos_token": null,
"chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content']}}{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n'}}{% endif %}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{ '<|im_start|>assistant\n' }}{% endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"model_max_length": 32768,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
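For reference, the chat_template in the config above can be read as the following plain-Python sketch. The function name render_chatml is made up for illustration; it mirrors the Jinja logic by hand instead of calling the tokenizer, so treat it as a reading aid and not as the library's implementation.

```python
DEFAULT_SYSTEM = "<|im_start|>system\nYou are a helpful assistant<|im_end|>\n"


def render_chatml(messages, add_generation_prompt=False):
    """Hand-written mirror of the chat_template string (hypothetical helper)."""
    parts = []
    last_index = len(messages) - 1
    for index, message in enumerate(messages):
        # A default system turn is injected when the first message is not a system one.
        if index == 0 and messages[0]["role"] != "system":
            parts.append(DEFAULT_SYSTEM)
        parts.append("<|im_start|>" + message["role"] + "\n" + message["content"])
        # Every turn is closed, except the last one when no generation prompt is requested.
        is_last = index == last_index
        if (is_last and add_generation_prompt) or not is_last:
            parts.append("<|im_end|>" + "\n")
    if add_generation_prompt and messages[-1]["role"] != "assistant":
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"role": "user", "content": "hi"}], add_generation_prompt=True))
```

Per the template, the final message receives no closing <|im_end|> unless add_generation_prompt is set, which is worth keeping in mind when counting tokens per round.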
Could you clarify what the mpt conv template is? Please also post your run command.
I noticed that you changed this part:
if has_image:
    round_len = len(tokenizer_image_token(rou, tokenizer)) + 1  # for eos_token
    instruction_len = len(tokenizer_image_token(parts[0], tokenizer)) - 1  # instruction_len is before the answer
else:
    round_len = len(tokenizer(rou).input_ids)
    instruction_len = len(tokenizer(parts[0]).input_ids) - 1
I don't think I changed this part, but I have a question. Your corresponding qwen conv template is:
conv_qwen = Conversation(
    system="A chat between a curious user and an artificial intelligence assistant. "
           "The assistant gives helpful, detailed, and polite answers to the user's questions.",
    roles=("USER", "ASSISTANT"),
    version="qwen",  # replace
    messages=(),
    offset=0,
    sep_style=SeparatorStyle.TWO,
    sep=" ",
    sep2="<|endoftext|>",  # replace with eos_token
)
What I am using is:
conv_mpt = Conversation(
    system="""<|im_start|>system
You should follow the instructions carefully and explain your answers in detail.""",
    # system = None,
    roles=("<|im_start|>user\n", "<|im_start|>assistant\n"),
    version="mpt",
    messages=(),
    offset=0,
    sep_style=SeparatorStyle.MPT,
    sep="<|im_end|>",
)
Since the Qwen-based chat models use the ChatML format, wouldn't it be better to keep using ChatML?
For ChatML, how should the process function above be changed?
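To make the difference between the two templates concrete, here is a minimal sketch of how each would render a one-round conversation. It assumes that get_prompt for SeparatorStyle.TWO and SeparatorStyle.MPT behaves as in LLaVA's conversation.py; the helper names prompt_two and prompt_mpt are hypothetical and exist only for this illustration.

```python
def prompt_two(system, turns, sep, sep2):
    """Assumed TWO-style rendering: 'ROLE: message' with alternating separators."""
    seps = [sep, sep2]
    out = system + seps[0]
    for index, (role, message) in enumerate(turns):
        if message:
            out += role + ": " + message + seps[index % 2]
        else:
            out += role + ":"
    return out


def prompt_mpt(system, turns, sep):
    """Assumed MPT-style rendering: role strings already carry the ChatML markers."""
    out = system + sep
    for role, message in turns:
        if message:
            out += role + message + sep
        else:
            out += role
    return out


qwen_style = prompt_two(
    "SYSTEM PROMPT", [("USER", "hi"), ("ASSISTANT", "hello")],
    sep=" ", sep2="<|endoftext|>",
)
chatml_style = prompt_mpt(
    "<|im_start|>system\nSYSTEM PROMPT",
    [("<|im_start|>user\n", "hi"), ("<|im_start|>assistant\n", "hello")],
    sep="<|im_end|>",
)
print(repr(qwen_style))
print(repr(chatml_style))
```

Because the two renderings split rounds on different separators, a preprocess function written for one style will count round and instruction lengths incorrectly for the other, which is the kind of template/process mismatch mentioned at the top of the thread.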
I don't think the system prompt will seriously affect model performance. You can change system="A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions." to system="You should follow the instructions carefully and explain your answers in detail."
I tried modifying preprocess and kept the ChatML template, but the loss is still 0.
In theory, the only difference from your template is one extra eos, which should not be enough to make the loss collapse.
Also, after the modification the warnings are still there:
WARNING: tokenization mismatch: 49 vs. 50. (ignored)
WARNING: tokenization mismatch: 47 vs. 48. (ignored)
WARNING: tokenization mismatch: 46 vs. 47. (ignored)
WARNING: tokenization mismatch: 54 vs. 55. (ignored)
WARNING: tokenization mismatch: 45 vs. 46. (ignored)
WARNING: tokenization mismatch: 60 vs. 61. (ignored)
WARNING: tokenization mismatch: 48 vs. 49. (ignored)
WARNING: tokenization mismatch: 48 vs. 49. (ignored)
WARNING: tokenization mismatch: 44 vs. 45. (ignored)
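A possible link between the warning and the zero loss, sketched under assumptions: in LLaVA-style preprocessing, a sample whose counted length differs from its tokenized length has all of its labels set to the ignore index, so it contributes nothing to the loss. The toy tokenizer, the BOS miscount, and every name below are hypothetical; the sketch only illustrates that guard and does not claim to identify the actual cause in this thread.

```python
IGNORE_INDEX = -100  # label value skipped by the loss
BOS_ID = 1           # made-up id for the toy tokenizer


def toy_tokenize(text, add_bos):
    """Whitespace tokenizer with arbitrary but deterministic ids."""
    ids = [10 + len(word) for word in text.split()]
    return [BOS_ID] + ids if add_bos else ids


def mask_targets(rounds, add_bos, assume_bos):
    """Mask instruction tokens; drop the whole sample on a length mismatch."""
    text = " ".join(f"{instruction} {answer}" for instruction, answer in rounds)
    input_ids = toy_tokenize(text, add_bos)
    target = list(input_ids)
    total_len = len(input_ids)

    # The counting code assumes (or not) that a BOS token sits at position 0.
    cur_len = 1 if assume_bos else 0
    for k in range(min(cur_len, total_len)):
        target[k] = IGNORE_INDEX

    for instruction, answer in rounds:
        instruction_len = len(toy_tokenize(instruction, add_bos=False))
        round_len = len(toy_tokenize(f"{instruction} {answer}", add_bos=False))
        for k in range(cur_len, min(cur_len + instruction_len, total_len)):
            target[k] = IGNORE_INDEX
        cur_len += round_len

    mismatch = cur_len != total_len
    if mismatch:
        print(f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}. (ignored)")
        target = [IGNORE_INDEX] * total_len  # nothing left to supervise
    return target, mismatch


def supervised_tokens(target):
    return sum(label != IGNORE_INDEX for label in target)


rounds = [("USER: what is shown ? ASSISTANT:", "a cat")]
ok_target, ok_mismatch = mask_targets(rounds, add_bos=True, assume_bos=True)
bad_target, bad_mismatch = mask_targets(rounds, add_bos=False, assume_bos=True)
print(supervised_tokens(ok_target), supervised_tokens(bad_target))
```

If every sample in a batch trips this guard, no token is supervised, which would be consistent with the 'loss': 0.0 entries reported here. Comparing len(tokenizer(prompt).input_ids) against the counted length for a single sample is a quick way to see where the off-by-one comes from.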
Could you post your run command?
@LinB203 Yes:
MODEL_VERSION=qwen-1.8b
########### DO NOT CHANGE ###########
########### USE THIS FOR BOTH ###########
PROMPT_VERSION=qwen
deepspeed train_xformers.py \
--deepspeed ./scripts/zero2.json \
--model_name_or_path ./checkpoints/$MODEL_VERSION \
--version $PROMPT_VERSION \
--data_path ./data/llava_0.1/pretrain_data.json \
--image_folder ./data/images \
--vision_tower ./checkpoints/open-clip-vit-large-patch14-336px \
--tune_mm_mlp_adapter True \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--bf16 False \
--output_dir ./checkpoints/llava-$MODEL_VERSION-pretrain \
--num_train_epochs 1 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 24000 \
--save_total_limit 1 \
--learning_rate 1e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 False \
--model_max_length 2048 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--lazy_preprocess True
I am running the pretrain stage on 4xV100 for testing.
See https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/scripts/v1/qwen/pretrain.sh#L9: use --version plain in Stage 1.
I changed it to plain and still get loss 0:
PROMPT_VERSION=plain
########### DO NOT CHANGE ###########
deepspeed train_xformers.py \
--deepspeed ./scripts/zero2.json \
--model_name_or_path ./checkpoints/$MODEL_VERSION \
--version $PROMPT_VERSION \
--data_path ./data/llava_0.1/pretrain_data.json \
--image_folder ./data/images \
--vision_tower ./checkpoints/chinese-clip-vit-large-patch14-336px \
--tune_mm_mlp_adapter True \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--bf16 False \
--output_dir ./checkpoints/llava-$MODEL_VERSION-pretrain \
--num_train_epochs 1 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 24000 \
--save_total_limit 1 \
--learning_rate 1e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 False \
--model_max_length 2048 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--lazy_preprocess True
and the warnings persist. Why?
WARNING: tokenization mismatch: 59 vs. 61. (ignored)
WARNING: tokenization mismatch: 54 vs. 56. (ignored)
WARNING: tokenization mismatch: 57 vs. 59. (ignored)
WARNING: tokenization mismatch: 65 vs. 67. (ignored)
WARNING: tokenization mismatch: 70 vs. 72. (ignored)
WARNING: tokenization mismatch: 62 vs. 64. (ignored)
WARNING: tokenization mismatch: 62 vs. 64. (ignored)
WARNING: tokenization mismatch: 61 vs. 63. (ignored)
{'loss': 0.0, 'learning_rate': 1.5267175572519083e-05, 'epoch': 0.0}
Sorry, I cannot reproduce your error. Please re-pull the latest code and follow custom.md, which is clear enough to implement Qwen1.5 as was done in https://github.com/PKU-YuanGroup/MoE-LLaVA/issues/39.
The Qwen1.5 scripts are the same as the qwen ones; you only need to modify --model_name_or_path. Our latest code supports Qwen1.5: https://github.com/PKU-YuanGroup/MoE-LLaVA/issues/39#issuecomment-1945654824.
I got it working now. The loss shows:
{'loss': 16.2781, 'learning_rate': 3.816793893129771e-06, 'epoch': 0.0}
{'loss': 15.7207, 'learning_rate': 7.633587786259541e-06, 'epoch': 0.0}
{'loss': 15.9175, 'learning_rate': 1.1450381679389314e-05, 'epoch': 0.0}
{'loss': 15.8711, 'learning_rate': 1.5267175572519083e-05, 'epoch': 0.0}
{'loss': 15.352, 'learning_rate': 1.9083969465648855e-05, 'epoch': 0.0}
{'loss': 14.9108, 'learning_rate': 2.2900763358778628e-05, 'epoch': 0.0}
{'loss': 14.1826, 'learning_rate': 2.6717557251908397e-05, 'epoch': 0.0}
{'loss': 13.4192, 'learning_rate': 3.0534351145038166e-05, 'epoch': 0.0}
Is this normal for the pretrain stage? It looks very large.
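One rough reference point for reading these numbers: a model that spreads probability uniformly over the vocabulary has a cross-entropy of ln(V). The sketch below assumes V = 151646, the smallest vocabulary consistent with the highest token id (151645) in the tokenizer config posted earlier; the model's real embedding size may be larger, so the figure is only approximate.

```python
import math


def uniform_cross_entropy(vocab_size):
    """Cross-entropy (in nats) of a uniform guess over vocab_size tokens."""
    return math.log(vocab_size)


ASSUMED_VOCAB_SIZE = 151646  # assumption: highest id in the posted config, plus one
baseline = uniform_cross_entropy(ASSUMED_VOCAB_SIZE)
first_logged_loss = 16.2781  # first value from the log above

print(f"uniform baseline: {baseline:.2f} nats")
print(f"first logged loss above baseline: {first_logged_loss > baseline}")
```

Under that assumption the first logged value sits above the uniform baseline, so the model starts out worse than a uniform guess on the supervised tokens. Whether that is expected with an untrained projector is not settled in this thread.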
@LinB203 The 1.5 support is very nice! So you must have upgraded to the latest transformers to support the qwen2 tokenizer? How about using the MoE architecture from transformers to minimize the code?
BTW, have you tried unfreezing both the vision tower and the projector in both stage 1 and stage 2?
I haven't run qwen2 on clip-large-336; with siglip-384 it converges to around 0.8. We support the qwen2 tokenizer: https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/moellava/train/train.py#L1402. We did not modify the MLP projector, and you can change the vision encoder by following here.
Is that qwen1.8b for 1 epoch, with the llava pretrain dataset?
BTW, have you tried unfreezing both the vision tower and the projector in both stage 1 and stage 2?
Have you tried the same training steps as Yi-6b-VL?
Yes, with sharegpt4v pretrain dataset. No.
@LinB203 Shouldn't it be the 1.8B MoE? Is the model open? Is the pretrained (not fine-tuned) model able to do simple image captioning?
The model has not been released. Yes.
@LinB203 I think it is hard for the pretrain loss to reach 0.2; the official pretrain loss of llava is about 1.9.
How did you manage to get the pretrain loss so low?
It is scheduled for next month. We use the pretraining dataset from sharegpt4v, which contains about 1.2M QA pairs.
@LinB203
- When will the model be released? It looks better than the current model. Is it also a 1.8Bx4 MoE?
- Which pretraining data were you using: sharegpt4v_instruct_gpt4-vision_cap100k.json or the pt part? Does the pt part perhaps contain a lot of noise?
@LinB203 I found that sharegpt4v also lacks Chinese data. Do you think any high-quality Chinese image-text pairs could be used in pretraining to improve Chinese ability?
@LinB203 Do you think raw OCR image-text pairs can be used in the pretraining data?
Has the original poster run into a similar situation? {'loss': 0.0, 'learning_rate': 0.001435114503816794, 'epoch': 0.02}
2%|██▊ | 188/8720 [14:45<11:05:28, 4.68s/it]WARNING: tokenization mismatch: 58 vs. 59. (ignored) WARNING: tokenization mismatch: 41 vs. 42. (ignored)