Open 18600709862 opened 3 months ago
Hello @18600709862
I noticed that you encountered the error "RuntimeError: shape '[-1, 151936]' is invalid for input of size 77642752". This usually happens when the "vocab_size" in the "config.json" file (151936) does not match the actual length of tokenizer.get_vocab() for the "Qwen1.5-7B-Chat" model that you are using.
To fix this error, you can change the line at "https://github.com/18907305772/FuseLLM/blob/main/FuseChat/train/data_collator.py#L228" to vocab_size = len(self.tokenizer.get_vocab()).
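A quick way to confirm the mismatch before training is to compare the two numbers directly. The sketch below uses hypothetical stand-ins (`FakeConfig`, `FakeTokenizer`, and the helper `vocab_mismatch` are invented for illustration, with toy sizes rather than Qwen's) so that it runs without downloading a model. With the real model, the config and tokenizer would presumably be loaded through `transformers.AutoConfig.from_pretrained` and `transformers.AutoTokenizer.from_pretrained` instead.

```python
from dataclasses import dataclass


@dataclass
class FakeConfig:
    """Hypothetical stand-in for a model config; only vocab_size matters here."""
    vocab_size: int


class FakeTokenizer:
    """Hypothetical stand-in exposing get_vocab() like a Hugging Face tokenizer."""

    def __init__(self, tokens):
        self._vocab = {tok: idx for idx, tok in enumerate(tokens)}

    def get_vocab(self):
        # Return a copy of the token -> id mapping.
        return dict(self._vocab)


def vocab_mismatch(config, tokenizer):
    """Return config.vocab_size minus the tokenizer's vocabulary length.

    A result of 0 means the two agree; anything else is the kind of gap
    described above.
    """
    return config.vocab_size - len(tokenizer.get_vocab())


# Toy values: the config claims 8 entries, the tokenizer only knows 5 tokens.
config = FakeConfig(vocab_size=8)
tokenizer = FakeTokenizer(["<pad>", "a", "b", "c", "d"])
gap = vocab_mismatch(config, tokenizer)
print(gap)  # 3
```

A non-zero gap means that a tensor sized from one of the two numbers cannot be reshaped with the other.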
Do I modify the code, or is this an error to report? I guess I have to start over again after modifying the code.
Thank you very much for your help!
Once you modify this line in your local script, the loss calculation should be successful.
After modifying this line in my local script, should I run everything again from step one?
Here we show the scripts to obtain representations from multiple source LLMs for model fusion.
Get representations for each source LLM
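Thank you for your help. At the moment it is unclear whether only Qwen needs this or whether every model has to be rerun from the first step:
1. Get representations for each source LLM (the Qwen model alone takes 20 hours to process; the Mistral models are faster)
2. Align the representations from different source LLMs
3. Pairwise knowledge fusion (this step currently fails)
After modifying the code, I am now rerunning from step one.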
You do not need to rerun from the first step. Just modify the script and continue with pairwise knowledge fusion.
@18907305772 Thank you for your reply. I have already changed the code at https://github.com/18907305772/FuseLLM/blob/main/FuseChat/train/data_collator.py#L228 to vocab_size = len(self.tokenizer.get_vocab()). The script I am currently running is as follows:
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node=8 --master_port=20001 ./train/train_lora.py \
--model_name_or_path "/media/root/sdc1/data/model/Qwen1.5-7B-Chat" \
--data_path "save_1_2_3/1/0,save_1_2_3/1/1,save_1_2_3/1/2,save_1_2_3/1/3" \
--fp16 True \
--output_dir "save_1_2/model" \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "epoch" \
--save_steps 10000 \
--save_total_limit 1 \
--learning_rate 5e-6 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 False \
--model_max_length 512 \
--gradient_checkpointing True \
--conv_temp "qwen1.5-7b-chat" \
--lazy_preprocess True \
--flash_attn_transformers True \
--do_train \
--do_distill \
--distill_with_ref_model True \
--distill_with_aligned_model_0 True \
--distill_with_aligned_model_1 False \
--distill_loss_type "ce" \
--distill_teacher_temperature 1.0 \
--lm_loss_weight 0.9 \
--distill_greater_as_gt True \
--distill_greater_as_gt_type hard \
--dataloader_num_workers 1 \
--remove_unused_columns False
Since I have eight 3090 cards, I ran it with LoRA. Execution got this far:
0%| | 0/35613 [00:00<?, ?it/s]
vocab_size= 151936
Traceback (most recent call last):
File "/media/root/ssd2t/data/pro/tmp/ol/new/new/quant/merger/FuseLLM/FuseChat/./train/train_lora.py", line 230, in
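I am very sorry, I did not notice that you are using a Qwen1.5 series model. Because this model's tokenizer & vocabulary differ from those of the Mistral series, an additional vocabulary alignment step is required. The sh scripts currently provided in the FuseChat README.md only cover models that share the same vocabulary, e.g., Mistral & Solar & Mistral, but we have already provided the corresponding Python code. I will update this part of the sh scripts as soon as possible. Please be patient, thank you!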
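To illustrate why differing vocabularies call for an alignment step, the toy sketch below compares two hypothetical vocabularies. `vocab_a` and `vocab_b` are invented for illustration and are not the real Qwen or Mistral vocabularies, and this is not FuseChat's alignment code. It only shows that the same token string can carry different ids, or be missing entirely, across tokenizers.

```python
# Two toy token -> id mappings (hypothetical, for illustration only).
vocab_a = {"<s>": 0, "hello": 1, "world": 2, "!": 3}
vocab_b = {"<s>": 0, "world": 1, "hello": 2, "?": 3}

# Tokens present in both vocabularies.
shared = sorted(set(vocab_a) & set(vocab_b))

# For shared tokens, map the id used in vocab_a to the id used in vocab_b.
id_map = {vocab_a[tok]: vocab_b[tok] for tok in shared}

# Tokens from vocab_a that have no exact counterpart in vocab_b.
unmatched = sorted(set(vocab_a) - set(vocab_b))

print(shared)     # ['<s>', 'hello', 'world']
print(id_map)     # {0: 0, 1: 2, 2: 1}
print(unmatched)  # ['!']
```

Even in this four-token example, a probability vector indexed by `vocab_a` ids cannot be compared position by position with one indexed by `vocab_b` ids until the ids are remapped and the unmatched tokens are handled somehow.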
Thank you very much for your reply. I look forward to the updated script.
We have released the updated code and scripts here: https://github.com/18907305772/FuseAI/tree/FuseChat-2.0/FuseChat The README.md there only uses Qwen1.5 as a source LLM, but you can try to use it as the pivot LLM.
Running the Pairwise Knowledge Fusion bash script from the README gives:
FuseLLM/FuseChat/train/trainer.py", line 121, in compute_loss
if self.args.distill_loss_type == "ce": loss_lm = cross_entropy(input=outputs["logits"].view(-1, vocab_size), target=target_dist.view(-1, vocab_size), reduction="none").view(batch_size, -1) # (bs, seq_len)
RuntimeError: shape '[-1, 151936]' is invalid for input of size 77642752
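The numbers in this error can be checked by hand. A call like `view(-1, V)` only works when the tensor's element count is a multiple of `V`. The sketch below assumes a single sequence of 512 tokens, taken from `--per_device_train_batch_size 1` and `--model_max_length 512` in the command posted earlier in this thread. The per-token width it derives is an inference from the arithmetic, not something the log states.

```python
numel = 77_642_752       # "input of size" from the RuntimeError
config_vocab = 151_936   # width requested by view(-1, 151936)
seq_len = 512            # assumption: one sequence at --model_max_length 512

# view(-1, config_vocab) needs numel to be a multiple of config_vocab.
remainder = numel % config_vocab

# If the tensor really holds one 512-token sequence, this is its width per token.
divides_evenly = numel % seq_len == 0
per_token_width = numel // seq_len

print(remainder)        # 3456
print(divides_evenly)   # True
print(per_token_width)  # 151646
```

Under that assumption, the tensor being reshaped has 151646 entries per token, 290 fewer than the 151936 in "config.json", which is consistent with the vocabulary mismatch described at the start of the thread.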