18600709862 commented 3 months ago

Running the "Pairwise Knowledge Fusion" bash script from the README fails:

FuseLLM/FuseChat/train/trainer.py", line 121, in compute_loss

if self.args.distill_loss_type == "ce":
    loss_lm = cross_entropy(input=outputs["logits"].view(-1, vocab_size),
                            target=target_dist.view(-1, vocab_size),
                            reduction="none").view(batch_size, -1)  # (bs, seq_len)

RuntimeError: shape '[-1, 151936]' is invalid for input of size 77642752

18907305772 commented 3 months ago

Hello @18600709862

I noticed that you hit RuntimeError: shape '[-1, 151936]' is invalid for input of size 77642752. This usually happens when the "vocab_size" in "config.json" (151936) does not match the actual length of tokenizer.get_vocab() for the "Qwen1.5-7B-Chat" model you are using.
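The numbers in the error message can be checked by hand. The short sketch below uses only values that appear in this thread (the element count 77642752, the width 151936, and --model_max_length 512 with a per-device batch size of 1). It assumes, without confirmation from the repository, that target_dist holds exactly one sequence padded to 512 tokens.

```python
# Values taken from the error message and the training command in this thread.
numel = 77642752        # total number of elements in target_dist
config_vocab = 151936   # width the trainer passes to .view(-1, vocab_size)
seq_len = 512           # --model_max_length; ASSUMES one full-length sequence per batch

# .view(-1, V) is only valid when the element count is a multiple of V.
remainder = numel % config_vocab
print("remainder when dividing by 151936:", remainder)

# Work backwards: which last-dimension width fits 512 positions exactly?
implied_vocab, leftover = divmod(numel, seq_len)
print("implied width:", implied_vocab, "leftover:", leftover)
```

Under that assumption the stored distributions are 151646 entries wide rather than 151936, which is consistent with a tokenizer vocabulary that is smaller than the "vocab_size" in "config.json".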

To fix this error, change the line at https://github.com/18907305772/FuseLLM/blob/main/FuseChat/train/data_collator.py#L228 to vocab_size = len(self.tokenizer.get_vocab()).
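Before changing anything, it can help to confirm that the two sizes really differ. The following is a minimal sketch, not code from the FuseChat repository: vocab_report and load_vocab_sizes are hypothetical helper names, and the value 151646 is only inferred from the error message (77642752 / 512), not read from the tokenizer.

```python
def vocab_report(config_vocab_size: int, tokenizer_vocab_len: int) -> str:
    """Summarise whether the config and tokenizer vocabulary sizes agree."""
    if config_vocab_size == tokenizer_vocab_len:
        return "match"
    diff = config_vocab_size - tokenizer_vocab_len
    return (f"mismatch: config={config_vocab_size}, "
            f"tokenizer={tokenizer_vocab_len}, diff={diff}")


def load_vocab_sizes(model_path: str) -> tuple[int, int]:
    """Read both sizes from a local model directory (hypothetical helper)."""
    # Imported lazily so vocab_report can be used without transformers installed.
    from transformers import AutoConfig, AutoTokenizer

    config = AutoConfig.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    return config.vocab_size, len(tokenizer.get_vocab())


# 151936 is quoted in this thread; 151646 is an inferred value (77642752 / 512).
print(vocab_report(151936, 151646))
```

With a local checkpoint, vocab_report(*load_vocab_sizes("/path/to/model")) would print the real values; the path here is a placeholder.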

18600709862 commented 3 months ago

Should I modify the code myself, or report this as a bug? I guess I will have to start over again after modifying the code.

18600709862 commented 3 months ago

Thank you very much for your help!

18907305772 commented 3 months ago

Once you modify this line in your local script, the loss calculation should be successful.

18600709862 commented 3 months ago

After modifying this line in my local script, do I have to run everything again from step one, i.e. this part of the README?

Here we show the scripts to obtain representations from multiple source LLMs for model fusion.

Get representations for each source LLM

18600709862 commented 3 months ago

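Thank you for your help. At the moment I am not sure whether only Qwen needs this, or whether all models have to be re-run from the first step:

1. Get representations for each source LLM. The Qwen model alone takes 20 hours to process; the other Mistral models are faster.
2. Align the representations from different source LLMs.
3. Pairwise knowledge fusion. This step currently fails.

Now that I have modified the code, I am re-running from the first step.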

18907305772 commented 3 months ago

You do not need to rerun from the first step; just modify the script and continue with pairwise knowledge fusion.

18600709862 commented 3 months ago

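@18907305772 Thank you for your reply. I have already changed the code at https://github.com/18907305772/FuseLLM/blob/main/FuseChat/train/data_collator.py#L228 to vocab_size = len(self.tokenizer.get_vocab()). The script I am currently running is as follows:

Pairwise Knowledge Fusion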

Qwen1.5-7B-Chat <-> Mistral-7B-Instruct-v0.2

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node=8 --master_port=20001 ./train/train_lora.py \
    --model_name_or_path "/media/root/sdc1/data/model/Qwen1.5-7B-Chat" \
    --data_path "save_1_2_3/1/0,save_1_2_3/1/1,save_1_2_3/1/2,save_1_2_3/1/3" \
    --fp16 True \
    --output_dir "save_1_2/model" \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "epoch" \
    --save_steps 10000 \
    --save_total_limit 1 \
    --learning_rate 5e-6 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length 512 \
    --gradient_checkpointing True \
    --conv_temp "qwen1.5-7b-chat" \
    --lazy_preprocess True \
    --flash_attn_transformers True \
    --do_train \
    --do_distill \
    --distill_with_ref_model True \
    --distill_with_aligned_model_0 True \
    --distill_with_aligned_model_1 False \
    --distill_loss_type "ce" \
    --distill_teacher_temperature 1.0 \
    --lm_loss_weight 0.9 \
    --distill_greater_as_gt True \
    --distill_greater_as_gt_type hard \
    --dataloader_num_workers 1 \
    --remove_unused_columns False

Since I have eight 3090 GPUs, I ran it with LoRA. Execution gets this far:

0%| | 0/35613 [00:00<?, ?it/s]
vocab_size= 151936
Traceback (most recent call last):
  File "/media/root/ssd2t/data/pro/tmp/ol/new/new/quant/merger/FuseLLM/FuseChat/./train/train_lora.py", line 230, in <module>
    train()
  File "/media/root/ssd2t/data/pro/tmp/ol/new/new/quant/merger/FuseLLM/FuseChat/./train/train_lora.py", line 204, in train
    trainer.train()
  File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/transformers/trainer.py", line 1780, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/transformers/trainer.py", line 2118, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/llama2/lib/python3.10/site-packages/transformers/trainer.py", line 3036, in training_step
    loss = self.compute_loss(model, inputs)
  File "/media/root/ssd2t/data/pro/tmp/ol/new/new/quant/merger/FuseLLM/FuseChat/train/trainer.py", line 122, in compute_loss
    target=target_dist.view(-1, vocab_size),
RuntimeError: shape '[-1, 151936]' is invalid for input of size 77642752

The code at this point (line 119) is:

if self.args.distill_loss_type == "ce":
    print("vocab_size=",vocab_size)
    loss_lm = cross_entropy(input=outputs["logits"].view(-1, vocab_size),
                            target=target_dist.view(-1, vocab_size),
                            reduction="none").view(batch_size, -1)  # (bs, seq_len)

18907305772 commented 3 months ago

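I am very sorry, I did not notice that you are using a Qwen1.5 series model. Because this model's tokenizer & vocabulary differ from those of the Mistral series, an additional vocabulary alignment step is required. The sh scripts currently provided in the FuseChat README.md only cover models that share the same vocabulary, e.g., Mistral & Solar & Mistral, but we have already provided the corresponding python code. I will update this part of the sh scripts as soon as possible. Please be patient, thank you!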
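To illustrate why models with different vocabularies need an extra alignment step, here is a toy sketch that moves a probability distribution from one vocabulary to another by exact token match. This is not FuseChat's alignment code, which this thread does not show; remap_distribution and both vocabularies are made up for illustration, and real alignment also has to handle tokens with no exact counterpart.

```python
def remap_distribution(probs, src_vocab, tgt_vocab):
    """Map a distribution over src_vocab ids onto tgt_vocab ids.

    Toy rule: mass moves only between tokens with identical strings;
    unmatched source mass is dropped and the remainder is renormalised.
    """
    out = [0.0] * len(tgt_vocab)
    for token, src_id in src_vocab.items():
        tgt_id = tgt_vocab.get(token)
        if tgt_id is not None:
            out[tgt_id] += probs[src_id]
    total = sum(out)
    if total > 0:
        out = [p / total for p in out]
    return out


# Hypothetical vocabularies: same strings can have different ids, and sizes differ.
src_vocab = {"hello": 0, "world": 1, "foo": 2}
tgt_vocab = {"world": 0, "hello": 1, "bar": 2, "baz": 3}
src_probs = [0.5, 0.25, 0.25]

aligned = remap_distribution(src_probs, src_vocab, tgt_vocab)
print(len(src_probs), "->", len(aligned))
```

The output has one entry per target-vocabulary token, so it can be compared against logits of that width; without such a step the two tensors have different last dimensions, which is the kind of mismatch the RuntimeError above reports.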

18600709862 commented 3 months ago

Thank you very much for your reply. I look forward to the updated script.

18907305772 commented 1 month ago

We have released the updated code and scripts here: https://github.com/18907305772/FuseAI/tree/FuseChat-2.0/FuseChat. The README.md there only uses Qwen1.5 as a source LLM, but you can try using it as the pivot LLM.

18907305772 / FuseAI

can use Qwen1.5-7B-Chat ? #13
