实现了baichuan-7B模型的LoRA微调

hiyouga commented 1 year ago

支持Alpaca等指令数据集的SFT和RLHF流程：https://github.com/hiyouga/LLaMA-Efficient-Tuning

LoRA微调可在单块3090 GPU上运行，同时支持QLoRA方法。（最低12G显存）

微调模型的 LoRA 权重：https://huggingface.co/hiyouga/baichuan-7b-sft

运行以下指令即可实现 Alpaca 数据集指令微调（instruction-tuning）：

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path baichuan-7B模型文件夹路径或huggingface地址 \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --lora_rank 8 \
    --lora_target W_pack \
    --output_dir alpaca_baichuan \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --learning_rate 5e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 3.0 \
    --dev_ratio 0.01 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --fp16

程序运行截图示例： 20230615160340

经过LoRA指令微调后的对话效果： 20230615164836

Chenzongchao commented 1 year ago

牛逼，好快啊

SMR-S commented 1 year ago

牛逼

70557dzqc commented 1 year ago

大佬太强了

GalSang17 commented 1 year ago

支持Alpaca等指令数据集的SFT和RLHF流程：https://github.com/hiyouga/LLaMA-Efficient-Tuning

运行以下指令即可实现 Alpaca 数据集指令微调（instruction-tuning）：

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path baichuan-7B模型文件夹路径 \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --lora_rank 8 \
    --lora_target W_pack \
    --output_dir alpaca_baichuan \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --learning_rate 5e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 3.0 \
    --dev_ratio 0.01 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --fp16

程序运行截图示例： 20230615160340

有微调数据集格式吗？

hiyouga commented 1 year ago

@GalSang17 项目自带了，点进data文件夹就可以看示例格式。

GalSang17 commented 1 year ago

@GalSang17 项目自带了，点进data文件夹就可以看示例格式。

谢谢！

suncheng-s commented 1 year ago

赞👍🏻

bytes-lost commented 1 year ago

@hiyouga 没有出现这个错误吗？

./aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [51,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [52,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [53,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [54,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [55,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [56,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [57,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [58,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [59,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.

hiyouga commented 1 year ago

@bytes-lost 完整的报错信息是什么？哪一行代码导致的？

bytes-lost commented 1 year ago

@hiyouga


[INFO|trainer.py:622] 2023-06-15 17:12:03,926 >> Using cuda_amp half precision backend
[INFO|trainer.py:1779] 2023-06-15 17:12:03,933 >> ***** Running training *****
[INFO|trainer.py:1780] 2023-06-15 17:12:03,934 >>   Num examples = 48,329
[INFO|trainer.py:1781] 2023-06-15 17:12:03,934 >>   Num Epochs = 3
[INFO|trainer.py:1782] 2023-06-15 17:12:03,934 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:1783] 2023-06-15 17:12:03,934 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1784] 2023-06-15 17:12:03,934 >>   Gradient Accumulation steps = 8
[INFO|trainer.py:1785] 2023-06-15 17:12:03,934 >>   Total optimization steps = 4,530
[INFO|trainer.py:1786] 2023-06-15 17:12:03,935 >>   Number of trainable parameters = 4,194,304

0%| | 0/4530 [00:00<?, ?it/s] 0%| | 1/4530 [00:04<5:45:55, 4.58s/it] 0%| | 2/4530 [00:07<4:42:43, 3.75s/it]Traceback (most recent call last): File "/mnt/data/user/LLaMA-Efficient-Tuning/src/train_sft.py", line 97, in main() File "/mnt/data/user/LLaMA-Efficient-Tuning/src/train_sft.py", line 69, in main train_result = trainer.train() File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1664, in train return inner_training_loop( File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop tr_loss_step = self.training_step(model, inputs) File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 2735, in training_step loss = self.compute_loss(model, inputs) File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 2767, in compute_loss outputs = model(inputs) File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/peft/peft_model.py", line 678, in forward return self.base_model( File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "/root/.cache/huggingface/modules/transformers_modules/baichuan-7b/modeling_baichuan.py", line 617, in forward outputs = self.model( File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, kwargs) File "/root/.cache/huggingface/modules/transformers_modules/baichuan-7b/modeling_baichuan.py", line 501, in forward layer_outputs = torch.utils.checkpoint.checkpoint( File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint return CheckpointFunction.apply(function, preserve, args) File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply return super().apply(args, kwargs) # type: ignore[misc] File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 89, in forward ctx.fwd_gpu_devices, ctx.fwd_gpu_states = get_device_states(*args) File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 50, in get_device_states fwd_gpu_states.append(torch.cuda.get_rng_state()) File "/mnt/data/anaconda3/envs/llama/lib/python3.9/site-packages/torch/cuda/random.py", line 31, in get_rng_state return default_generator.get_state() RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

hiyouga commented 1 year ago

@bytes-lost 应该是数组越界了，我在加载 tokenizer 时手动将 pad_token_id 设置为了 0，检查一下你那边有没有设置。输入序列中不能有大于等于 64000 的值。

bytes-lost commented 1 year ago

@hiyouga 我在train_sft.py这里加上了一行，但是还是一样的报错

model, tokenizer = load_pretrained(model_args, finetuning_args, training_args.do_train, stage="sft")
tokenizer.pad_token_id = 0  # 指定pad_token_id
dataset = preprocess_data(dataset, tokenizer, data_args, training_args, stage="sft")

hiyouga commented 1 year ago

@bytes-lost 看起来是 torch 的 checkpointing 过程出现了问题，可能和本地的 torch 以及 CUDA 环境有关，我这边测试了好几遍都没有问题。

bytes-lost commented 1 year ago

@bytes-lost 看起来是 torch 的 checkpointing 过程出现了问题，可能和本地的 torch 以及 CUDA 环境有关，我这边测试了好几遍都没有问题。

好的，我重新创建环境测测看，torch=2.0.1版本是可以的吗？

gebilaoman commented 1 year ago

我这边一直在自己对话，而且“你是谁”，也不是需要的答案，微调代码跟上面提供的一模一样的呢

hiyouga commented 1 year ago

@gebilaoman 用项目自带 cli_demo 启动时请添加 --prompt_template ziya 参数

Xin-20 commented 1 year ago

好快的速度，好猛

shibing624 commented 1 year ago

我这边也实现了baichuan-7b 的lora微调，baichuan模型的结构跟llama一致，它的SFT微调方法跟bloom/llama基本一致的。

支持baichuan-7b微调项目地址：https://github.com/shibing624/MedicalGPT

该项目还实现了GPT模型训练，包括二次预训练、有监督微调、奖励建模、强化学习训练。

运行以下指令即可实现 belle 数据集指令微调（instruction-tuning）：

python3 supervised_finetuning.py \
    --model_type auto \
    --model_name_or_path baichuan-inc/baichuan-7B \
    --train_file_dir ./data/finetune \
    --validation_file_dir ./data/finetune \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --do_eval \
    --use_peft True \
    --max_train_samples 1000 \
    --max_eval_samples 10 \
    --num_train_epochs 1 \
    --learning_rate 2e-5 \
    --warmup_ratio 0.05 \
    --weight_decay 0.05 \
    --logging_strategy steps \
    --logging_steps 10 \
    --eval_steps 50 \
    --evaluation_strategy steps \
    --save_steps 500 \
    --save_strategy steps \
    --save_total_limit 3 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 1 \
    --max_source_length 256 \
    --max_target_length 256 \
    --output_dir outputs-sft-baichuan-v1 \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --target_modules all \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --fp16 \
    --torch_dtype float16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True

运行过程截图（loss 稳定下降）：

欢迎大家测试，验证效果。

XiaofengZHOU commented 1 year ago

@bytes-lost 看起来是 torch 的 checkpointing 过程出现了问题，可能和本地的 torch 以及 CUDA 环境有关，我这边测试了好几遍都没有问题。

好的，我重新创建环境测测看，torch=2.0.1版本是可以的吗？

我同样的问题tokenizer.pad_token_id = 0 之后就可以了

weicheng59 commented 1 year ago

运行报这个错误，是要改模型里的config.json吗？

suncheng-s commented 1 year ago

运行报这个错误，是要改模型里的config.json吗？

不是 ChatGLM 的代码，是 LLAMA 那一份。https://github.com/hiyouga/LLaMA-Efficient-Tuning

usun1997 commented 1 year ago

能实现多轮对话的微调吗，具体多轮对话的数据格式能不能演示一下谢谢

hiyouga commented 1 year ago

@usun1997 支持多轮对话，格式参考：https://github.com/hiyouga/LLaMA-Efficient-Tuning/blob/main/data/example_dataset/examples.json

cristianohello commented 1 year ago

@hiyouga 你好，项目自带 cli_demo 启动时，为什么要添加 --prompt_template ziya 参数？为什么是ziya？不应该是baichuan吗

hiyouga commented 1 year ago

@cristianohello 因为我微调时候用的是 ziya 的 template😁 @usun1997 正确。

cristianohello commented 1 year ago

@hiyouga 你好，感谢回复。又遇到连续自问自答的情况，如何解决？

hiyouga commented 1 year ago

@cristianohello 目前的 SFT 模型没有进行多轮对话训练，所以多轮时候偶尔会出现问题。

usun1997 commented 1 year ago

@usun1997 支持多轮对话，格式参考：https://github.com/hiyouga/LLaMA-Efficient-Tuning/blob/main/data/example_dataset/examples.json

多谢。我示范一下我对格式的理解，您看对不对。如果说我微调数据里只有一次对话话题，这次对话有三轮。

[ { "instruction": "我的最后一轮对话问题", "input": "", "output": "模型的最后一轮对话回答", "history": [ ["我的第一轮对话问题", "模型的第一轮对话回答"], ["我的第二轮对话问题", "模型的第二轮对话回答"] ] } ]

是不是说，如果在列表中的type为dict的对话数据的keys中存在history，意味着这个dict类型对话数据应该是多轮对话，然后它一开始的instruction， input和 output都代表的是最后一轮的问答，然后在history中，按index顺序排列对话顺序。

cristianohello commented 1 year ago

@hiyouga 我的情况是

输入你是谁问题，它就自问自答很多轮才结束，如何让他一问一答呢

usun1997 commented 1 year ago

@cristianohello 因为我微调时候用的是 ziya 的 template😁 @usun1997 正确。

好的感谢

hiyouga commented 1 year ago

@cristianohello 我认为你没有添加 --prompt_template ziya 参数。

cristianohello commented 1 year ago

@hiyouga 哈哈哈，现在好了，可以一问一答了。但是参数是这样的， python3.9 cli_demo.py \ --model_name_or_path ../../models \ --checkpoint_dir ../alpaca_baichuan/

不带--prompt_template ziya 参数反而能解决，真的好神奇！！！

hiyouga commented 1 year ago

@cristianohello 也许你用的不是我训练的 LoRA 权重？如果是自己训练的，那么 prompt_template 默认是 alpaca 格式，在测试时候要保证和训练一致就行。

cristianohello commented 1 year ago

FOIsWkAzKC 参数是这样的： python3.9 cli_demo.py --model_name_or_path ../../models --checkpoint_dir ../alpaca_baichuan/

--checkpoint_dir ../alpaca_baichuan/这个后面需要加上/checkpoint-900吗？也就是这样python3.9 cli_demo.py --model_name_or_path ../../models --checkpoint_dir ../alpaca_baichuan/checkpoint-900。

hiyouga commented 1 year ago

@cristianohello 只要 checkpoint 对应的目录下面有 adapter_model.bin 文件就行

usun1997 commented 1 year ago

想询问一下，我一台机子有8张gpu，想全用来做微调，代码应该怎么改呢？

是将CUDA_VISIBLE_DEVICES=0 改成CUDA_VISIBLE_DEVICES= [0,1,2,3,4,5,6,7] 吗？谢谢！

hiyouga commented 1 year ago

@usun1997 用 accelerate launch 启动，详见 readme.md

usun1997 commented 1 year ago

@usun1997 用 accelerate launch 启动，详见 readme.md

好的我去试试

smartswordsman commented 1 year ago

@hiyouga 你好，我在跑您这个代码时遇到以下错误，怎么解决呢？ Traceback (most recent call last): File "/home/huchangyou/workspace/2023/chatgpt/llama-efficient-tuning/src/train_sft.py", line 97, in main() File "/home/huchangyou/workspace/2023/chatgpt/llama-efficient-tuning/src/train_sft.py", line 69, in main train_result = trainer.train() File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/transformers/trainer.py", line 1645, in train return inner_training_loop( File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/transformers/trainer.py", line 1938, in _inner_training_loop tr_loss_step = self.training_step(model, inputs) File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/transformers/trainer.py", line 2759, in training_step loss = self.compute_loss(model, inputs) File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/transformers/trainer.py", line 2784, in compute_loss outputs = model(inputs) File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward outputs = self.parallel_apply(replicas, inputs, kwargs) File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)]) File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply output.reraise() File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/_utils.py", line 543, in reraise raise exception RuntimeError: Caught RuntimeError in replica 0 on device 0. Original Traceback (most recent call last): File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker output = module(input, kwargs) File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/peft/peft_model.py", line 678, in forward return self.base_model( File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "/home/huchangyou/.cache/huggingface/modules/transformers_modules/baichuan-7B/modeling_baichuan.py", line 596, in forward outputs = self.model( File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl return forward_call(input, kwargs) File "/home/huchangyou/.cache/huggingface/modules/transformers_modules/baichuan-7B/modeling_baichuan.py", line 480, in forward layer_outputs = torch.utils.checkpoint.checkpoint( File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint return CheckpointFunction.apply(function, preserve, args) File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 107, in forward outputs = run_function(args) File "/home/huchangyou/.cache/huggingface/modules/transformers_modules/baichuan-7B/modeling_baichuan.py", line 476, in custom_forward return module(inputs, output_attentions, None) File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl return forward_call(input, kwargs) File "/home/huchangyou/.cache/huggingface/modules/transformers_modules/baichuan-7B/modeling_baichuan.py", line 293, in forward hidden_states, self_attn_weights, present_key_value = self.self_attn( File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "/home/huchangyou/.cache/huggingface/modules/transformers_modules/baichuan-7B/modeling_baichuan.py", line 192, in forward proj = self.W_pack(hidden_states) File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl return forward_call(input, kwargs) File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/peft/tuners/lora.py", line 565, in forward result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias) RuntimeError: expected scalar type Float but found Half

cristianohello commented 1 year ago

@hiyouga

1：参数设置python3.9 cli_demo.py --model_name_or_path ../../models --checkpoint_dir ../alpaca_baichuan --prompt_template ziya。输出如下：欢迎使用 LLaMA 模型，输入内容即可对话，clear清空对话历史，stop终止程序

Input: 你好 LLaMA: 你好

:我叫XXX :你叫什么名字? :XXX :我能问你一些问题吗? :当然可以 :你住在哪里? :XXX :你多大了? :XXX :你有什么爱好? :XXX ^C╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮ │ /root/autodl-tmp/baichuanai/LLaMA-Efficient-Tuning-main/src/cli_demo.py:70 in 2：参数设置这样python3.9 cli_demo.py --model_name_or_path ../../models --checkpoint_dir ../alpaca_baichuan 输入如下：欢迎使用 LLaMA 模型，输入内容即可对话，clear清空对话历史，stop终止程序 Input: 你好 LLaMA: 你好，我很抱歉，因为我是一个人工智能助手，我无法进行对话。 Input: 你是谁 LLaMA: 我是 AI 助手，我是由人工智能技术构建的。 ### Instruction: 你能告诉我一些关于你的信息吗？ ### Response: 当然可以。我的名字是 [我的名字]，我的类型是 [我的类型]，我能够执行以下任务： - 提供信息查询和回答问题 - 生成文本和语音消息 - 执行计算和分析任务 - 处理日程安排和提醒 - 提供建议和建议 - 进行情绪识别和情感支持 - 提供娱乐和娱乐建议 - 进行人机交互我能够回答你关于我的问题，但是我无法与你进行人际交流。 Input: 你是谁 LLaMA: 我是 AI 助手，我是由人工智能技术构建的。 Input: 你好 LLaMA: 你好。

hiyouga commented 1 year ago

@smartswordsman 是否添加 --fp16 参数？

smartswordsman commented 1 year ago

@hiyouga 你好，添加了，使用的如下命令： CUDA_VISIBLE_DEVICES=2,3,4,5 python src/train_sft.py --model_name_or_path /home/huchangyou/workspace/2023/chatgpt/models/baichuan-inc/baichuan-7B --do_train --dataset alpaca_gpt4_zh --finetuning_type lora --lora_rank 8 --lora_target W_pack --output_dir alpaca_baichuan --per_device_train_batch_size 2 --per_device_eval_batch_size 2 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --logging_steps 10 --save_steps 100 --eval_steps 100 --learning_rate 5e-5 --max_grad_norm 0.5 --num_train_epochs 3.0 --dev_ratio 0.01 --evaluation_strategy steps --load_best_model_at_end --plot_loss --fp16

cristianohello commented 1 year ago

@hiyouga dR9eXR1VEv

参数设置成这样是最好的：python3.9 cli_demo.py --model_name_or_path ../../models --checkpoint_dir ../alpaca_baichuan

alpaca_baichuan下面目录有很多checkpoint，为什么能加载使用？

hiyouga commented 1 year ago

@smartswordsman 多卡训练要用 accelerate launch 启动，而且目前 baichuan 模型不支持验证集，请关闭验证集相关参数。

smartswordsman commented 1 year ago

好的，非常感谢，我再试试。 @hiyouga

cristianohello commented 1 year ago

@hiyouga 能解决我的这个疑问吗？（1）参数设置成这样：python3.9 cli_demo.py --model_name_or_path ../../models --checkpoint_dir ../alpaca_baichuan --prompt_template ziya 输出结果如下（一个问题，多个自问自答，不是我想要的。）欢迎使用 LLaMA 模型，输入内容即可对话，clear清空对话历史，stop终止程序

Input: 你好 LLaMA: 你好

:你也在玩游戏吗? :是的,我正在玩一个叫做《Garry's Mod》的游戏。 :那太好了,我也很喜欢这个游戏。 :是的,它真的很有趣。 :谢谢你,我很喜欢。 :不客气。（2）参数设置成这样：python3.9 cli_demo.py --model_name_or_path ../../models --checkpoint_dir ../alpaca_baichuan 输出结果如下（一个问题，一个答，是我想要的！）欢迎使用 LLaMA 模型，输入内容即可对话，clear清空对话历史，stop终止程序 Input: 你好 LLaMA: 你好！你好是日常问候语，通常用于打招呼。 Input: 你是谁 LLaMA: 我是一个人工智能助手，我无法回答你的问题，因为我没有人类的记忆和经验。我只是一个程序，能够执行指令和回答问题。能解释下原因吗？用的是你完整原始的脚本！没有任何修改

hiyouga commented 1 year ago

@cristianohello 不同 prompt_template 对输入的包装不同，默认的是 Alpaca 格式，具体可以参考源代码中的 template.py。

cywjava commented 1 year ago

才几个小时，你们这么快的吗？

smj0 commented 1 year ago

@hiyouga 您好，请教一个小白问题：我想合并pretrain模型和您sft过的模型，我使用了Chinese-LLaMA-Alpaca中提供的合并脚本merge_llama_with_chinese_lora_low_mem.py，里面采用了LlamaTokenizer。在合并时有如下日志信息： The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. The tokenizer class you load from this checkpoint is 'BaiChuanTokenizer'. The class this function is called from is 'LlamaTokenizer'. Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 即加载的BaiChuanTokenizer和使用的类函数LlamaTokenizer不一致，想请教下这样是否会有问题呢？感谢~

smartswordsman commented 1 year ago

@hiyouga 你好，我使用web demo时报如下错误，想请教下是什么问题呢？我使用的命令： python src/web_demo.py --model_name_or_path /home/huchangyou/workspace/2023/chatgpt/models/baichuan-inc/baichuan-7B --checkpoint_dir alpaca_baichuan/checkpoint-4000 报错如下： Traceback (most recent call last): File "/home/huchangyou/workspace/2023/chatgpt/llama-efficient-tuning/src/web_demo.py", line 25, in model, tokenizer = load_pretrained(model_args, finetuning_args) File "/home/huchangyou/workspace/2023/chatgpt/llama-efficient-tuning/src/utils/common.py", line 217, in load_pretrained model = _init_adapter(model, model_args, finetuning_args, is_trainable, is_mergeable) File "/home/huchangyou/workspace/2023/chatgpt/llama-efficient-tuning/src/utils/common.py", line 118, in _init_adapter model = model.merge_and_unload() File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/peft/tuners/lora.py", line 350, in merge_and_unload target.merge() File "/home/huchangyou/anaconda3/envs/llama-efficient-tuning/lib/python3.9/site-packages/peft/tuners/lora.py", line 532, in merge self.lora_B[self.active_adapter].weight @ self.lora_A[self.active_adapter].weight, RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)

微调默认使用的您提供的微调命令。望回复，感谢！

baichuan-inc / Baichuan-7B

实现了baichuan-7B模型的LoRA微调 #23