THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

Implemented LoRA fine-tuning for the chatglm2-6b model #51

Open shibing624 opened 1 year ago

shibing624 commented 1 year ago

I've implemented LoRA fine-tuning for ChatGLM2-6B, which can be used for domain adaptation. Its SFT procedure is essentially the same as for ChatGLM; you only need to adjust the special tokens, lm_head, and enable_input_require_grads to make it work (the code below already includes these changes).
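
For reference, a minimal sketch of that adaptation (not the exact MedicalGPT code; LoRA settings mirror the command below): the enable_input_require_grads call is guarded, since older modeling_chatglm.py versions raise NotImplementedError from get_input_embeddings, and a LoRA adapter is attached to query_key_value.

import torch
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "THUDM/chatglm2-6b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True,
                                  torch_dtype=torch.float16)

# Older modeling_chatglm.py versions raise NotImplementedError here,
# so treat the call as best-effort.
try:
    model.enable_input_require_grads()
except (NotImplementedError, AttributeError):
    pass

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["query_key_value"],     # matches --target_modules below
    r=8, lora_alpha=16, lora_dropout=0.05,  # matches the command below
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()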

Project supporting THUDM/chatglm2-6b fine-tuning: https://github.com/shibing624/MedicalGPT

The project also implements the full GPT training pipeline: continued pre-training, supervised fine-tuning, reward modeling, and reinforcement learning.

Run the following command to do instruction-tuning on the BELLE dataset:

CUDA_VISIBLE_DEVICES=0 python3 supervised_finetuning.py \
    --model_type chatglm \
    --model_name_or_path THUDM/chatglm2-6b \
    --train_file_dir ./data/finetune \
    --validation_file_dir ./data/finetune \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --do_eval \
    --use_peft True \
    --fp16 \
    --max_train_samples 1000 \
    --max_eval_samples 10 \
    --num_train_epochs 1 \
    --learning_rate 2e-5 \
    --warmup_ratio 0.05 \
    --weight_decay 0.05 \
    --logging_strategy steps \
    --logging_steps 10 \
    --eval_steps 50 \
    --evaluation_strategy steps \
    --save_steps 500 \
    --save_strategy steps \
    --save_total_limit 3 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 1 \
    --max_source_length 128 \
    --max_target_length 128 \
    --output_dir outputs-sft-chatglm2-6b-v1 \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --target_modules query_key_value \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --torch_dtype float16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True

eval_loss decreases steadily, and model prediction tests pass.

[Screenshot: Xnip2023-06-26_21-34-53]


diaojunxian commented 1 year ago

@shibing624 Hello.

I ran the code you provided, and although I can see it handles enable_input_require_grads, it still raises an error:

 File "/home/llama/huangrong/finetune_from_github.py", line 566, in <module>
    main()
  File "/home/llama/huangrong/finetune_from_github.py", line 509, in main
    model.enable_input_require_grads()
  File "/home/.conda/envs/3.9.dev/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1206, in enable_input_require_grads
    self._require_grads_hook = self.get_input_embeddings().register_forward_hook(make_inputs_require_grads)
  File "/home/.conda/envs/3.9.dev/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1223, in get_input_embeddings
    return base_model.get_input_embeddings()
  File "/home/.conda/envs/3.9.dev/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1225, in get_input_embeddings
    raise NotImplementedError
NotImplementedError
diaojunxian commented 1 year ago

@shibing624 What machine configuration are you running on? With the same command as yours, I get an out-of-memory error on a 3090.

shyoulala commented 1 year ago

Is the attention mask used for fine-tuning an upper-triangular (causal) mask, or is it similar to ChatGLM-1's?

shibing624 commented 1 year ago

@shibing624 What machine configuration are you running on? With the same command as yours, I get an out-of-memory error on a 3090.

The code has been updated; just pull the latest version. It now has catch-all error handling, so enable_input_require_grads can be ignored. My local machine is a V100 with 32 GB of VRAM, and batch size = 4 runs fine.

As for the out-of-memory issue: gradient_checkpointing=True may not be taking effect on chatglm2 and needs to be adapted.
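
For reference, a minimal sketch of how gradient checkpointing is typically switched on for a Transformers model before the Trainer is built (assuming model is the loaded chatglm2 model; whether it actually takes effect depends on the model's modeling code supporting it):

# Gradient checkpointing trades compute for memory: activations are
# recomputed during the backward pass instead of being kept in VRAM.
model.config.use_cache = False         # the KV cache is incompatible with checkpointing
model.gradient_checkpointing_enable()  # may be a no-op if the modeling code lacks support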

BookerDeWitt commented 1 year ago

Can this code also do full-model fine-tuning?

shibing624 commented 1 year ago

Can this code also do full-model fine-tuning?

I haven't tested full-parameter fine-tuning of chatglm2; full-parameter fine-tuning of llama is tested and works.

BookerDeWitt commented 1 year ago

Can this code also do full-model fine-tuning?

I haven't tested full-parameter fine-tuning of chatglm2; full-parameter fine-tuning of llama is tested and works.

Thanks. Then does LoRA continued pre-training work for chatglm2?

sysuls1 commented 1 year ago

A single T4 runs out of memory even with batch size set to 1: OutOfMemoryError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 15.75 GiB total capacity; 14.85 GiB already allocated; 13.62 MiB free; 14.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
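
For reference, the allocator tweak that the error message itself suggests looks like this (a sketch, not a guaranteed fix; the max_split_size_mb value is a tuning knob):

import os
# Must be set before CUDA is initialized, i.e. before importing torch.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
import torch  # noqa: E402  (import deliberately placed after the env var)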

ACXuLiu commented 1 year ago

@shibing624 What machine configuration are you running on? With the same command as yours, I get an out-of-memory error on a 3090.

The code has been updated; just pull the latest version. It now has catch-all error handling, so enable_input_require_grads can be ignored. My local machine is a V100 with 32 GB of VRAM, and batch size = 4 runs fine.

As for the out-of-memory issue: gradient_checkpointing=True may not be taking effect on chatglm2 and needs to be adapted.

2023-06-27 10:29:59.264 | WARNING | main:main:493 - Could not enable input require_grads on model, skipping.
Traceback (most recent call last):
  File "/data_sda/sda/workspace/liuxu/PycharmProjects/yxChatGLM/finetuneLoraTest/supervised_finetuning.py", line 548, in <module>
    main()
  File "/data_sda/sda/workspace/liuxu/PycharmProjects/yxChatGLM/finetuneLoraTest/supervised_finetuning.py", line 504, in main
    trainer = SavePeftModelTrainer(
  File "/home/ander/anaconda3/envs/chatGLM/lib/python3.8/site-packages/transformers/trainer.py", line 498, in __init__
    self._move_model_to_device(model, args.device)
  File "/home/ander/anaconda3/envs/chatGLM/lib/python3.8/site-packages/transformers/trainer.py", line 740, in _move_model_to_device
    model = model.to(device)
  File "/home/ander/anaconda3/envs/chatGLM/lib/python3.8/site-packages/torch/nn/modules/module.py", line 989, in to
    return self._apply(convert)
  File "/home/ander/anaconda3/envs/chatGLM/lib/python3.8/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  [frame repeated for each nested submodule]
  File "/home/ander/anaconda3/envs/chatGLM/lib/python3.8/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/home/ander/anaconda3/envs/chatGLM/lib/python3.8/site-packages/torch/nn/modules/module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

I pulled the latest code and it still errors out.

Zhongyuan-Ye commented 1 year ago

How do I modify lm_head?

diaojunxian commented 1 year ago

[Quoting ACXuLiu's comment above, including the full "Cannot copy out of meta tensor" traceback.]

Comment out that method.

shibing624 commented 1 year ago

I just checked the latest updates to the official repo on HF: https://huggingface.co/THUDM/chatglm2-6b/commit/189e5df1609cdbd1704e7d0204301ad4c7791f61 shows the get_input_embeddings problem has been fixed. Download the latest modeling_chatglm.py and config.json and it will run correctly; no commenting-out needed.

Also, I tested with batch_size = 2, max_source_length and max_target_length = 128, and lora_target = query_key_value: VRAM usage is 14.5 GB, so it runs on a single T4.

shibing624 commented 1 year ago

How do I modify lm_head?

chatglm2 no longer has an lm_head; it uses output_layer instead.
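
A sketch of what that means in code (attribute path per THUDM's modeling_chatglm.py; verify against your local copy, and assume model is the loaded chatglm2 model):

# ChatGLM2 exposes the output projection as transformer.output_layer
# instead of lm_head, so code that referenced model.lm_head must change:
output_layer = model.transformer.output_layer
print(output_layer.weight.shape)  # (vocab_size, hidden_size)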

Mrjude commented 1 year ago

How do I modify lm_head?

chatglm2 no longer has an lm_head; it uses output_layer instead.

How should the special tokens be modified?

algorithmconquer commented 1 year ago

@shibing624 Hello, when running I get ImportError: This modeling file requires the following packages that were not found in your environment: configuration_chatglm. Run pip install configuration_chatglm. What causes this error?

shibing624 commented 1 year ago

How do I modify lm_head?

chatglm2 no longer has an lm_head; it uses output_layer instead.

How should the special tokens be modified?

My code already includes all of these changes.

shibing624 commented 1 year ago

@shibing624 Hello, when running I get ImportError: This modeling file requires the following packages that were not found in your environment: configuration_chatglm. Run pip install configuration_chatglm. What causes this error?

Just download the model files locally, all of them: https://huggingface.co/THUDM/chatglm2-6b/tree/main

grygg commented 1 year ago

Is there a version for multi-node distributed training?

shibing624 commented 1 year ago

torchrun is supported; see the project README for details.

scuyjzh commented 1 year ago

I have the same V100 machine as you, but running this script always hangs here. What's going on?

(sft) root@ecs-nlp:~/zhangyijie/MedicalGPT# bash run_sft.sh

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /root/miniconda3/envs/sft/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda116_nocublaslt.so
/root/miniconda3/envs/sft/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /root/miniconda3/envs/sft did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.6/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 116
/root/miniconda3/envs/sft/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /root/miniconda3/envs/sft/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda116_nocublaslt.so...

brewswang commented 1 year ago

Quick question: how do I specify the maximum length of the generated text at inference time?

shibing624 commented 1 year ago

Quick question: how do I specify the maximum length of the generated text at inference time?

Check the inference.py script; it has a max_length parameter.
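
Illustrative only (the actual flag lives in inference.py): with a plain transformers generate() call, the length knobs look like this:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True,
                                  torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("你好", return_tensors="pt").to(model.device)
# max_new_tokens caps only the generated continuation;
# max_length caps prompt + continuation together.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))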

Peter-L-FANG commented 1 year ago

When saving the model I get: AttributeError: 'ChatGLMTokenizer' object has no attribute 'vocab_file'. Am I using the wrong tokenizer?

2023-06-29 02:55:29.188 | INFO | main:main:322 - Init new peft model
2023-06-29 02:55:29.189 | INFO | main:main:329 - Peft target_modules: ['dense', 'dense_4h_to_h', 'dense_h_to_4h', 'query_key_value']
2023-06-29 02:55:29.189 | INFO | main:main:330 - Peft lora_rank: 8
trainable params: 14823424 || all params: 6258407424 || trainable%: 0.23685616796302714
2023-06-29 02:56:29.379 | DEBUG | main:main:347 - Tokenizer: ChatGLMTokenizer(name_or_path='./chatglm2-6b', vocab_size=64794, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='left', truncation_side='right', special_tokens={'bos_token': '', 'eos_token': '', 'unk_token': ''}, clean_up_tokenization_spaces=True)

2023-06-29 02:58:48.376 | INFO | main:main:530 - Saving model checkpoint to outputs-sft-v1
Traceback (most recent call last):
  File "supervised_finetuning.py", line 550, in <module>
    main()
  File "supervised_finetuning.py", line 531, in main
    save_model(training_args.output_dir, model, tokenizer, training_args)
  File "supervised_finetuning.py", line 212, in save_model
    tokenizer.save_pretrained(output_dir)
  File "/chatnas/llms/finetune/env/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2205, in save_pretrained
    save_files = self._save_pretrained(
  File "/chatnas/llms/finetune/env/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2253, in _save_pretrained
    vocab_files = self.save_vocabulary(save_directory, filename_prefix=filename_prefix)
  File "/home/admin/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 137, in save_vocabulary
    with open(self.vocab_file, 'rb') as fin:
AttributeError: 'ChatGLMTokenizer' object has no attribute 'vocab_file'

valkryhx commented 1 year ago

Question: running this command in a single-machine multi-GPU environment throws an error. How should I change the launch command? RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument target in method wrapper_CUDA_nll_loss_forward)

shibing624 commented 1 year ago

When saving the model I get: AttributeError: 'ChatGLMTokenizer' object has no attribute 'vocab_file'. Am I using the wrong tokenizer? [full logs quoted above]

Manually download each of the .py files from the HF model repo and update your local copies. The official repo has already fixed this bug.

shibing624 commented 1 year ago

Question: running this command in a single-machine multi-GPU environment throws an error. How should I change the launch command? RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument target in method wrapper_CUDA_nll_loss_forward)

  1. If a single GPU has insufficient VRAM (under 15 GB) and the model needs to be loaded across multiple GPUs, just run the script directly with python; device_map=auto will shard the model across the GPUs automatically (see the sketch below);
  2. If you want multi-GPU data-parallel training, MedicalGPT already supports it: specify local_rank, each GPU loads the model, and the data is sharded for training.
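
A sketch of option 1 (one python process, model sharded across GPUs; assumes accelerate is installed):

import torch
from transformers import AutoModel

# device_map="auto" lets accelerate place layers on cuda:0, cuda:1, ...
# so a single process can host a model too large for one GPU.
model = AutoModel.from_pretrained(
    "THUDM/chatglm2-6b", trust_remote_code=True,
    torch_dtype=torch.float16, device_map="auto",
)
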
rockiachen commented 1 year ago

Loading the model after training runs into problems. OP, do you have a solution?

shibing624 commented 1 year ago

It works fine in my tests; run inference.py.

ZzyChris97 commented 1 year ago

Have you evaluated the effect after fine-tuning? I fine-tuned with some QA data and it doesn't seem to make much difference.

boxter007 commented 1 year ago

Question: running this command in a single-machine multi-GPU environment throws an error. How should I change the launch command? RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument target in method wrapper_CUDA_nll_loss_forward)

  1. If a single GPU has insufficient VRAM (under 15 GB) and the model needs to be loaded across multiple GPUs, just run the script directly with python; device_map=auto will shard the model across the GPUs automatically;
  2. If you want multi-GPU data-parallel training, MedicalGPT already supports it: specify local_rank, each GPU loads the model, and the data is sharded for training.

I downloaded your code today. When doing supervised fine-tuning of chatglm2-6b and loading with python, it errors out on two GPUs, with device_map set to auto.

ArtificialZeng commented 1 year ago

How exactly do you change the special tokens, lm_head, and enable_input_require_grads you mentioned?

shibing624 commented 1 year ago

How exactly do you change the special tokens, lm_head, and enable_input_require_grads you mentioned?

My code already has these changes.

ArtificialZeng commented 1 year ago

@shibing624 What machine configuration are you running on? With the same command as yours, I get an out-of-memory error on a 3090.

The code has been updated; just pull the latest version. It now has catch-all error handling, so enable_input_require_grads can be ignored. My local machine is a V100 with 32 GB of VRAM, and batch size = 4 runs fine.

As for the out-of-memory issue: gradient_checkpointing=True may not be taking effect on chatglm2 and needs to be adapted.

Could you explain what this gradient_checkpointing parameter does?

shibing624 commented 1 year ago

Just look up gradient_checkpointing.

cristianohello commented 1 year ago

@shibing624 Hello, my fine-tuning texts are up to 5000 characters long. Which parameter do I change to set the length?

CRonaldo1997 commented 1 year ago

@shibing624 Thanks! Training and inference worked smoothly on the first pass. But after fine-tuning on the provided dataset, the results are worse than before, and there is only a single output, as if truncated. Could you take a look at what's going on? After fine-tuning: 2

Before fine-tuning: 2222222

shibing624 commented 1 year ago

max new length can be increased. For the training set, I recommend switching to the alpaca-gpt4 data; it's higher quality and trains better.

hellomaxwell commented 1 year ago

[Quoting diaojunxian's report above, including the full enable_input_require_grads NotImplementedError traceback.]

OP, how did you resolve this bug? I'm running into the same problem.

shibing624 commented 1 year ago

Two options:

  1. Pull the latest chatglm2 .py files and overwrite your local copies;
  2. Comment out enable_input_require_grads
Forever296 commented 10 months ago

How do I modify lm_head?

Have you found the code changes yet?

Forever296 commented 10 months ago

How exactly do you change the special tokens, lm_head, and enable_input_require_grads?

Have you found the code changes yet?