Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model (a low-resource Chinese llama+lora approach, with a structure modeled on alpaca)
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0

What does MAX_STEPS = None mean in finetune? Can it be changed to something else? #24

Closed ZenXir closed 1 year ago

ZenXir commented 1 year ago

Why is MAX_STEPS set to None here? Can it be changed to something else?

if not args.wandb:
    os.environ["WANDB_MODE"] = "disable"
# optimized for RTX 4090. for larger GPUs, increase some of these?
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size #2000
TARGET_MODULES = [
    "q_proj",
    "v_proj",
]
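
For reference, transformers.TrainingArguments expects max_steps to be an integer: its default is -1, and any value greater than 0 overrides num_train_epochs. Passing None straight through is what trips the args.max_steps > 0 check inside Trainer.__init__ in the error reported below. A minimal sketch of a guard, assuming the constants above are forwarded into TrainingArguments (output_dir here is just a placeholder):

import transformers

# MAX_STEPS = None means "let the Trainer derive the step count from EPOCHS";
# TrainingArguments wants an int, so map None to the library default of -1.
training_args = transformers.TrainingArguments(
    per_device_train_batch_size=MICRO_BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    num_train_epochs=EPOCHS,
    learning_rate=LEARNING_RATE,
    max_steps=MAX_STEPS if MAX_STEPS is not None else -1,  # -1 = no step cap
    output_dir="lora-out",  # placeholder path
)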
ZenXir commented 1 year ago

Here's the situation: I'm using the merged model as the base model for finetuning and get the following error, which is why I'm asking about MAX_STEPS being set to None:

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:22<00:00, 11.04s/it]
Downloading and preparing dataset json/default to /root/.cache/huggingface/datasets/json/default-5488fd0b86b9abc9/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 473.18it/s]
Extracting data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 42.30it/s]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-5488fd0b86b9abc9/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 45.51it/s]
trainable params: 4194304 || all params: 6889689088 || trainable%: 0.060877986603275876
Traceback (most recent call last):
  File "/mnt/e/Chinese-Vicuna/finetune.py", line 228, in <module>
    trainer = transformers.Trainer(
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 543, in __init__
    if args.max_steps > 0:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
(Chinese-alpaca-lora) root@DESKTOP-6KDJTBC:/mnt/e/Chinese-Vicuna#
Facico commented 1 year ago

@ZenXir max_steps is modified further down in the code. I fixed this in my local branch yesterday but forgot to push it; you can pull the update.
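
For reference, one way such a max_steps value is typically derived once the training split is known (a sketch only; the updated branch may compute it differently):

# Sketch: derive MAX_STEPS from the dataset instead of leaving it None.
# `train_data` stands for the tokenized training split loaded in finetune.py;
# BATCH_SIZE and EPOCHS are the constants quoted above.
steps_per_epoch = len(train_data) // BATCH_SIZE   # one optimizer step per effective batch
MAX_STEPS = steps_per_epoch * EPOCHS              # or pass -1 to let the Trainer use EPOCHS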

ZenXir commented 1 year ago

OK, thanks.

ZenXir commented 1 year ago

I've been trying to train the merged model with finetune.py, and it keeps failing with the same error.

The model merging process has two steps: 1. Following https://github.com/ymcui/Chinese-LLaMA-Alpaca, merge the model with the extended embeddings that they provide into a .pth model. 2. Convert the .pth model from step 1 to Hugging Face format with transformers: python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /mnt/e/Chinese-LLaMA-Alpaca/model --model_size 7B --output_dir /mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf

The finetune command is: python finetune.py --data_path sample/merge.json --output_path lora-Vicuna_Embedded/7B/ --model_path /mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf

The error is:

CUDA SETUP: Loading binary /root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:21<00:00, 10.80s/it]
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-5488fd0b86b9abc9/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.06it/s]
trainable params: 4194304 || all params: 6889689088 || trainable%: 0.060877986603275876

 If there's a warning about missing keys above, please disregard :)
/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
  0%|                                                                                                                                                             | 0/16260 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/mnt/e/Chinese-Vicuna/finetune.py", line 271, in <module>
    trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 1636, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 1903, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 2649, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/trainer.py", line 2681, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/peft-0.3.0.dev0-py3.9.egg/peft/peft_model.py", line 529, in forward
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/torch-2.0.0-py3.9-linux-x86_64.egg/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/accelerate-0.17.1-py3.9.egg/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/Chinese-alpaca-lora/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/transformers/models/llama/modeling_llama.py", line 786, in forward
    loss = loss_fct(shift_logits.view(-1, self.config.vocab_size), shift_labels.view(-1))
RuntimeError: shape '[-1, 32000]' is invalid for input of size 50953080
Facico commented 1 year ago

@ZenXir I haven't run their model yet, so you'll need to look into it yourself for now. What's happening in your case is that the conversion didn't come through correctly.

About the error RuntimeError: shape '[-1, 32000]' is invalid for input of size 50953080: LLaMA's vocabulary is around 32000, while that repo's vocabulary is, I believe, 49954 (I don't know whether it has been updated since). If my guess is right, you need to add model.resize_token_embeddings(len(tokenizer)) to update the embedding dimensions inside the model; you can give that a try.
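
The numbers in the traceback are consistent with that guess: 50953080 is not divisible by 32000, but it is exactly 49954 × 1020, and 1020 = MICRO_BATCH_SIZE × (CUTOFF_LEN - 1) = 4 × 255 token positions after the one-token shift. In other words, the logits were produced with a 49954-entry vocabulary while config.vocab_size still says 32000. A quick check:

# Numbers taken from the RuntimeError above.
numel = 50953080
assert numel % 32000 != 0            # does not match the original LLaMA vocab size
assert numel % 49954 == 0            # matches the ~49954-entry extended vocab mentioned above
assert numel // 49954 == 4 * 255     # MICRO_BATCH_SIZE * (CUTOFF_LEN - 1) token positions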

ZenXir commented 1 year ago

Calling resize_token_embeddings like this before "prepare for training" makes it trainable. I'll let the machine run for a couple of days and see how the trained model turns out.

vocab_size = len(tokenizer.get_vocab())
print("Tokenizer vocab size:", vocab_size)
model.resize_token_embeddings(vocab_size)
ZenXir commented 1 year ago

@Facico By the way, to finetune with the model that has the merged embeddings, the command I use is: python finetune.py --data_path sample/merge.json --output_path lora-Vicuna_Embedded/7B/ --model_path /mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf

All other parameters are at their defaults, and my machine has a single RTX 4090 24G. Are there any parameters you'd suggest adjusting for training quality or speed, such as batch_size, test_size, or epochs? Especially for quality, so the results can be compared more directly later.

Facico commented 1 year ago

Sorry, with so many messages I miss some. For a direct comparison, just keep the batch size and epochs unchanged; if you want it to run faster, you can increase the micro batch size.
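
Concretely, the effective batch size in finetune.py is pinned by BATCH_SIZE, so raising MICRO_BATCH_SIZE only reduces the number of gradient-accumulation micro-steps per optimizer step, provided the larger micro batch still fits in 24 GB. For example:

BATCH_SIZE = 128           # effective batch size: keep fixed so runs stay comparable
MICRO_BATCH_SIZE = 8       # raised from 4 for speed (assumes it still fits in VRAM)
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # drops from 32 to 16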

molyswu commented 1 year ago

Dual-GPU setup, RTX 3090:

if not args.wandb:
    os.environ["WANDB_MODE"] = "disable"
# optimized for RTX 4090. for larger GPUs, increase some of these?
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size #2000
TARGET_MODULES = [
    "q_proj",
    "v_proj",
]

molyswu commented 1 year ago

/root/anaconda3/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  warnings.warn(
  0%|          | 0/32481 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./Chinese-Vicuna/finetune.py", line 271, in <module>
    trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1929, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2699, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2731, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  in forward:663
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 709, in forward
    shift_logits = shift_logits.view(-1, self.config.vocab_size)
RuntimeError: shape '[-1, 32001]' is invalid for input of size 32640000
  0%|          | 0/32481 [00:04<?, ?it/s]
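
The numbers here suggest a similar vocabulary mismatch, just in the other direction: 32640000 = 32000 × 1020, so the logits have 32000 columns while config.vocab_size is 32001. Presumably the tokenizer or config was extended by one token without resizing the model, so the same model.resize_token_embeddings(len(tokenizer)) fix discussed above should apply. A quick check:

# Numbers taken from the RuntimeError above.
numel = 32640000
assert numel % 32001 != 0            # config.vocab_size (32001) does not divide the logits
assert numel // 32000 == 4 * 255     # a 32000-column vocab does, with 4 sequences of length 255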

godzeo commented 1 year ago

Calling resize_token_embeddings like this before "prepare for training" makes it trainable. I'll let the machine run for a couple of days and see how the trained model turns out.

vocab_size = len(tokenizer.get_vocab())
print("Tokenizer vocab size:", vocab_size)
model.resize_token_embeddings(vocab_size)

Which file, and at which step, do these three lines go in? I'd like to run the same training, but I'm too much of a beginner to figure it out.

Facico commented 1 year ago

@godzeo Just put it right after the model and tokenizer are loaded.
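
A minimal sketch of that placement in finetune.py (the exact loading arguments here are assumptions based on the thread; the key point is that the resize happens right after the model and tokenizer are both loaded, before any PEFT/LoRA preparation):

from transformers import LlamaForCausalLM, LlamaTokenizer

BASE_MODEL = "/mnt/e/Chinese-LLaMA-Alpaca/model/7B_hf"  # path used in the commands above

model = LlamaForCausalLM.from_pretrained(BASE_MODEL, load_in_8bit=True, device_map="auto")
tokenizer = LlamaTokenizer.from_pretrained(BASE_MODEL)

# Make the model's embedding matrix (and config.vocab_size) match the extended
# tokenizer vocabulary before prepare_model_for_int8_training / get_peft_model.
model.resize_token_embeddings(len(tokenizer))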

abbhay commented 1 year ago

Got it, thanks.

Hey, how should this max_step be set?