Closed hiyouga closed 5 months ago
Caught the maintainer in the wild, haha
Thanks for the support, it has been merged!
🐮🐮
lora_target: should it be q_proj,v_proj or query_key_value?
@zmh2000829 It should be query_key_value, or just set it to all.
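For reference, a minimal LoRA sketch along those lines (the dataset and output path here are placeholders, not from this thread):

```yaml
### minimal GLM-4-9B-Chat LoRA SFT sketch (placeholder dataset/output paths)
model_name_or_path: THUDM/glm-4-9b-chat
stage: sft
do_train: true
finetuning_type: lora
lora_target: query_key_value   # GLM-4 fuses Q/K/V into a single projection; lora_target: all also works
dataset: alpaca_zh_demo
template: glm4
output_dir: saves/glm4-9b/lora/sft
```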
I ran into the following problem while using it. Is it a model download error?
.cache/huggingface/modules/transformers_modules/THUDM/glm-4-9b-chat/08914867436b750c287539795e63c24631273878/modeling_chatglm.py", line 260, in forward
value_layer = value_layer.view(output_size[0] * output_size[1], value_layer.size(2), -1)
RuntimeError: shape '[21248, 664, -1]' is invalid for input of size 5439488
Config:

```yaml
### model
model_name_or_path: THUDM/glm-4-9b-chat
# model_name_or_path: ZhipuAI/glm-4-9b-chat
### method
stage: sft
do_train: true
finetuning_type: full
### ddp
ddp_timeout: 180000000
deepspeed: examples/deepspeed/ds_z3_config.json
### dataset
dataset: yi_train
template: glm4
cutoff_len: 8192
max_samples: 100000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/glm/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 2
learning_rate: 0.00001
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_steps: 0.1
fp16: true
### eval
val_size: 0.1
per_device_eval_batch_size: 2
evaluation_strategy: steps
eval_steps: 500
# report_to: wandb
```
Run command:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 llamafactory-cli train examples/full_multi_gpu/glm.yaml
```
@bbruceyuan Upgrade PyTorch to 2.1.0 or above.
We have added the link to the friendly-links section on the homepage. Thanks for the support! Closing this issue.
I get an error during fine-tuning:
Config as follows:

```yaml
### model
model_name_or_path: /mnt/sda/nameless0078/glm-4-9b-chat

stage: sft
do_train: true
finetuning_type: lora
lora_target: all

dataset: identity1,alpaca_en_demo,alpaca_zh_demo
template: glm4
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

output_dir: saves/glm4-sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true

val_size: 0.1
per_device_eval_batch_size: 1
evaluation_strategy: steps
eval_steps: 500
```
Command:

```bash
CUDA_VISIBLE_DEVICES=2,3 llamafactory-cli train ./examples/wyf_tests/train/glm4_test.yaml
```

What could be the problem?
@nameless0120 Please use the latest version of the code.
```
(base) vipuser@WIN-MK00C8R6PJA:~/LLaMA-Factory$ conda activate fine-tuning
(fine-tuning) vipuser@WIN-MK00C8R6PJA:~/LLaMA-Factory$ llamafactory-cli webui
Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch().
gio: http://localhost:7860/: Operation not supported
[INFO|tokenization_utils_base.py:2106] 2024-06-09 22:07:57,066 >> loading file qwen.tiktoken
[INFO|tokenization_utils_base.py:2106] 2024-06-09 22:07:57,066 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2106] 2024-06-09 22:07:57,066 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2106] 2024-06-09 22:07:57,066 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2106] 2024-06-09 22:07:57,066 >> loading file tokenizer.json
06/09/2024 22:07:57 - INFO - llamafactory.data.template - Add eos token: <|im_end|>
06/09/2024 22:07:57 - INFO - llamafactory.data.template - Add pad token: <|im_end|>
[INFO|configuration_utils.py:731] 2024-06-09 22:07:57,274 >> loading configuration file Qwen/Qwen-1_8B-Chat/config.json
[INFO|configuration_utils.py:731] 2024-06-09 22:07:57,274 >> loading configuration file Qwen/Qwen-1_8B-Chat/config.json
[INFO|configuration_utils.py:796] 2024-06-09 22:07:57,275 >> Model config QWenConfig {
"_name_or_path": "Qwen/Qwen-1_8B-Chat",
"architectures": [
"QWenLMHeadModel"
],
"attn_dropout_prob": 0.0,
"auto_map": {
"AutoConfig": "configuration_qwen.QWenConfig",
"AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
},
"bf16": false,
"emb_dropout_prob": 0.0,
"fp16": false,
"fp32": false,
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 11008,
"kv_channels": 128,
"layer_norm_epsilon": 1e-06,
"max_position_embeddings": 8192,
"model_type": "qwen",
"no_bias": true,
"num_attention_heads": 16,
"num_hidden_layers": 24,
"onnx_safe": null,
"rotary_emb_base": 10000,
"rotary_pct": 1.0,
"scale_attn_weights": true,
"seq_length": 8192,
"softmax_in_fp32": false,
"tie_word_embeddings": false,
"tokenizer_class": "QWenTokenizer",
"transformers_version": "4.41.2",
"use_cache": true,
"use_cache_kernel": false,
"use_cache_quantization": false,
"use_dynamic_ntk": true,
"use_flash_attn": "auto",
"use_logn_attn": true,
"vocab_size": 151936
}
06/09/2024 22:07:57 - INFO - llamafactory.model.patcher - Using KV cache for faster generation.
Traceback (most recent call last):
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/gradio/queueing.py", line 571, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/gradio/blocks.py", line 1923, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/gradio/blocks.py", line 1521, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/gradio/utils.py", line 656, in async_iteration
return await iterator.__anext__()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/gradio/utils.py", line 649, in anext
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/gradio/utils.py", line 632, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/gradio/utils.py", line 815, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "/home/vipuser/LLaMA-Factory/src/llamafactory/webui/chatter.py", line 86, in load_model
super().__init__(args)
File "/home/vipuser/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 26, in __init__
self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 44, in init
self.model = load_model(
^^^^^^^^^^^
File "/home/vipuser/LLaMA-Factory/src/llamafactory/model/loader.py", line 137, in load_model
model = AutoModelForCausalLM.from_pretrained(**init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 550, in from_pretrained
model_class = get_class_from_dynamic_module(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 501, in get_class_from_dynamic_module
final_module = get_cached_module_file(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 326, in get_cached_module_file
modules_needed = check_imports(resolved_module_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vipuser/anaconda3/envs/fine-tuning/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 181, in check_imports
raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: transformers_stream_generator. Run pip install transformers_stream_generator
```
I get this error when loading the model in Chat. Could anyone take a look at what's causing it?
Try the commands: pip install transformers_stream_generator
or pip install transformers==4.41.2

pip list shows my transformers is already version 4.41.2.
I'm using Ubuntu on WSL.
![image](https://github.com/THUDM/GLM-4/assets/125663783/a70324e4-e404-42e9-8560-177cbc0250d7)

Config:

```yaml
### model
model_name_or_path: /home/dell/models/glm-4-9b

stage: sft
do_train: true
finetuning_type: lora
lora_target: query_key_value

dataset: identity,fine_8_2
template: glm4
cutoff_len: 1024
max_samples: 100000
overwrite_cache: true
preprocessing_num_workers: 16

output_dir: saves_glm4/glm4-9b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

per_device_train_batch_size: 6
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true

val_size: 0.1
per_device_eval_batch_size: 6
eval_strategy: steps
eval_steps: 500
```

```bash
sudo CUDA_VISIBLE_DEVICES=1 /home/dell/anaconda3/envs/llama_factory/bin/llamafactory-cli train examples/lora_single_gpu/llama3_lora_sft.yaml
```
I'm on the latest code.
Both template: glm4 and chatglm4 give an error.
@night666e glm4 can't be the problem; we have tested it.
Then why do I keep getting a template error? Is something else wrong?
@night666e Your code is probably not the latest version.
OP, what is causing this? Do I need to change the code or the data?
LLaMA Factory now supports instruction fine-tuning, RLHF, DPO, SimPO and other optimization methods for the GLM-4-9B and GLM-4-9B-Chat models.
https://github.com/hiyouga/LLaMA-Factory/blob/main/README_zh.md
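Only SFT configs appear in this thread; as a hedged sketch of a preference-optimization run, a LoRA DPO config could look roughly like the following (pref_beta and pref_loss follow recent LLaMA-Factory examples and may be named differently in older releases; dpo_en_demo is the bundled demo preference dataset):

```yaml
### hedged GLM-4-9B-Chat LoRA DPO sketch (option names may vary by version)
model_name_or_path: THUDM/glm-4-9b-chat
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all
pref_beta: 0.1        # DPO temperature (assumed name; older releases use dpo_beta)
pref_loss: sigmoid    # standard DPO loss; recent versions also accept simpo
dataset: dpo_en_demo
template: glm4
output_dir: saves/glm4-9b/lora/dpo
```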
Instruction fine-tuning
sft.yaml file contents:
Multi-GPU inference
Resource usage:
LoRA: ~20GB; QLoRA: ~10GB; half-precision inference: ~18GB; 4-bit inference: ~7GB
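As a point of reference for the ~10GB QLoRA figure, a hedged 4-bit QLoRA config might look like this (quantization_bit follows the repo's QLoRA examples; the dataset and paths are placeholders):

```yaml
### hedged 4-bit QLoRA SFT sketch for GLM-4-9B-Chat (placeholder dataset/paths)
model_name_or_path: THUDM/glm-4-9b-chat
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
quantization_bit: 4            # load the base weights in 4-bit to cut VRAM to roughly 10GB
dataset: alpaca_zh_demo
template: glm4
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
fp16: true
output_dir: saves/glm4-9b/qlora/sft
```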