Closed · xx-Jiangwen closed this issue 9 months ago
Is there an existing issue / discussion for this?

Is there an existing answer for this in FAQ?

Current Behavior
When fine-tuning Qwen-14B with SFT, the loss oscillates and does not converge on my own dataset, while the official Qwen finetune script converges.
Attempt 1: I set the hyperparameters to the LLaMA-Factory defaults (in the first experiment I had set them to Qwen's defaults, hoping to match the official run), but the loss still oscillates.
Attempt 2: suspecting my dataset was the cause, I switched to the dataset shipped with LLaMA-Factory, alpaca_gpt4_zh; the oscillation still appears.
The gray loss curve is my own dataset; the purple one is the LLaMA-Factory dataset.
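For what it's worth, with `--logging_steps 1` and a small effective batch, substantial per-step oscillation is normal; a smoothed curve is a better test of whether training is actually diverging. A minimal sketch (plain Python, hypothetical loss values, using a simple exponential moving average similar in spirit to TensorBoard's smoothing slider):

```python
def ema(values, alpha=0.1):
    """Exponential moving average: alpha weights the new sample,
    (1 - alpha) weights the running average."""
    smoothed, prev = [], values[0]
    for v in values:
        prev = alpha * v + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed

# Hypothetical per-step losses: noisy, but trending downward.
losses = [2.1, 1.4, 2.3, 1.1, 1.9, 0.9, 1.6, 0.8]
print(ema(losses))
```

If the smoothed curve trends downward, the run is converging despite the raw oscillation; if even the smoothed curve is flat or rising, something else (data, template, precision) is wrong.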
Expected Behavior

1. I expect the loss to decrease when fine-tuning Qwen-14B with SFT. In theory, shouldn't the result roughly match the official script's?
Steps To Reproduce

```shell
deepspeed --num_gpus 4 --master_port=9901 src/train_bash.py \
    --deepspeed /home/ftpai/code/LLaMA-Factory/ds_config_2.json \
    --stage sft \
    --do_train \
    --model_name_or_path /webtt/weight/huggingface/Qwen-14B-4bits \
    --dataset alpaca_gpt4_zh \
    --template qwen \
    --finetuning_type lora \
    --lora_target c_attn \
    --output_dir save/Qwen-14B-Chat-int4/lora \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --save_steps 800 \
    --learning_rate 3e-5 \
    --num_train_epochs 7 \
    --plot_loss \
    --cutoff_len 4096 \
    --fp16 \
    --report_to tensorboard \
    --overwrite_output_dir \
    --quantization_bit 4
```
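As a side note, the launch flags above imply a fairly small global batch, which by itself makes per-step loss noisy when logged every step. The effective batch size follows directly from the flags:

```python
# Effective (global) batch size implied by the command above:
num_gpus = 4          # --num_gpus 4
per_device_batch = 1  # --per_device_train_batch_size 1
grad_accum = 4        # --gradient_accumulation_steps 4

global_batch = num_gpus * per_device_batch * grad_accum
print(global_batch)  # → 16 samples per optimizer step
```

With only 16 samples contributing to each optimizer step, adjacent logged losses can easily swing by a large margin without indicating divergence.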
```json
{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": { "enabled": "auto" },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "none", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 100,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
```
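One detail worth double-checking in the config above: to my understanding, `offload_param` only takes effect under ZeRO stage 3, so with `"stage": 2` the CPU parameter-offload block should be ignored. A quick sanity check over just the relevant fragment (not the full config):

```python
import json

# Minimal fragment of the zero_optimization section above (assumed shape):
cfg = json.loads("""
{
  "zero_optimization": {
    "stage": 2,
    "offload_param": { "device": "cpu", "pin_memory": true }
  }
}
""")

zero = cfg["zero_optimization"]
# offload_param is documented for ZeRO-3; at lower stages it has no effect.
if zero["stage"] < 3 and zero.get("offload_param", {}).get("device", "none") != "none":
    print("note: offload_param is only honored at ZeRO stage 3; "
          "at stage %d it is ignored" % zero["stage"])
```

This is unlikely to cause the oscillation itself, but it means the config is not doing what the `offload_param` block suggests.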
Environment

- OS:
- Python: 3.11.5
- Transformers: 4.36.2
- PyTorch: 2.0.0+cu118
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.8
Anything else?
No response