QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

[BUG] Qwen14B-Chat-int4 #917

Closed xx-Jiangwen closed 9 months ago

xx-Jiangwen commented 9 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

When fine-tuning Qwen14B with SFT, the loss oscillates and does not converge on my own dataset, while the official finetune script does converge.

Attempt 1: I set the hyperparameters to the LLaMA-Factory defaults (in the initial experiment they were set to the Qwen defaults, hoping to match the official setup), but the loss still oscillates.

Attempt 2: Suspecting my own dataset, I switched to the dataset shipped with LLaMA-Factory, alpaca_gpt4_zh; the oscillation still appears.

The gray loss curve is my own dataset; the purple one is the LLaMA-Factory dataset.

(Screenshots of the two training loss curves were attached here.)
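For reference, since the run uses `--report_to tensorboard`, the two curves can be read back from the event files and overlaid. This is only a minimal sketch; the log directories and the `train/loss` scalar tag are assumptions about how the runs were logged, not taken from the actual experiments.

```python
# Minimal sketch for overlaying the two loss curves.
# The log paths and the "train/loss" tag are assumed, not from the actual runs.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
import matplotlib.pyplot as plt

def load_loss(logdir, tag="train/loss"):
    acc = EventAccumulator(logdir)
    acc.Reload()
    events = acc.Scalars(tag)
    return [e.step for e in events], [e.value for e in events]

for logdir, label in [("runs/own_dataset", "own dataset (gray)"),
                      ("runs/alpaca_gpt4_zh", "alpaca_gpt4_zh (purple)")]:
    steps, values = load_loss(logdir)
    plt.plot(steps, values, label=label)

plt.xlabel("step")
plt.ylabel("training loss")
plt.legend()
plt.savefig("loss_comparison.png")
```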

Expected Behavior

1. I expect the loss to decrease when fine-tuning qwen14b with SFT; in theory, shouldn't the result be roughly consistent with the official script?

Steps To Reproduce

```bash
deepspeed --num_gpus 4 --master_port=9901 src/train_bash.py \
    --deepspeed /home/ftpai/code/LLaMA-Factory/ds_config_2.json \
    --stage sft \
    --do_train \
    --model_name_or_path /webtt/weight/huggingface/Qwen-14B-4bits \
    --dataset alpaca_gpt4_zh \
    --template qwen \
    --finetuning_type lora \
    --lora_target c_attn \
    --output_dir save/Qwen-14B-Chat-int4/lora \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --save_steps 800 \
    --learning_rate 3e-5 \
    --num_train_epochs 7 \
    --plot_loss \
    --cutoff_len 4096 \
    --fp16 \
    --report_to tensorboard \
    --overwrite_output_dir \
    --quantization_bit 4
```
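For context, `--finetuning_type lora --lora_target c_attn` roughly corresponds to the following peft configuration applied on top of the quantized base model. This is only a sketch: the rank, alpha, and dropout values are assumptions, not LLaMA-Factory's actual defaults. The JSON below is presumably the `ds_config_2.json` passed via `--deepspeed`.

```python
# Rough peft equivalent of --finetuning_type lora --lora_target c_attn.
# r / lora_alpha / lora_dropout are assumed values, not LLaMA-Factory defaults.
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # Qwen's fused QKV projection
)
# model = get_peft_model(base_model, lora_config)  # base_model: the loaded Qwen-14B checkpoint
```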

{ "fp16": { "enabled": "auto", "loss_scale": 0, "loss_scale_window": 1000, "initial_scale_power": 16, "hysteresis": 2, "min_loss_scale": 1 }, "bf16": { "enabled": "auto" }, "optimizer": { "type": "AdamW", "params": { "lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto" } }, "zero_optimization": { "stage": 2, "offload_optimizer": { "device": "none", "pin_memory": true }, "offload_param": { "device": "cpu", "pin_memory": true }, "allgather_partitions": true, "allgather_bucket_size": 2e8, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 2e8, "contiguous_gradients": true },

"gradient_accumulation_steps": "auto", "gradient_clipping": "auto", "steps_per_print": 100, "train_batch_size": "auto", "train_micro_batch_size_per_gpu": "auto", "wall_clock_breakdown": false }

Environment

- OS:
- Python: 3.11.5
- Transformers: 4.36.2
- PyTorch: 2.0.0+cu118
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.8

Anything else?

No response