LianjiaTech / BELLE

BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational LLM)
Apache License 2.0
7.96k stars · 761 forks

Training loss is normal, but values explode (gradient explosion) right away at inference #336

Closed tomtang110 closed 1 year ago

tomtang110 commented 1 year ago

[screenshot]

The training arguments are as follows:

deepspeed --num_gpus 1 main.py \
    --sft_only_data_path law_total_conv.json \
    --model_name_or_path bigscience/bloomz-560m \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 2 \
    --max_seq_len 512 \
    --learning_rate 5e-6 \
    --weight_decay 0.0001 \
    --num_train_epochs 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --num_warmup_steps 100 \
    --seed 1234 \
    --gradient_checkpointing \
    --zero_stage $ZERO_STAGE \
    --deepspeed \
    --output_dir $OUTPUT \
    --data_output_path $data_output_path \
    --cache_dir ./cache_dir \
    &> $OUTPUT/training.log
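The command above references the shell variables $ZERO_STAGE, $OUTPUT, and $data_output_path, which are presumably set in the surrounding launch script but are not shown in the issue. A minimal hypothetical setup (the actual values the reporter used are unknown) might look like:

```shell
# Hypothetical variable setup for the deepspeed command above;
# the reporter's actual values are not shown in the issue.
ZERO_STAGE=2                       # DeepSpeed ZeRO optimization stage (assumed)
OUTPUT=./output/bloomz-560m-sft    # checkpoint and training.log directory (assumed)
data_output_path=./data_output     # cache path for the tokenized dataset (assumed)

# Create the directories before launching, so the &> redirect succeeds.
mkdir -p "$OUTPUT" "$data_output_path"
```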

The inference code is as follows: [screenshot]

xianghuisun commented 1 year ago

Loading bigscience/bloomz-560m with torch_dtype set to torch.float32 works fine; if you load it with torch.float16, you will run into exactly this problem.
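A plausible explanation (not stated explicitly in the thread): float16 has a maximum representable value of about 65504, so activations that fit comfortably in float32 can overflow to inf when the model is cast to float16 at load time, and the inf then propagates to NaN in the softmax. The sketch below demonstrates this numerically with NumPy standing in for the model's half-precision tensors:

```python
import numpy as np

# float16's max finite value is ~65504; larger magnitudes overflow to inf.
x = np.float16(70000.0)
print(np.isinf(x))  # True: 70000 is not representable in float16

# An inf logit then poisons the whole softmax: inf - inf = nan,
# and nan spreads through the normalization, yielding NaN probabilities.
logits = np.array([1.0, np.inf], dtype=np.float16)
shifted = logits - logits.max()          # [-inf, nan]
probs = np.exp(shifted) / np.exp(shifted).sum()
print(np.isnan(probs).any())  # True: inference outputs become NaN
```

This matches the suggested workaround: pass `torch_dtype=torch.float32` to `from_pretrained` when loading the model for inference, trading memory for numerical range.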

xianghuisun commented 1 year ago


Do you still run into this problem if torch_dtype is changed to torch.float32?