THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

What should I do if the loss drops very slowly during P-Tuning fine-tuning? #1141

Open niexufei opened 1 year ago

niexufei commented 1 year ago

Is there an existing issue for this?

Current Behavior

Using the code under ptuning with the AdvertiseGen training data, and the following parameter settings:

PRE_SEQ_LEN=32
LR=1e-2

CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --validation_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /home/chatGPT/model/chatGLMModel/chatGLMHuggingFace/chatglm-6b \
    --output_dir output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 6000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4 >train.log 2>&1 &

After training for 6000 steps, the result is as follows:

{
  "epoch": 1.68,
  "train_loss": 4.087045756022135,
  "train_runtime": 389852.3508,
  "train_samples": 114599,
  "train_samples_per_second": 0.492,
  "train_steps_per_second": 0.015
}

The loss is still very large, and when I use web_demo the answers to questions are no longer normal.
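For what it's worth, the reported epoch count is consistent with the flags above. Here is a minimal sketch of the arithmetic, assuming a single GPU and that the effective batch size is per_device_train_batch_size × gradient_accumulation_steps (the shell variables are only for illustration):

```bash
# Sketch: epochs implied by the flags above, assuming a single GPU and
# effective batch = per_device_train_batch_size * gradient_accumulation_steps.
per_device_train_batch_size=2
gradient_accumulation_steps=16
max_steps=6000
train_samples=114599

effective_batch=$((per_device_train_batch_size * gradient_accumulation_steps))  # 32
samples_seen=$((effective_batch * max_steps))                                   # 192000
echo "effective batch: $effective_batch, samples seen: $samples_seen"
# 192000 / 114599 ≈ 1.68 epochs, matching the reported "epoch": 1.68
```

So after 6000 steps the model has only seen the 114,599 training samples about 1.7 times.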

Expected Behavior

After fine-tuning completes, the loss should drop to a reasonable value, and the fine-tuned model should be able to answer questions.

Steps To Reproduce

Run ./train.sh; the specific parameters are described above.

Environment

- OS: CentOS
- Python: 3.8
- Transformers: 4.28.0
- PyTorch: 1.13.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True

Anything else?

No response

terminator123 commented 1 year ago

Have you solved this problem?

zzy347964399 commented 1 year ago

Maybe try a larger batch size??
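If it helps, the effective batch size in train.sh is controlled by two flags; a small sketch of the arithmetic, with illustrative (untested) values:

```bash
# Illustrative only: a larger effective batch via the two relevant flags in train.sh
# (effective batch = per_device_train_batch_size * gradient_accumulation_steps).
PER_DEVICE_BATCH=4   # was 2  -> passed as --per_device_train_batch_size
GRAD_ACCUM=32        # was 16 -> passed as --gradient_accumulation_steps
echo "effective batch: $((PER_DEVICE_BATCH * GRAD_ACCUM))"  # 128 instead of 32
```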

zzy347964399 commented 1 year ago

I'd suggest looking at the loss curve in TensorBoard to see what is going on.
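A minimal way to do that, assuming TensorBoard is installed and the Hugging Face Trainer wrote its event files under the output directory used above (the `runs/` subfolder is its usual default; adjust the path if your logs live elsewhere):

```bash
# Point TensorBoard at the training output directory from the command above.
tensorboard --logdir output/adgen-chatglm-6b-pt-32-1e-2/runs --port 6006
```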