hello @pengzhangzhi ,
could you provide the generation results and the way you load the checkpoint?
By the way, if you use the config yaml in config/experiment/lm
and continue training from the pretrained weights, the learning rate is large, which may cause a large change to the pretrained weights and hurt performance. So if you want to continue training, it may be better to start the learning rate from the ending rate of pretraining, i.e., 1e-5.
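To illustrate the point, here is a minimal sketch of what we mean: keep the pretrained weights but replace the pretraining LR schedule with a small constant LR. The loader call is only a placeholder for however you instantiate the model; the optimizer setup is plain PyTorch.

```python
import torch

def build_finetune_optimizer(model: torch.nn.Module, lr: float = 1e-5):
    # A small constant LR avoids the large early updates that can wreck the
    # pretrained weights when the pretraining LR schedule is reused as-is.
    return torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.01)

# model = load_pretrained_dplm("airkingbd/dplm_150m")   # hypothetical loader
# optimizer = build_finetune_optimizer(model, lr=1e-5)
```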
Hi @wxy-nlp ,
Thanks!! I load the ckpt from the path:

```
c=dplm/byprot-checkpoints/dplm_150m_finetune_lr_1e-8/checkpoints/last.ckpt
```

and generate with:

```bash
python generate.py --model_name "airkingbd/${model_name}" --seq_lens 100 --saveto ${output_dir} --num_seqs 100
```
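In case it helps debugging, here is a rough sketch of one way to check that the fine-tuned checkpoint loads and how far its weights drifted from the base ones. It assumes a standard Lightning checkpoint with a `state_dict` key; the base state-dict path is a placeholder.

```python
import torch

# Fine-tuned Lightning checkpoint; Lightning stores model weights under "state_dict".
ft_ckpt = torch.load(
    "byprot-checkpoints/dplm_150m_finetune_lr_1e-8/checkpoints/last.ckpt",
    map_location="cpu",
)
ft_state = ft_ckpt["state_dict"]

# Base weights for comparison -- placeholder path, however you export them.
base_state = torch.load("dplm_150m_base_state_dict.pt", map_location="cpu")

drift = []
for name, base_param in base_state.items():
    ft_param = ft_state.get(name)
    if ft_param is None:
        # Lightning often prefixes parameter names (e.g. "model."), so retry with a prefix.
        ft_param = ft_state.get(f"model.{name}")
    if ft_param is None or ft_param.shape != base_param.shape:
        continue
    base_norm = base_param.float().norm()
    if base_norm > 0:
        drift.append(((ft_param.float() - base_param.float()).norm() / base_norm).item())

print(f"matched params: {len(drift)}, mean relative drift: {sum(drift) / max(len(drift), 1):.3e}")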
I tried setting a smaller LR, even 1e-8, but fine-tuning still gradually degrades the pLDDT. Below is the comparison between the base DPLM-150M and DPLM-150M fine-tuned with LR 1e-8:
| Model | pLDDT |
| --- | --- |
| Base DPLM-150M | 69.44743 |
| Fine-tuned (LR 1e-8) | 66.5991 |
If I use LR 1e-5 or anything larger than 1e-8, the generation is completely broken... :( If you want to verify, you can simply set the LR to 1e-5, load the ckpt, and fine-tune the model for a couple thousand steps.
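For reference, a sketch of one way to compute the mean pLDDT above from the predicted structures. It assumes per-residue pLDDT is written into the B-factor column of the predicted PDB files (the ESMFold/AlphaFold convention); the directory path is a placeholder.

```python
from pathlib import Path

def mean_plddt_of_pdb(pdb_path: Path) -> float:
    """Average the B-factor column over CA atoms of one predicted structure."""
    values = []
    for line in pdb_path.read_text().splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))  # B-factor column holds pLDDT
    return sum(values) / len(values) if values else float("nan")

pdb_dir = Path("generation-results/dplm_150m/esmfold_pdb")  # placeholder path
scores = [mean_plddt_of_pdb(p) for p in sorted(pdb_dir.glob("*.pdb"))]
print(f"{len(scores)} structures, mean pLDDT = {sum(scores) / len(scores):.2f}")
```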
Also, could you please share the training configs for DPLM-150M? I remember the paper uses a two-stage training setup; I'd like to know the hyperparameters and the number of training steps for each stage. Would love to reproduce your training.
Hi,
I simply loaded the pretrained weights and fine-tuned on the same dataset, and the resulting ckpt generates more repetitive sequences than I expected. This is quite bizarre to me. Is there something wrong with the current training code, or are the released ckpts just that good?
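To make "repetitive" concrete, this is the kind of quick check I mean: for each generated sequence, the longest run of a single residue and the frequency of the most common residue. The FASTA path is a placeholder.

```python
from collections import Counter
from itertools import groupby

def read_fasta(path):
    seqs, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:]
                seqs[name] = ""
            elif name is not None:
                seqs[name] += line
    return seqs

for name, seq in read_fasta("generation-results/finetuned_lr1e-8.fasta").items():  # placeholder path
    longest_run = max(len(list(g)) for _, g in groupby(seq))
    top_frac = Counter(seq).most_common(1)[0][1] / len(seq)
    print(f"{name}\tlen={len(seq)}\tlongest_run={longest_run}\ttop_residue_frac={top_frac:.2f}")
```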
cc @zhengzx-nlp @wxy-nlp @leiyu-bytedance @lark