hello @pengzhangzhi ,
could you provide the generation results and the way you load the checkpoint?
By the way, if you use the config yaml in config/experiment/lm
and continue training from the pretrained weights, the learning rate is large, which may cause a large change to the pretrained weights and hurt performance. So if you want to continue training, it may be better to start the learning rate from the ending rate of pretraining, i.e., 1e-5.
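To illustrate the point, here is a minimal sketch of what we mean: keep the pretrained weights but replace the pretraining LR schedule with a small constant LR. The loader call is only a placeholder for however you instantiate the model; the optimizer setup is plain PyTorch.

```python
import torch

def build_finetune_optimizer(model: torch.nn.Module, lr: float = 1e-5):
    # A small constant LR avoids the large early updates that can wreck the
    # pretrained weights when the pretraining LR schedule is reused as-is.
    return torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.01)

# model = load_pretrained_dplm("airkingbd/dplm_150m")   # hypothetical loader
# optimizer = build_finetune_optimizer(model, lr=1e-5)
```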
Hi @wxy-nlp ,
Thanks!! I load the ckpt from the path:

```
c=dplm/byprot-checkpoints/dplm_150m_finetune_lr_1e-8/checkpoints/last.ckpt
```

and generate with:

```bash
python generate.py --model_name "airkingbd/${model_name}" --seq_lens 100 --saveto ${output_dir} --num_seqs 100
```
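In case it helps debugging, here is a rough sketch of one way to check that the fine-tuned checkpoint loads and how far its weights drifted from the base ones. It assumes a standard Lightning checkpoint with a `state_dict` key; the base state-dict path is a placeholder.

```python
import torch

# Fine-tuned Lightning checkpoint; Lightning stores model weights under "state_dict".
ft_ckpt = torch.load(
    "byprot-checkpoints/dplm_150m_finetune_lr_1e-8/checkpoints/last.ckpt",
    map_location="cpu",
)
ft_state = ft_ckpt["state_dict"]

# Base weights for comparison -- placeholder path, however you export them.
base_state = torch.load("dplm_150m_base_state_dict.pt", map_location="cpu")

drift = []
for name, base_param in base_state.items():
    ft_param = ft_state.get(name)
    if ft_param is None:
        # Lightning often prefixes parameter names (e.g. "model."), so retry with a prefix.
        ft_param = ft_state.get(f"model.{name}")
    if ft_param is None or ft_param.shape != base_param.shape:
        continue
    base_norm = base_param.float().norm()
    if base_norm > 0:
        drift.append(((ft_param.float() - base_param.float()).norm() / base_norm).item())

print(f"matched params: {len(drift)}, mean relative drift: {sum(drift) / max(len(drift), 1):.3e}")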
I tried setting a smaller LR, even 1e-8, but fine-tuning still gradually degrades the pLDDT. Below is the comparison between the base DPLM-150M and DPLM-150M fine-tuned with LR 1e-8:
| Model | pLDDT |
| --- | --- |
| Base DPLM-150M | 69.44743 |
| Fine-tuned (LR 1e-8) | 66.5991 |
If I use LR 1e-5 or anything larger than 1e-8, the generation is completely broken... :( If you want to verify, you can simply set the LR to 1e-5, load the ckpt, and fine-tune the model for a couple thousand steps.
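For reference, a sketch of one way to compute the mean pLDDT above from the predicted structures. It assumes per-residue pLDDT is written into the B-factor column of the predicted PDB files (the ESMFold/AlphaFold convention); the directory path is a placeholder.

```python
from pathlib import Path

def mean_plddt_of_pdb(pdb_path: Path) -> float:
    """Average the B-factor column over CA atoms of one predicted structure."""
    values = []
    for line in pdb_path.read_text().splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))  # B-factor column holds pLDDT
    return sum(values) / len(values) if values else float("nan")

pdb_dir = Path("generation-results/dplm_150m/esmfold_pdb")  # placeholder path
scores = [mean_plddt_of_pdb(p) for p in sorted(pdb_dir.glob("*.pdb"))]
print(f"{len(scores)} structures, mean pLDDT = {sum(scores) / len(scores):.2f}")
```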
Also, could you please share the training configs for DPLM-150M? I remember the paper uses a two-stage training setup; I'd like to know the hyperparameters and the number of training steps for each stage. Would love to reproduce your training.
Hi,
I simply loaded the pretrained weights and fine-tuned on the same dataset, and the resulting ckpt generates more repetitive sequences than I expected. This is quite bizarre to me. Is there something wrong with the current training code, or are the released ckpts just that good?
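To make "repetitive" concrete, this is the kind of quick check I mean: for each generated sequence, the longest run of a single residue and the frequency of the most common residue. The FASTA path is a placeholder.

```python
from collections import Counter
from itertools import groupby

def read_fasta(path):
    seqs, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:]
                seqs[name] = ""
            elif name is not None:
                seqs[name] += line
    return seqs

for name, seq in read_fasta("generation-results/finetuned_lr1e-8.fasta").items():  # placeholder path
    longest_run = max(len(list(g)) for _, g in groupby(seq))
    top_frac = Counter(seq).most_common(1)[0][1] / len(seq)
    print(f"{name}\tlen={len(seq)}\tlongest_run={longest_run}\ttop_residue_frac={top_frac:.2f}")
```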
cc @zhengzx-nlp @wxy-nlp @leiyu-bytedance @lark