Open DingYX0731 opened 3 months ago
Thanks for the question. The weights in both adapter_model.bin and chebi.ckpt should be used during evaluation. To do this, you can use --init_checkpoint to load weights from chebi.ckpt and --peft_dir to load weights from adapter_model.bin. You can refer to the caption evaluation script below:
python stage2.py --devices '[0]' --filename chebi_evaluation --stage2_path "all_checkpoints/share/chebi.ckpt" --opt_model 'facebook/galactica-1.3b' --mode eval --prompt '[START_I_SMILES]{}[END_I_SMILES]. ' --tune_gnn --llm_tune lora --inference_batch_size 8 --root "data/ChEBI-20_data" --peft_dir "all_checkpoints/share/chebi_lora" --init_checkpoint all_checkpoints/share/chebi.ckpt;
In this script, you should replace the --peft_dir value "all_checkpoints/share/chebi_lora" with the parent folder of your adapter_model.bin.
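In case it helps, here is a minimal sketch of how a LoRA folder like this is typically consumed by the peft library. The base model name and paths are just the ones from the command above, and this is not the exact loading code inside stage2.py:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Base language model used in the evaluation command above.
base = AutoModelForCausalLM.from_pretrained("facebook/galactica-1.3b")

# --peft_dir should point at the folder that holds adapter_model.bin
# (together with its adapter_config.json).
model = PeftModel.from_pretrained(base, "all_checkpoints/share/chebi_lora")
```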
Thanks for your reply. I am still confused about how to obtain chebi.ckpt and adapter_model.bin if I train the model from scratch. It seems that when I train from scratch, the only checkpoint I obtain is last.ckpt, which I don't know how to use.
Sorry for bothering you again!
Hi! I am trying to reimplement the fine-tuning stage of MolCA by running the following code:
python stage2.py --root 'data/PubChem324kV2/' --devices '0,1' --filename "ft_pubchem324k" --stage2_path "all_checkpoints/stage2/last.ckpt" --opt_model 'facebook/galactica-1.3b' --max_epochs 100 --mode ft --prompt '[START_I_SMILES]{}[END_I_SMILES]. ' --tune_gnn --llm_tune lora --inference_batch_size 8
The training ran smoothly, but the output checkpoint confused me a lot, since its format differs from the shared checkpoint provided on Hugging Face. After training, the only checkpoint saved was last.ckpt, while the shared checkpoint has two parts: adapter_model.bin and chebi.ckpt.
At first, I thought the LoRA weights might also be saved in last.ckpt. There are indeed LoRA-related weights in that checkpoint, but it seems they have not been fine-tuned, because they are named lora.default.weight rather than lora.weight. Comparing my last.ckpt with the shared chebi.ckpt shows this difference, and in the shared adapter_model.bin the LoRA weight names do not contain "default".
Moreover, during fine-tuning the validation results look good, but the evaluation results do not. I suspect the checkpoint was not saved properly.
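For reference, this is roughly how I listed the LoRA parameter names in the two files (the checkpoint path is from my own run and may differ for you; I am assuming last.ckpt is a standard Lightning checkpoint with a state_dict entry):

```python
import torch

# LoRA keys in the checkpoint produced by my fine-tuning run.
ckpt = torch.load("all_checkpoints/ft_pubchem324k/last.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)
print([k for k in state_dict if "lora" in k])

# LoRA keys in the shared adapter file from Hugging Face.
adapter = torch.load("all_checkpoints/share/chebi_lora/adapter_model.bin", map_location="cpu")
print(list(adapter.keys()))
```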
Do you have any idea how to resolve this checkpoint issue?
Thanks!