HICAI-ZJU / KANO

Code and data for the Nature Machine Intelligence paper "Knowledge graph-enhanced molecular contrastive learning with functional prompt".
MIT License

Getting embeddings from the pretrained model #20

Closed by rbvh 8 months ago

rbvh commented 9 months ago

Hello,

I'm trying to figure out how to get embeddings from the pretrained model so I can measure molecule similarity. I found a get_embs function in make_predictions.py, but I can't get it to work. I wrote something like this:

import pandas as pd

from chemprop.parsing import parse_train_args, modify_train_args
from chemprop.train import get_embs

args = parse_train_args()
args.checkpoint_path = './dumped/pretrained_graph_encoder/original_CMPN_0623_1350_14000th_epoch.pkl'
modify_train_args(args)

data = pd.read_csv('./data/bbbp.csv')["smiles"].tolist()

embs = get_embs(args, data)

This leads to

RuntimeError: Error(s) in loading state_dict for CMPN:
        Missing key(s) in state_dict: "encoder.cls", "encoder.W_i_atom_new.weight". 

I can add strict=False to the load_state_dict call in get_embs, but this leads to

AttributeError: 'Linear' object has no attribute 'prompt_generator'

What should I do to get this to work?

Thanks

hellowangqian commented 8 months ago

I came across the same issue and came up with a workaround: adding a line of code, args.step = "something". Check out the logic in this file

While this workaround makes the code run successfully, I'm not sure whether the results are correct in the sense of pre-trained feature extraction. I hope the authors can comment and advise on this issue.

ZJU-Fangyin commented 8 months ago

Hi rbvh and hellowangqian,

I'm sorry for not replying sooner; I've been quite busy recently.

I have addressed this problem by adding a line of code here. Now, when you set args.step to "pretrain", the code should run smoothly.

Example:

data = pd.read_csv('./data/bbbp.csv')
emb, smiles = get_embs(args, data.smiles.tolist())
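
For the original goal of measuring molecule similarity, the returned embeddings can be compared pairwise with cosine similarity. A minimal sketch using numpy (the toy vectors below are placeholders standing in for the (N, d) array of embeddings returned by get_embs, not real KANO output):

```python
import numpy as np

def cosine_similarity_matrix(emb):
    """Pairwise cosine similarity between rows of an (N, d) embedding matrix."""
    emb = np.asarray(emb, dtype=float)
    # Normalize each row to unit length, guarding against zero vectors.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    # Dot products of unit vectors are cosine similarities.
    return unit @ unit.T

# Toy vectors standing in for molecule embeddings:
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sim = cosine_similarity_matrix(toy)
```

With real embeddings you would pass emb from get_embs instead of toy; sim[i][j] is then the similarity between the i-th and j-th molecules in the smiles list.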

rbvh commented 8 months ago

Thank you, very much appreciated!