awslabs / dgl-lifesci

Python package for graph neural networks in chemistry and biology
Apache License 2.0

How to use pre-trained models on custom datasets #209

Open GattiMh opened 1 year ago

GattiMh commented 1 year ago

Hello!

Many thanks for putting up this code.

I apologise if this is a silly question, but I'm new to this field and eager to dive into it. For now, I'm trying to load a pre-trained model and generate predictions for a user-defined set of molecules. I have found the commands to load a model, which should be something like this:

```python
dataset = Tox21(smiles_to_bigraph, CanonicalAtomFeaturizer())
model = load_pretrained('GCN_Tox21')  # Pre-trained model loaded
model.eval()
```

What are the commands to generate predictions on user-defined SMILES using the pre-trained model?

Many thanks

mufeili commented 1 year ago

You might find it helpful to follow the example here.

GattiMh commented 1 year ago

Thank you.

In classification_inference.py, there's a -tp flag for train_results_path. So I guess I should use that to load a pre-trained model? It seems it will look for a configure.json, which is produced by training with classification/regression_train.py.

Apologies for still being confused.

mufeili commented 1 year ago

Yes. The original classification_inference.py is meant to be used after you run classification_train.py. You may adapt and modify the file for your own purposes.

GattiMh commented 1 year ago

Thanks.

But if you want to use a pre-trained model, why would you need train.py? I thought you could use inference.py by simply pointing it to the folder that contains the pre-trained models. Am I missing something here?

mufeili commented 1 year ago

There can be different kinds of pre-trained models. Some models are pre-trained on a particular dataset for a particular supervised learning task. In that scenario, inference.py is designed to use a model pre-trained with train.py.

There are also models pre-trained on a broad range of datasets in an unsupervised/self-supervised fashion for representation learning. To use this kind of pre-trained model on a particular dataset, you typically need to fine-tune it on that dataset, as is done in train.py.
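To make that distinction concrete, here is a minimal sketch of the fine-tuning pattern described above. All names are hypothetical, and a plain `nn.Linear` stands in for a pre-trained graph encoder; the point is the structure (frozen-ish encoder plus a fresh task head), not the specific model.

```python
import torch
import torch.nn as nn

class FineTuned(nn.Module):
    """Wrap a pre-trained encoder with a new, randomly initialised task head."""
    def __init__(self, encoder, encoder_out_dim, n_tasks):
        super().__init__()
        self.encoder = encoder                           # pre-trained weights
        self.head = nn.Linear(encoder_out_dim, n_tasks)  # new task head

    def forward(self, x):
        return self.head(self.encoder(x))

encoder = nn.Linear(8, 16)  # placeholder for a pre-trained representation model
model = FineTuned(encoder, encoder_out_dim=16, n_tasks=12)

# A common choice: a smaller learning rate for the pre-trained parameters
# than for the freshly initialised head.
optimizer = torch.optim.Adam([
    {'params': model.encoder.parameters(), 'lr': 1e-4},
    {'params': model.head.parameters(), 'lr': 1e-3},
])

out = model(torch.randn(4, 8))  # batch of 4 dummy inputs
```

From here, the training loop itself is the same as ordinary supervised training, which is why train.py is the natural template to adapt.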