awslabs / dgl-lifesci

Python package for graph neural networks in chemistry and biology
Apache License 2.0

Question: what datasets were pre-trained models pre-trained on? #199

Open rhjohnstone opened 1 year ago

rhjohnstone commented 1 year ago

Some of the pre-trained models are just described as "pre-trained", while others are described as "pre-trained then fine-tuned on x". What data was the original pre-training performed on, and for how long?

e.g. from the docs:

'gin_supervised_contextpred': A GIN model pre-trained with supervised learning and context prediction
'gin_supervised_masking_BACE': A GIN model pre-trained with supervised learning and masking, and fine-tuned on BACE
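For context, these are the names I pass to dgllife's load_pretrained, e.g. (a minimal sketch; the weights are downloaded from the model zoo on first use):

```python
from dgllife.model import load_pretrained

# Pre-trained only: supervised pre-training plus context prediction
model = load_pretrained('gin_supervised_contextpred')

# Pre-trained then fine-tuned on BACE: supervised pre-training plus masking
model_ft = load_pretrained('gin_supervised_masking_BACE')
```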

mufeili commented 1 year ago

You may find the details of pre-training in https://arxiv.org/abs/1905.12265. In the model names, supervised means supervised pre-training was performed on a ChEMBL dataset, and contextpred means self-supervised pre-training with context prediction was performed on a ZINC15 dataset.