vladimirkovacevic opened this issue 2 years ago

In the Pretrained Molecular Representations tutorial, the GIN model is passed to InfoGraph:

model = models.InfoGraph(gin_model, separate_model=False)

Should the GIN be trained first and then passed to InfoGraph?
Hi! You don't need to train the GIN first, since InfoGraph itself defines a pretraining task. We wrap it as a 'model' instead of a 'task' in TorchDrug to facilitate interaction with other layers.
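For concreteness, a minimal pretraining sketch in the spirit of the tutorial (the hidden dimensions, batch size, epoch count, and checkpoint name below are illustrative assumptions, not values fixed by TorchDrug):

```python
import torch
from torchdrug import core, datasets, models, tasks

# "pretrain" selects chemical input features; no pretrained weights are loaded.
dataset = datasets.ClinTox("~/molecule-datasets/",
                           node_feature="pretrain", edge_feature="pretrain")

# GIN starts from random weights; InfoGraph wraps it as a model whose forward
# pass computes the self-supervised (mutual-information) loss.
gin_model = models.GIN(input_dim=dataset.node_feature_dim,
                       hidden_dims=[300, 300, 300, 300],
                       edge_input_dim=dataset.edge_feature_dim,
                       batch_norm=True, readout="mean")
model = models.InfoGraph(gin_model, separate_model=False)

# tasks.Unsupervised exposes the model's loss as a trainable task.
task = tasks.Unsupervised(model)
optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
solver = core.Engine(task, dataset, None, None, optimizer, batch_size=256)
solver.train(num_epoch=100)
solver.save("clintox_gin_infograph.pkl")  # checkpoint name is hypothetical
```

The saved checkpoint is then loaded into the same GIN architecture for finetuning, which is where the "pretrained" weights come from.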
Thank you for the answer. I assumed that, but how exactly are "pretrained" weights obtained, since the "pretrain" parameter is passed only when loading the dataset and not to the model?
dataset = datasets.ClinTox("~/molecule-datasets/", node_feature="pretrain", edge_feature="pretrain")
"pretrain" argument results in invoking features.atom.pretrain
R function for calculating molecular node features in molecule.py
.
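If that reading is right, the function should be retrievable through TorchDrug's registry. A small sketch to inspect it (assuming the function is registered under `features.atom.pretrain` and operates on an RDKit atom; both are inferences from the comment above, not documented guarantees):

```python
from rdkit import Chem
from torchdrug.core import Registry as R

# Look up the registered atom-feature function by its dotted name.
atom_pretrain = R.get("features.atom.pretrain")

# Apply it to one atom; the list length is the per-atom feature dimension
# that the GIN's input layer must match.
mol = Chem.MolFromSmiles("CCO")
print(len(atom_pretrain(mol.GetAtomWithIdx(0))))
```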
Hi! The arguments in the dataset refer to chemical features (e.g., atomic number, formal charge), rather than anything computed by a neural network. `pretrain` names a specific combination of chemical features that is suggested for pretraining graph neural networks. You may use another chemical feature specifier, such as `default`, for pretraining. Note that you need to keep the same feature specifier for training and test; otherwise, the model can't recognize the input correctly.
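A quick way to see what the specifier changes (same placeholder path as above; the printed dimensions depend on the TorchDrug version):

```python
from torchdrug import datasets

# Two loads of the same dataset with different feature specifiers.
pretrain_set = datasets.ClinTox("~/molecule-datasets/",
                                node_feature="pretrain", edge_feature="pretrain")
default_set = datasets.ClinTox("~/molecule-datasets/", node_feature="default")

# The input dimensionality differs, which is why a model trained with one
# specifier cannot consume inputs built with the other.
print(pretrain_set.node_feature_dim)
print(default_set.node_feature_dim)
```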
Hi! I am still confused about this `pretrain` argument. The atom representation is fixed if I use the `default` chemical feature specifier, so what is the meaning of `pretrain`?
@KiddoZhu, sorry, your last response does not address my question. So, in the Pretrained Molecular Representations example, when the GIN is instantiated it has random weights, right? As such, it is passed to InfoGraph. Setting `node_feature="pretrain"` on the dataset object does not set the weights of the GIN. This does not seem like the desired behavior to me. Can you please confirm, or correct me if I'm wrong? Thanks!
`node_feature` has nothing to do with the weights of the network. It only defines the attribute `graph.node_feature` for every graph in that dataset, which is used as the input to the network. For example, the `default` node feature is a concatenation of several chemical properties, such as the one-hot encoding of the atom type, the mass of the atom, and the formal charge of the atom. For pretraining, the `pretrain` node feature exactly follows the original paper, but you may also try other features. No matter which node feature you use, you need to stick to the same feature during finetuning; otherwise, the shape of the input won't match the network.
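A minimal check on a single molecule (benzene is an arbitrary example) makes the point concrete:

```python
from torchdrug import data

# The specifier only populates graph.node_feature; it never touches weights.
mol_default = data.Molecule.from_smiles("C1=CC=CC=C1", node_feature="default")
mol_pretrain = data.Molecule.from_smiles("C1=CC=CC=C1", node_feature="pretrain")

print(mol_default.node_feature.shape)   # (num_atoms, default feature dim)
print(mol_pretrain.node_feature.shape)  # same atoms, different feature dim
```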