Closed sooheon closed 4 years ago
I see this is true, as the pretrained weights for the generator have 28 outputs, one for each atomic feature.
Are there any plans to open-source the node-masking training code in the future? I'm curious if/how you adapted this to work with the transformer as opposed to GNNs.
Hello,
We use MAT weights obtained from the "masked input node" prediction pretraining. During fine-tuning we do not freeze any weights of the network; all layers of MAT are trained.
We are currently working on various methods of graph pre-training, and we plan to release the code when we finish.
Closing for now. Please reopen if you have any other questions.
I've been thinking more about the pretraining methods. I see that node masking is analogous to the BERT-style Cloze task and is straightforward, but I'm having more difficulty understanding how edge masking would work.
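For concreteness, here is a minimal sketch of what I mean by the Cloze-style node objective (all names are hypothetical, not from the MAT codebase): randomly replace some atoms' feature vectors with a mask sentinel, then train the prediction head to reconstruct the original features at exactly those positions.

```python
import random

MASK = -1  # hypothetical sentinel standing in for a learned [MASK] embedding


def mask_nodes(atom_features, p=0.15, rng=random.Random(0)):
    """BERT-Cloze-style node masking: with probability p, replace an atom's
    feature vector with the mask sentinel; return the masked input plus the
    indices whose original features become the prediction targets."""
    masked, targets = [], []
    for i, feats in enumerate(atom_features):
        if rng.random() < p:
            masked.append([MASK] * len(feats))
            targets.append(i)
        else:
            masked.append(list(feats))
    return masked, targets
```

The loss would then only be computed at the returned target indices, exactly as in BERT's masked-language-modeling objective.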
After reading the graph pretraining paper, I'm thinking of something along the lines of masking entries of the distance matrix and predicting them. Actually, it seems that if you fully mask the distance matrix, the task amounts to training a molecular conformer generator.
How are you guys approaching this?
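To make the distance-masking idea concrete, here is a rough sketch (hypothetical helper, not from the repo): symmetrically blank out a fraction of the inter-atomic distances and use the original values as regression targets. At `p=1.0` the model must predict geometry from the graph alone, which is the conformer-like task I mentioned.

```python
import random


def mask_distance_matrix(dist, p=0.15, rng=random.Random(0), sentinel=-1.0):
    """Symmetrically mask entries of a square inter-atomic distance matrix.
    Returns the masked matrix and the (i, j) index pairs (i < j) whose
    original distances serve as regression targets."""
    n = len(dist)
    masked = [row[:] for row in dist]
    targets = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                masked[i][j] = masked[j][i] = sentinel
                targets.append((i, j))
    return masked, targets
```

Masking symmetrically keeps the corrupted matrix a valid (if incomplete) distance matrix, so the encoder's distance-based attention bias still sees consistent inputs.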
I would like to confirm that the pretrained weights available in the README are just from "masked input node" prediction, and not the final fine-tuned MAT. I assume this is the case because the loading code skips any generator weights (which would differ for each task).
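A sketch of what I mean by skipping the generator weights (the `generator` key prefix is an assumption on my part; the actual checkpoint layout may differ):

```python
def strip_generator(state_dict):
    """Keep only encoder weights from a pretrained checkpoint, dropping the
    task-specific prediction head (assumed key prefix 'generator')."""
    return {k: v for k, v in state_dict.items()
            if not k.startswith("generator")}

# Typical use with PyTorch (sketch):
#   model.load_state_dict(strip_generator(torch.load(path)), strict=False)
```

With `strict=False`, the freshly initialized head for the new task is simply left untouched by the pretrained checkpoint.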
When you transfer-learn onto a specific task, do you do any freezing and gradual thawing of the encoder weights, or just train everything right away?
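In case it helps frame the question, by gradual thawing I mean a top-down schedule like this toy sketch (names hypothetical): only the head trains at first, and one more encoder block is unfrozen each epoch.

```python
def trainable_layers(layer_names, epoch, thaw_every=1):
    """Top-down gradual unfreezing schedule. `layer_names` is ordered
    bottom-up, with the task head last. At epoch 0 only the head is
    trainable; every `thaw_every` epochs one more layer is unfrozen,
    starting from the top of the encoder."""
    n_unfrozen = min(1 + epoch // thaw_every, len(layer_names))
    return set(layer_names[-n_unfrozen:])

# Applying it in PyTorch would look roughly like (sketch):
#   allowed = trainable_layers(layer_names, epoch)
#   for name, p in model.named_parameters():
#       p.requires_grad = any(name.startswith(l) for l in allowed)
```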