ardigen / MAT

The official implementation of the Molecule Attention Transformer.

Question re: pretrained weights #11

Closed: sooheon closed this issue 4 years ago

sooheon commented 4 years ago

I would like to confirm that the pretrained weights linked in the README come only from the "masked input node" prediction pretraining, and are not the weights of a final, task-trained MAT. I assume this is the case because the loading code skips the generator weights (which would differ for each task).
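For reference, the loading pattern I'm referring to looks roughly like this (a minimal sketch along the lines of the README snippet; the function name and path argument are placeholders):

```python
import torch

def load_pretrained_encoder(model: torch.nn.Module, path: str) -> None:
    """Copy pretrained weights into `model`, skipping the task-specific generator head."""
    pretrained_state_dict = torch.load(path, map_location="cpu")
    model_state_dict = model.state_dict()
    for name, param in pretrained_state_dict.items():
        if "generator" in name:
            # The generator head is task-specific, so its weights are not loaded.
            continue
        if isinstance(param, torch.nn.Parameter):
            param = param.data
        model_state_dict[name].copy_(param)
```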

When you transfer to a specific downstream task, do you do any freezing and gradual unfreezing of the encoder weights, or do you just train the whole network right away?

sooheon commented 4 years ago

I see this is the case: the pretrained generator weights have 28 outputs, one for each atomic feature.

Are there any plans to open-source the node-masking pretraining code in the future? I'm curious how (and whether) you adapted it to work with the transformer, as opposed to GNNs.

Mazzza commented 4 years ago

Hello,

We use MAT weights obtained from the "masked input node" prediction pretraining. During fine-tuning we do not freeze any weights of the network. All layers of MAT are trained.
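In practice this just means handing every parameter to the optimizer, along these lines (an illustrative sketch only, not the exact training script; the optimizer choice and learning rate are placeholders):

```python
import torch

def make_finetune_optimizer(model: torch.nn.Module, lr: float = 1e-4) -> torch.optim.Optimizer:
    # Nothing is frozen: embeddings, encoder layers, and the generator head
    # all receive gradients and are updated during fine-tuning.
    for param in model.parameters():
        param.requires_grad = True
    return torch.optim.Adam(model.parameters(), lr=lr)
```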

We are currently working on various methods of graph pre-training, and we plan to release the code when we finish.

Mazzza commented 4 years ago

Closing for now. Please reopen if you have any other questions.

sooheon commented 4 years ago

I've been thinking more about the pretraining methods. Node masking is analogous to the BERT-style Cloze task and is straightforward (see the sketch below). I'm having more difficulty understanding how edge masking would work.
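For concreteness, by node masking I mean something along these lines (just a sketch of my understanding, not code from this repo; the zero "mask token" and the ratio are placeholders):

```python
import torch

def mask_nodes(node_features: torch.Tensor, mask_ratio: float = 0.15):
    """BERT-style Cloze masking over atoms.

    node_features: (n_atoms, 28) matrix of atom features
    (28 matching the generator output size mentioned above).
    Returns the masked input, the boolean mask, and the original targets.
    """
    n_atoms = node_features.size(0)
    mask = torch.rand(n_atoms) < mask_ratio   # pick which atoms to hide
    masked = node_features.clone()
    masked[mask] = 0.0                        # stand-in mask token: zero out the row
    return masked, mask, node_features
```

The pretraining loss would then be a reconstruction loss restricted to the masked rows, e.g. `torch.nn.functional.mse_loss(pred[mask], target[mask])`.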

After reading the graph pretraining paper, I'm thinking something like:

Actually, it seems that if you fully mask the distance matrix, the task essentially becomes predicting the molecular conformation.
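Concretely, what I have in mind for the distance matrix is something like this (again only a sketch of the idea, nothing from the repo):

```python
import torch

def mask_distances(dist: torch.Tensor, mask_ratio: float = 0.15):
    """Hide a random, symmetric subset of pairwise distances for the model to predict.

    dist: (n_atoms, n_atoms) symmetric distance matrix. With mask_ratio=1.0 this
    degenerates into predicting the full conformation from the graph alone.
    """
    n = dist.size(0)
    upper = torch.triu(torch.rand(n, n) < mask_ratio, diagonal=1)
    mask = upper | upper.t()      # mirror so the matrix stays symmetric
    masked = dist.clone()
    masked[mask] = 0.0            # masked entries zeroed out
    return masked, mask, dist
```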

How are you guys approaching this?