awslabs / dgl-lifesci

Python package for graph neural networks in chemistry and biology
Apache License 2.0

PAGTN implementation #137

Closed: VIGNESHinZONE closed this issue 3 years ago

VIGNESHinZONE commented 3 years ago

I recently came across the PAGTN paper and its code, and I am planning to implement the model in dgl-lifesci. As per the contribution guidelines, I felt I should get the repo maintainers' opinion before making a PR.

I have finished reproducing some of the paper's results on MoleculeNet (ESOL, BACE, BBBP); they can be found in this Colab notebook.

Could you please let me know whether adding this model would be useful to the dgl-lifesci community?

To save time, here are a few short notes about the model:

1) The model uses a complete graph for the molecule, in which each node is connected to every other node. To build the edge feature between any two nodes, they find the shortest path between them along the bonds and derive features from that path. For example, the shortest path between node 8 and node 5 is 8 -> 7 -> 6 -> 5.

[Image: Screenshot from 2021-02-19 08-21-29]

2) The node features are a concatenation of one-hot encoding vectors for atom type, formal charge, valency, etc.

3) The edge features are a concatenation of the bond types along the shortest path between two nodes and the aromatic ring type.

4) The neural network is very similar to the GAT model, but it has many more residual connections and a slightly different message-passing strategy, which give it a lot of benefits.
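To make notes 1) and 3) concrete, here is a minimal sketch of the complete-graph idea in plain Python. It is not the paper's code or the notebook's code: the toy molecule, the bond-type labels, and every function name below are hypothetical, and it only collects the bond types along each shortest path (no ring features, no one-hot encoding).

```python
from collections import deque
from itertools import permutations

# Hypothetical toy molecule (an assumption for illustration only):
# atoms are integer ids, and each bond maps an unordered atom pair to a label.
BONDS = {
    frozenset((5, 6)): "single",
    frozenset((6, 7)): "double",
    frozenset((7, 8)): "single",
    frozenset((8, 9)): "single",
}


def build_adjacency(bonds):
    """Turn the bond dictionary into an adjacency list."""
    adjacency = {}
    for pair in bonds:
        a, b = tuple(pair)
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns the atom ids on a shortest path, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor in adjacency[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    if target not in parents:
        return None  # disconnected fragments have no path
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]


def path_bond_types(bonds, path):
    """Bond labels along consecutive atoms of a path."""
    return [bonds[frozenset((a, b))] for a, b in zip(path, path[1:])]


def complete_graph_edges(bonds):
    """One directed edge per ordered atom pair, described by its path's bonds."""
    adjacency = build_adjacency(bonds)
    edges = {}
    for u, v in permutations(sorted(adjacency), 2):
        path = shortest_path(adjacency, u, v)
        if path is not None:
            edges[(u, v)] = path_bond_types(bonds, path)
    return edges


adjacency = build_adjacency(BONDS)
path_8_to_5 = shortest_path(adjacency, 8, 5)
edges = complete_graph_edges(BONDS)
print(path_8_to_5)    # [8, 7, 6, 5]
print(edges[(8, 5)])  # ['single', 'double', 'single']
```

In a real featurizer, the per-path bond labels would then be one-hot encoded and concatenated with the ring information, as described in note 3).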

Please let me know if I should proceed with this model. I will have to clean up the existing code, add docs, and make a few optimisations.

mufeili commented 3 years ago

Sorry for the late reply, and thank you for your interest in contributing. Are you going to implement it anyway? Does this model perform well in your experiments?

VIGNESHinZONE commented 3 years ago

I was implementing it for the sole purpose of contributing to this organization and then adding a wrapper function in deepchem. As for the results, I experimented on ESOL and BACE using scaffold splitting.

ESOL --> 0.68 RMSE

BACE --> 0.92 AUC score

I mainly wanted to know whether there are any criteria for deciding if a model should be implemented in this repo:

1) Is it about a novel architecture?

2) Or is it about benchmarking on a few of the MoleculeNet datasets, or achieving results comparable to GCN / GAT?

mufeili commented 3 years ago

The results sound good. There aren't any fixed criteria, and I generally welcome any contributions.

VIGNESHinZONE commented 3 years ago

Thanks, I will start a PR in a few days.