gbouritsas / GSN

Official repository for the paper "Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting" (TPAMI'22) https://arxiv.org/abs/2006.09252
MIT License

ZINC molecule target #1

Closed: thegodone closed this issue 3 years ago

thegodone commented 3 years ago

In your README you mention ZINC molecular solubility (logS), but in your paper it is logP?

Which is the correct physical property (logP or logS)?

Is this value a measured value or a model-based value?

gbouritsas commented 3 years ago

Hi Guillaume,

The predicted property is typical for the ZINC database. Have a look at e.g. the Junction Tree VAE and grammar VAE papers, where they describe it in detail. Basically, it's logP penalised by the Synthetic Accessibility (SA) score and the number of large rings. I am not sure if I understand the second part of your question, but I assume that logP and SA are measured, while the number of large rings can be inferred from the graph/structural formula.
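For concreteness, here is a minimal sketch of how this penalised logP target is commonly computed with RDKit (following the Junction Tree VAE / grammar VAE formulation). The function name `penalized_logp` and the ethanol example are illustrative, not part of this repo, and the exact normalisation or per-dataset standardisation used by the ZINC benchmark may differ:

```python
# Minimal sketch of the ZINC "penalized logP" target, assuming RDKit is
# installed together with its Contrib/SA_Score module. Any z-scoring or
# standardisation applied by specific benchmark code is omitted here.
import os
import sys

from rdkit import Chem
from rdkit.Chem import Crippen, RDConfig

# sascorer.py ships with RDKit under Contrib/SA_Score
sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer  # noqa: E402


def penalized_logp(smiles: str) -> float:
    """Crippen logP penalised by synthetic accessibility and by rings
    with more than 6 atoms (both terms are computed, not measured)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")

    log_p = Crippen.MolLogP(mol)        # Wildman & Crippen atomic-contribution model
    sa = sascorer.calculateScore(mol)   # Ertl & Schuffenhauer SA score

    # Penalty for the largest ring exceeding 6 atoms.
    ring_sizes = [len(ring) for ring in mol.GetRingInfo().AtomRings()]
    largest_ring = max(ring_sizes) if ring_sizes else 0
    cycle_penalty = max(largest_ring - 6, 0)

    return log_p - sa - cycle_penalty


if __name__ == "__main__":
    print(penalized_logp("CCO"))  # ethanol, a trivially small example
```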

thegodone commented 3 years ago

Thanks. Indeed, Synthetic Accessibility is not measured but comes from a model. I would like to be sure that logP is a measured value and not also computed, for example with RDKit.

gbouritsas commented 3 years ago

I am not sure I can give you the right answer to this. Have a look at this paper as well; maybe you'll find the answer you are looking for there.

thegodone commented 3 years ago

Thanks for the reference. So it's confirmed: both SA and logP are predictions from RDKit.

You are not predicting measured logP; you are predicting another prediction (i.e. a student model). You should mention this clearly in the repo. Measured logP values for similarly sized molecules are available, and I would encourage you to try your model on real data.

(35) Wildman, S. A.; Crippen, G. M. Prediction of Physicochemical Parameters by Atomic Contributions. J. Chem. Inf. Comput. Sci. 1999, 39, 868–873.
(36) Ertl, P.; Schuffenhauer, A. Estimation of Synthetic Accessibility Score of Drug-like Molecules Based on Molecular Complexity and Fragment Contributions. J. Cheminf. 2009, 1, 1–11.

gbouritsas commented 3 years ago

Thank you for the references, but please keep in mind that this is a standard benchmark for comparing GNN performance (see here), and providing a good metric/evaluator is beyond the scope of our work. Your concerns about the metric are reasonable, so if you believe that this standardised procedure is lacking in some respect, I would suggest also raising your objections with the authors of the papers mentioned above.

thegodone commented 3 years ago

I would appreciate it if you could at least clarify the "synthetic target" in your GitHub repo; I think this will help others be cautious when comparing one architecture to another. In all the papers you have referenced so far, the goal was not to predict logP itself but to use a pretrained logP model as a condition for molecule generation. Those papers use an appropriate dataset for comparison because they do not measure their robustness on logP prediction. In your case, this is completely different: you use a dataset with a synthetic target and you benchmark your score against other architectures on this synthetic target. I don't think science will learn anything about property prediction if it is not done on a real task.

gbouritsas commented 3 years ago

I don't have any objection to what you are saying; this will be clarified in the repo. But I would like to stress again that the property prediction task is defined in the Benchmarking GNNs paper, which has been widely used in the community over the past year for comparative studies on graph regression problems. There is a plethora of architectures evaluated on this target, including our baselines in the paper, and in my opinion it is a useful curated dataset that has helped the GNN community understand differences in the expressive power and generalisation capabilities of various architectures. I understand that it might not be ideal for the chemoinformatics community, but that is a different story. In any case, I think the authors of the paper that introduced the benchmark are the right point of contact for this.

By the way, this discussion is more scientific and goes beyond a GitHub repo issue, so feel free to email me if you would like to discuss it in more depth.