chao1224 / MoleculeSDE

A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining, ICML'23
https://chao1224.github.io/MoleculeSDE
MIT License

Question regarding molecule property scores #2

Closed: orgw closed this issue 10 months ago

orgw commented 10 months ago

Hi, thanks for the nice paper and code. Looking at your paper, I think the comparison misses a lot of SOTA work on molecular property prediction. I can see that you're comparing pretraining methods on a GIN backbone. However, if you look at a recent paper such as https://github.com/HIM-AIM/BatmanNet, the scores are much higher than those in your paper.

For instance, on BBBP the AUC-ROC is 0.946 for BatmanNet and over 0.959 for pretrained SMILES-BERT, while the numbers in your paper top out around 0.8.

Maybe I misunderstood how you compared the models; can you help me understand why there is such a huge gap between the scores? It is clear that 2D-only models perform poorly on geometric tasks such as QM9. But it's hard for me to understand how a representation that incorporates both 2D and 3D information scores lower than a 2D-only model on molecular property prediction. Maybe it is due to the pretraining dataset size (PCQM4Mv2)?

chao1224 commented 10 months ago

Hi @orgw,

Thank you for raising this question.

It's because we are comparing pretraining algorithms under a fixed backbone (GIN), while the paper you referred to changes the backbone architecture itself, so the absolute downstream scores are not directly comparable.

Further, in our paper, we highlighted that our proposed pretraining model (MoleculeSDE) is agnostic to the backbone model.
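
To make the setup concrete, here is a minimal sketch of the comparison protocol (illustrative only, not our actual evaluation code; the `Backbone` class, checkpoint path, and metric are placeholders): every row of such a table fine-tunes the same backbone with the same recipe, and only the pretraining initialization changes.

```python
# Illustrative sketch only: the backbone, checkpoint path, and metric are
# placeholders, not the repository's actual fine-tuning code.
import os

import torch
import torch.nn as nn


class Backbone(nn.Module):
    """Stand-in for the shared 2D GIN encoder used by every pretraining method."""

    def __init__(self, in_dim: int = 32, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)


def finetune_and_evaluate(pretrained_ckpt=None) -> float:
    """Fine-tune the *same* backbone on a downstream task (e.g. BBBP) and
    return the test AUC-ROC; only the initial weights differ between runs."""
    backbone = Backbone()
    if pretrained_ckpt is not None and os.path.exists(pretrained_ckpt):
        # Initialize from weights produced by a pretraining method
        # (random init serves as the no-pretraining baseline).
        backbone.load_state_dict(torch.load(pretrained_ckpt), strict=False)
    head = nn.Linear(64, 1)  # task-specific head, always trained from scratch
    model = nn.Sequential(backbone, head)
    # ... standard fine-tuning loop and AUC-ROC computation would go here ...
    _ = model
    return float("nan")  # placeholder metric


if __name__ == "__main__":
    # Backbone and fine-tuning recipe are identical across rows; the only
    # variable is which pretraining produced the initial weights.
    for name, ckpt in [
        ("random init", None),
        ("MoleculeSDE", "checkpoints/moleculesde_gin.pt"),  # hypothetical path
    ]:
        print(f"{name}: AUC-ROC = {finetune_and_evaluate(ckpt)}")
```

Because the absolute score on a downstream task depends heavily on the backbone, pretraining methods are compared by holding the backbone fixed; methods that swap in a different or larger architecture are measuring both the architecture and the pretraining at once.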

Hope this answers your question.

orgw commented 10 months ago

Thank you for the clarification!