deepchem / moleculenet

Moleculenet.ai Datasets And Splits
MIT License

Discrepancy in BBBP Numbers? #32

Closed: rbharath closed this issue 3 years ago

rbharath commented 3 years ago

Comparing the BBBP numbers at https://github.com/deepchem/moleculenet#bbbp with those in the original paper (https://arxiv.org/pdf/1703.00564.pdf), there is a large difference. The BBBP test numbers in the original arXiv paper are around 0.7, while the new numbers are around 0.9 for random forests, which suggests there may be an issue somewhere.
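For reference when reading these numbers: assuming the scores being compared are ROC-AUC (the metric MoleculeNet recommends for BBBP; the thread itself does not name the metric), the quantity is the probability that a randomly chosen positive molecule is scored above a randomly chosen negative one. So 0.7 vs 0.9 is a large gap in ranking quality. The sketch below is a minimal pure-Python version for illustration only; `roc_auc` is a hypothetical helper, and scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity for binary labels.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Binary ROC-AUC via pairwise comparison (Mann-Whitney form).

    Hypothetical helper for illustration. Counts, over all
    (positive, negative) pairs, how often the positive gets the higher
    score; ties count as half a win. O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Positives score 0.9 and 0.6; negatives score 0.7 and 0.2.
    # 3 of the 4 (pos, neg) pairs are ranked correctly.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2]))  # 0.75
```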

CC @mufeili @seyonechithrananda @yuanqidu

rbharath commented 3 years ago

As an interesting counterpoint, the Grover paper (https://arxiv.org/pdf/2007.02835.pdf) reports numbers closer to 0.9. Perhaps the splits differ?
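To make the split question concrete: a random split can place near-duplicate molecules in both train and test, while a scaffold split keeps every molecule sharing a core scaffold in the same partition, which typically makes held-out scores lower. The sketch below contrasts the two. It is a simplified greedy scaffold split (largest scaffold groups fill train first, then valid, then test), written for illustration and not DeepChem's actual `ScaffoldSplitter` code. It assumes scaffold strings were computed beforehand (for example with RDKit's Bemis-Murcko utilities in `rdkit.Chem.Scaffolds.MurckoScaffold`); `random_split` and `scaffold_split` are hypothetical helpers.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Split = Tuple[List[int], List[int], List[int]]


def random_split(n: int, frac_train: float = 0.8, frac_valid: float = 0.1,
                 seed: int = 0) -> Split:
    """Shuffle indices and cut them into train/valid/test by fraction."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(frac_train * n)
    n_valid = int(frac_valid * n)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def scaffold_split(scaffolds: Sequence[str], frac_train: float = 0.8,
                   frac_valid: float = 0.1) -> Split:
    """Greedy scaffold split: whole scaffold groups go to a single partition.

    `scaffolds[i]` is the (precomputed) scaffold key of molecule i.
    """
    groups = defaultdict(list)
    for i, s in enumerate(scaffolds):
        groups[s].append(i)
    # Largest groups first; ties broken by first index for determinism.
    scaffold_sets = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n
    train: List[int] = []
    valid: List[int] = []
    test: List[int] = []
    for group in scaffold_sets:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


if __name__ == "__main__":
    # Toy data: scaffold "A" is common, "C" and "D" are rare.
    scaffolds = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
    print(scaffold_split(scaffolds))  # ([0, 1, 2, 3, 4, 5, 6, 7], [8], [9])
    print(random_split(len(scaffolds)))
```

With the scaffold split, the test set only contains scaffolds never seen in training; with the random split, test molecules will usually share a scaffold with some training molecule. Comparing numbers across papers therefore only makes sense when both used the same kind of split.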

mufeili commented 3 years ago

Did you tune the random forest hyperparameters when publishing the original paper?

rbharath commented 3 years ago

We did do a hyperparameter search, IIRC, so I suspect the models are comparable. This issue came up when @seyonechithrananda, @gabegrand, and the other ChemBERTa folks (https://github.com/seyonechithrananda/bert-loves-chemistry) were discussing why our BBBP benchmark numbers (~0.7 on test for both DMPNN and RFs) didn't match the MoleculeNet leaderboard numbers.

@seyonechithrananda @gabegrand Please feel free to add on details if I'm missing anything!

rbharath commented 3 years ago

@miaecle Would you have time to take a quick look? We're trying to figure out the cause of the discrepancy between our original BBBP numbers and the new ones, and I think you ran the original experiments.

mufeili commented 3 years ago

@rbharath I think you can close this issue now.