PatWalters / practicalcheminformatics

Apache License 2.0

Assessing Interpretable Models | Practical Cheminformatics #3

utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Assessing Interpretable Models | Practical Cheminformatics

Understanding and comparing the rationale behind machine learning model predictions

https://patwalters.github.io/practicalcheminformatics/jupyter/ml/interpretability/2021/06/03/interpretable.html

rflameiro commented 2 years ago

Beautiful! I really enjoy this topic of model interpretability. Could you comment on the problem (if there is one) of using a 1024-bit fingerprint to train an ML model with "not so many" molecules? I remember reading that the samples-to-features ratio should be at least 5:1, but it is hard to find 5,000 molecules for many specific QSAR tasks. By the way, there seems to be a small formatting problem with the formula after "Matveieva and Polishchuk define a topn score as".
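For readers who hit the garbled formula in the post: the topn idea from Matveieva and Polishchuk can be sketched as an overlap score, take the n highest-attribution atoms (where n is the number of atoms known to drive activity) and report the fraction recovered. The function and variable names below are illustrative, not the authors' actual implementation.

```python
# Hedged sketch of a "top-n" style overlap score: what fraction of
# the truly important atoms appear among the top-n atoms ranked by
# the model's attribution scores (n = number of true atoms)?

def top_n_score(attributions, true_atoms):
    """attributions: dict mapping atom index -> attribution score
       true_atoms:   set of atom indices known to drive activity"""
    n = len(true_atoms)
    # Rank atom indices by attribution, highest first
    ranked = sorted(attributions, key=attributions.get, reverse=True)
    top_n = set(ranked[:n])
    return len(top_n & true_atoms) / n

# Toy example: atoms 0 and 3 are the "ground truth" pharmacophore,
# but the model ranks atoms 0 and 5 highest -> only half recovered.
scores = {0: 0.9, 1: 0.1, 2: 0.05, 3: 0.2, 4: 0.0, 5: 0.8}
print(top_n_score(scores, {0, 3}))  # -> 0.5
```

Averaging this score over a benchmark set with known ground-truth atoms gives a single number for comparing attribution methods.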

PatWalters commented 2 years ago

A lot of the ideas behind the "5:1 rule" come from linear regression and aren't relevant to modern ML techniques like ensemble methods and neural nets, which have their own mechanisms for dealing with overfitting.
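As a rough illustration of that point (synthetic data, not an experiment from the post): a random forest can be fit usefully even when features far outnumber samples, because bagging and per-tree randomization act as implicit regularization. The 200 x 1024 setup below mimics a small QSAR set with 1024-bit fingerprints.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a small QSAR set: 200 "molecules", each a
# 1024-bit fingerprint -- far fewer samples than features (ratio ~1:5,
# the reverse of the 5:1 rule of thumb).
X = rng.integers(0, 2, size=(200, 1024)).astype(float)

# Activity depends on only a handful of bits, plus a little noise.
y = X[:, 0] + 2.0 * X[:, 1] - X[:, 2] + rng.normal(0.0, 0.1, size=200)

# Bagging plus randomized trees keep the ensemble from simply
# memorizing the 1024-dimensional input; the out-of-bag score gives
# an honest internal estimate of generalization.
model = RandomForestRegressor(n_estimators=100, oob_score=True,
                              random_state=0)
model.fit(X, y)
print(f"out-of-bag R^2: {model.oob_score_:.2f}")
```

The out-of-bag R^2 stays well above zero despite n < p, which is the kind of behavior the 5:1 heuristic from classical regression doesn't anticipate.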