Open cmilica opened 4 years ago
I added the two articles from the previous presentation, and then this one. Would this one work?
One paper we can consider is Supervised classification enables rapid annotation of cell atlases. They use a very simple classifier (multi-class logistic regression) for the topical biological problem of cell type classification. There are many competing methods for this problem, and they have been benchmarked.
Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging is the preprint I mentioned today that discusses confounding in medical imaging. One finding was that "pneumothorax cases without chest drains were highly prevalent (i.e., enriched) in the false negative class".
So the presence of the chest drain is incorrectly influencing the predictions. It's a nice cautionary tale, but I'm not certain we want to use too many medical imaging examples for this biology-centered workshop.
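If we do use it, the core idea is easy to show in a few lines: an overall metric can look fine while one subgroup does much worse. Here is a minimal sketch, assuming each case is labeled with a subgroup. The `recall_by_subgroup` helper, the subgroup names, and all of the counts are made up for illustration and are not taken from the preprint.

```python
from collections import defaultdict


def recall_by_subgroup(records):
    """Return recall (sensitivity) per subgroup, computed over true positives only.

    Each record is a tuple (subgroup, y_true, y_pred) with 0/1 labels.
    """
    hits = defaultdict(int)
    positives = defaultdict(int)
    for subgroup, y_true, y_pred in records:
        if y_true == 1:
            positives[subgroup] += 1
            hits[subgroup] += int(y_pred == 1)
    return {group: hits[group] / positives[group] for group in positives}


# Hypothetical toy data (NOT from the preprint): positive cases with and
# without a chest drain, plus some negative cases.
records = (
    [("drain", 1, 1)] * 9 + [("drain", 1, 0)] * 1
    + [("no_drain", 1, 1)] * 2 + [("no_drain", 1, 0)] * 3
    + [("negative", 0, 0)] * 20
)

per_group = recall_by_subgroup(records)

# Overall recall pools all positive cases and hides the gap between subgroups.
n_positive = sum(1 for _, y_true, _ in records if y_true == 1)
n_detected = sum(1 for _, y_true, y_pred in records if y_true == 1 and y_pred == 1)
overall_recall = n_detected / n_positive

print(f"overall recall: {overall_recall:.2f}")
for group, value in per_group.items():
    print(f"{group}: {value:.2f}")
```

In this toy data the pooled recall looks acceptable, while the positive cases without a drain are missed more often than they are caught. That is the pattern the quote describes.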
@cmilica we discussed how it was somewhat difficult to find example applications that use decision trees instead of random forests. This paper suggests that at least some new work may be decision tree-based:
PgpRules: a decision tree based prediction server for P-glycoprotein substrates and inhibitors https://doi.org/10.1093/bioinformatics/btz213
I haven't actually read it yet, so I can't say whether it is a good example, and I haven't confirmed that they are not ensembling the trees into forests.
I found this paper to be pretty exciting because it is the same field our group is working in. I think most people can understand the need for new classes of antibiotics.
A Deep Learning Approach to Antibiotic Discovery https://doi.org/10.1016/j.cell.2020.01.021
Due to the rapid emergence of antibiotic-resistant bacteria, there is a growing need to discover new antibiotics. To address this challenge, we trained a deep neural network capable of predicting molecules with antibacterial activity. We performed predictions on multiple chemical libraries and discovered a molecule from the Drug Repurposing Hub—halicin—that is structurally divergent from conventional antibiotics and displays bactericidal activity against a wide phylogenetic spectrum of pathogens including Mycobacterium tuberculosis and carbapenem-resistant Enterobacteriaceae. Halicin also effectively treated Clostridioides difficile and pan-resistant Acinetobacter baumannii infections in murine models. Additionally, from a discrete set of 23 empirically tested predictions from >107 million molecules curated from the ZINC15 database, our model identified eight antibacterial compounds that are structurally distant from known antibiotics. This work highlights the utility of deep learning approaches to expand our antibiotic arsenal through the discovery of structurally distinct antibacterial molecules.
If you find a really cool, reasonably mainstream example of ML in biology, please post it here!