traviswheeler opened this issue 1 year ago
@traviswheeler and authors,
Thank you for your work on drug-sniffer.
It is unfortunate that you came up with such results. If possible, I would like to ask for more details about the failure modes you found in your pipeline. I am mostly interested in the fingerprint / structure-similarity search failure modes. Can you provide more details about them? I am planning to apply a similar step in my research and would like to learn from your experience.
From my understanding, using DL models to generate novel molecules can create ligands that score well in docking, but they are usually hard or impossible to synthesize. So a structure-similarity search, on Enamine for example, could help provide synthesizable molecules while retaining some of the novelty of the DL-generated compound. Of course, additional post-processing and checks are needed after that.
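Concretely, the step I have in mind looks something like this minimal RDKit sketch (Morgan fingerprints + Tanimoto similarity against a catalog slice); the SMILES strings and the 0.6 cutoff are just illustrative placeholders, not anything from drugsniffer itself:

```python
# Rank catalog molecules (e.g., an Enamine SMILES export) by Tanimoto similarity
# to a DL-generated query molecule. All inputs below are placeholders.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

query_smiles = "CC1=CC(=O)NC(=S)N1"  # hypothetical DL-generated molecule
catalog_smiles = ["c1ccccc1O", "CC(=O)Nc1ccc(O)cc1", "CCN(CC)CC"]  # stand-in catalog slice

query_fp = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(query_smiles), 2, nBits=2048)

hits = []
for smi in catalog_smiles:
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        continue
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    sim = DataStructs.TanimotoSimilarity(query_fp, fp)
    if sim >= 0.6:  # arbitrary similarity cutoff
        hits.append((smi, sim))

hits.sort(key=lambda x: x[1], reverse=True)
print(hits)
```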
Recently, this technique seemed to have some success in the CACHE Challenge: https://www.biorxiv.org/content/10.1101/2024.07.18.603797v1, which makes your feedback on its limitations even more valuable.
Best regards
What you've described (use a DL method to produce some exciting ligand, then use fingerprints to identify synthesizable molecules with similar binding potential, then apply some post-processing, e.g. docking) was exactly the guiding principle behind drugsniffer. It seemed like a great idea.
The problem: fingerprints are not actually good at identifying small molecules with similar binding potential (except for trivial modifications to a base scaffold). We wrote about it in our paper Do Molecular Fingerprints Identify Diverse Active Drugs in Large-Scale Virtual Screening? (No). Since drugsniffer is heavily dependent on the efficacy of fingerprints in enriching for similar expected binding activity, this failure of field-standard methods is a killer. (I suppose it's worth noting that we had some trouble publishing that paper. The common critique from rejecting reviewers was that "everybody knows fingerprints work, so this must be wrong". This despite a paper full of evidence. Ah well.)
But maybe that's just a fingerprint problem, and docking works better? Without digging deeply into our explorations, I'll share a picture.
This shows the results of applying AutoDock Vina to the DUD-E dataset. Vina scores (x-axis) show no correlation with the experimentally-determined affinity values (y-axis) for active molecules in the DUD-E dataset. We have similar data showing that Vina (and others) are only slightly better than chance at distinguishing active molecules from inactives for the large majority of binding pockets in the DUD-E and LIT-PCBA benchmarks.
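For anyone who wants to run the same sanity check on their own docking output, a minimal sketch (not our analysis code) would look like the following; the CSV name and column names are hypothetical placeholders for per-active Vina scores and experimental affinities:

```python
# Correlate Vina scores with experimental affinities for DUD-E actives.
# "vina_dude_actives.csv" and its columns are assumed placeholders.
import pandas as pd
from scipy.stats import pearsonr, spearmanr

df = pd.read_csv("vina_dude_actives.csv")  # assumed columns: vina_score, exp_pK

rho, rho_p = spearmanr(df["vina_score"], df["exp_pK"])
r, r_p = pearsonr(df["vina_score"], df["exp_pK"])
print(f"Spearman rho = {rho:.2f} (p={rho_p:.2g}), Pearson r = {r:.2f} (p={r_p:.2g})")
# A rho near zero is consistent with the claim that Vina scores don't track measured affinity.
```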
I don't claim that the problem is unsolvable ... just that stitching fingerprints and docking together (which seemed promising based on the literature) doesn't appear to be the right approach. I'm not sure I want to go through the effort of writing (and trying to publish) another negative paper with a title like "you know that docking thing you do? maybe don't trust it" ... so we'll probably just wait to include those results along with more positive results from alternative strategies.
As written above, we've been working on a distinct deep learning strategy for binding inference. We've got quite good results, but this GitHub repo is not the place to describe them (we're pulling a manuscript together). As I understand the CACHE results, they seem to have also landed on the understanding that the fingerprints+docking approach is insufficient.
Thank you for your detailed answer, very informative.
I read your paper and really liked it; I see the limitations of the technique. But I still think it might be worth a shot when dealing with non-synthesizable molecules. I can always check the similar molecules with various simulations, and it's a cheap way to actually make use of the DL-generated molecules. I believe it's okay even if I pick up some inactives as well. But I understand, and you show clear data, that it would not work for pure VS on a large database.
About docking, I am surprised by your results. I know docking is far from perfect, but it does carry a weak signal for distinguishing actives from inactives. Let me share my own experimental results with Vina-GPU, checking the quality of docking for screening on the DUD-E dataset. We can see that in terms of screening power it does better than random (most AUCs above 0.5). To be honest, it's not great, but it still carries some value.
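The per-target AUCs were computed along these lines (a simplified sketch of the calculation, not my exact scripts); the CSV and column names are placeholders:

```python
# Per-target screening power: ROC AUC of docking scores against active/decoy labels.
# "vina_gpu_dude.csv" is an assumed file with columns: target, vina_score, is_active (0/1).
# Vina scores are more negative for stronger predicted binding, so we negate them.
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.read_csv("vina_gpu_dude.csv")

aucs = {}
for target, grp in df.groupby("target"):
    if grp["is_active"].nunique() < 2:  # need both actives and decoys for an AUC
        continue
    aucs[target] = roc_auc_score(grp["is_active"], -grp["vina_score"])

print(f"mean AUC over {len(aucs)} targets: {sum(aucs.values()) / len(aucs):.3f}")
```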
Anyway, thanks for the discussion, I am looking forward to reading your next manuscript.
Thanks for sharing those results. As I read it, you're seeing an average AUC of something like 0.65. That's roughly in line with what we saw in early analysis with drugsniffer (Fig 3 from that paper). I suppose the way I think about this is:
Your plots don't really give insight into early enrichment, but the AUC is suggestive. An AUC of 0.65 means that, if I'm ranking all molecules in hopes that I'll give all the best scores to active molecules, I'm going to do better than a random number generator would ... but not wildly better. Suppose I had 1 million molecules in a database, and 10 are active while the rest are not: if I pick the top 1000 scoring molecules, it won't be at all surprising if zero of them are actives. Now consider a database of 1 billion molecules. You're basically stuck going through millions of candidates in some other way (or throwing away almost all active molecules). So that lack of enrichment isn't all that helpful in a screening exercise. You really want something that's giving at least an order of magnitude enrichment (putting at least 90% of the inactives at the bottom of the rankings, so the AUC will be 0.90 or better). A 0.99 AUC would be even better (and still not ALL that helpful for high-quality filtering).
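To make that arithmetic concrete, here's a toy simulation (not data from our experiments): actives and inactives get scores drawn from equal-variance Gaussians whose mean separation is chosen to hit a target AUC. That's only one of many score distributions consistent with a given AUC, but it conveys the scale of the problem:

```python
# Toy model: 1,000,000 molecules, 10 actives, scored so actives vs. inactives
# have a chosen ROC AUC. For two unit-variance Gaussians, AUC = Phi(d / sqrt(2)),
# so we pick the mean shift d that yields each target AUC. Numbers are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_inactive, n_active, top_k = 1_000_000 - 10, 10, 1000

for target_auc in (0.65, 0.90, 0.99):
    d = np.sqrt(2) * norm.ppf(target_auc)  # mean separation giving that AUC
    hits = []
    for _ in range(20):  # repeat to average out noise
        scores = np.concatenate([rng.normal(0.0, 1.0, n_inactive),
                                 rng.normal(d, 1.0, n_active)])
        labels = np.concatenate([np.zeros(n_inactive), np.ones(n_active)])
        top = np.argpartition(-scores, top_k)[:top_k]  # indices of the top_k scores
        hits.append(labels[top].sum())
    print(f"AUC {target_auc:.2f}: on average {np.mean(hits):.2f} of 10 actives in the top {top_k}")
```

Under this toy model, the top 1000 at AUC 0.65 essentially never contains an active, and even at 0.99 only about half of the actives make the cut.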
Anyway, that's where I'm coming from in this hunt for better methods. Happy to have others tackling it, too. Good luck!
Thanks for your interest in drugsniffer. Please feel free to use and extend drugsniffer as is useful for you, but be aware that development and maintenance are paused -- we are currently not able to address support requests.
This status is in place for three reasons:
1. After a great deal of experimentation, we no longer believe that this (or any similar) pipeline produces particularly enriched predictions for billion-scale search. Each stage, from de novo generation through fingerprints and docking, suffers from wide-ranging failure modes that will not be resolved by minor tweaking.
2. The lead developer on the project has departed, so minor tweaking is not trivial. Since we don't think it's useful, it's not worth the effort (to us or to you).
3. We are actively working on improved methods for pocket identification, rapid docking-free binding-potential prediction, and better docking approaches. When these are complete, we'll pull them into drugsniffer, then bring the package back up to speed.