choderalab / fah-xchem-notebooks

Collection of notebooks for the analysis of COVID moonshot sprints

A few comments on the notebooks #1

Open hannahbrucemacdonald opened 4 years ago

hannahbrucemacdonald commented 4 years ago

This all looks great! Will definitely be helpful for improving BAR overlap.

image

So this is forwards vs. backwards for a particular series? It's interesting that backwards seems to give consistently smaller free energy differences than forwards for the x10876 series, but that's not the case for x10871. Is there something inherently different between these reference ligands that might explain this?

It would be helpful to see the BAR overlap for forwards vs. backwards, as this could indicate whether one direction is more reliable (I would bet backwards, if backwards is larger -> smaller). The BAR overlap should be the same for both directions, so the RMSE between the forwards and backwards values here will provide a rough estimate of the uncertainty in the BAR overlap.
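As a rough illustration of the RMSE idea, here is a minimal sketch that assumes the forwards and backwards BAR overlaps are available as two paired lists, one entry per transformation. The function name `overlap_rmse` and the numbers are made up for illustration and are not taken from the notebooks.

```python
import math


def overlap_rmse(forward, backward):
    """Root-mean-square difference between paired BAR overlap values."""
    if len(forward) != len(backward):
        raise ValueError("forward and backward must be paired, equal-length lists")
    if not forward:
        raise ValueError("need at least one pair of overlap values")
    squared = [(f - b) ** 2 for f, b in zip(forward, backward)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values only, standing in for the real per-transformation overlaps.
forward_overlap = [0.30, 0.12, 0.45]
backward_overlap = [0.28, 0.15, 0.41]
print(f"RMSE between directions: {overlap_rmse(forward_overlap, backward_overlap):.3f}")
```

If the two directions really do sample the same overlap, this number should be small relative to the overlap values themselves.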

All of your plots of BAR overlap against cheminformatics stuff are great. It looks to me like this is negatively correlated, but I'm not sure if it's actually the same distribution, just with fewer samples towards larger delta heavy atoms. Showing this as a box plot or something where the mean can be seen (image), or even just adding a line of best fit, would help tell what's going on.

Same with this: image

The data points are so clustered because of the integer x-axis that it's too hard to see if there's any signal in the noise! A line of best fit or a box plot would help!
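To make the suggestion concrete, here is a minimal sketch of the numbers a box plot and a line of best fit would show, assuming the integer descriptor (e.g. change in heavy atoms) and the BAR overlap are available as two paired lists. The function names are hypothetical, and the plotting itself is left to whatever library the notebooks already use.

```python
from collections import defaultdict
from statistics import mean, median, quantiles


def summarize_by_integer(x_values, y_values):
    """Group y by integer x and return per-group box-plot statistics."""
    groups = defaultdict(list)
    for x, y in zip(x_values, y_values):
        groups[x].append(y)
    summary = {}
    for x in sorted(groups):
        ys = groups[x]
        if len(ys) >= 2:
            q1, _, q3 = quantiles(ys, n=4, method="inclusive")
        else:
            q1 = q3 = ys[0]  # a single point has no spread
        summary[x] = {"n": len(ys), "mean": mean(ys),
                      "median": median(ys), "q1": q1, "q3": q3}
    return summary


def best_fit_line(x_values, y_values):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(x_values), mean(y_values)
    sxx = sum((x - x_bar) ** 2 for x in x_values)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```

The per-group `n` is worth reporting alongside the mean, since it shows directly whether an apparent trend is just thinner sampling at large changes.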

One final thing: it would be interesting to see the free energy residuals, np.abs(forwards binding - reverse binding), against the same factors you've plotted for BAR overlap. Is there a certain size of change where forwards and backwards stop agreeing (and the estimates then become unreliable)?
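A minimal sketch of the residual analysis, assuming the forwards and reverse binding free energies are paired lists expressed in the same sign convention, and that the factor (e.g. change in heavy atoms) is an integer per transformation. The function names and the `tolerance` cutoff are hypothetical; a sensible cutoff would need to be chosen from the data.

```python
from collections import defaultdict
from statistics import mean


def mean_residual_by_factor(forward_dg, reverse_dg, factor):
    """Mean of |forward - reverse| for each value of the factor."""
    residuals = defaultdict(list)
    for f, r, k in zip(forward_dg, reverse_dg, factor):
        residuals[k].append(abs(f - r))  # same quantity as np.abs(f - r)
    return {k: mean(v) for k, v in sorted(residuals.items())}


def first_unreliable(mean_residuals, tolerance):
    """Smallest factor value whose mean residual exceeds tolerance, else None."""
    for k in sorted(mean_residuals):
        if mean_residuals[k] > tolerance:
            return k
    return None
```

For example, `first_unreliable(mean_residual_by_factor(fwd, rev, delta_heavy), tolerance=1.0)` would return the smallest change in heavy atoms at which the two directions disagree by more than the chosen cutoff on average.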

hannahbrucemacdonald commented 4 years ago

I'm surprised that the correlation between BAR overlap and number of unique atoms isn't better, but maybe it agrees with what we've seen before: small subtleties can make a big difference to the efficiency.

There is so much data we could make an ML model and learn the alchemical change vs BAR overlap or BAR uncertainty. Is there a way to pluck out all the transformations that are X to Y, where X might be H and Y is methyl? Maybe we could generate a reaction SMIRKS for each transformation and try to cluster them, and see if the BAR uncertainty is the same for each cluster.
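A minimal sketch of the grouping step, assuming each transformation has already been reduced to a "from" and "to" label (for example via a reaction SMIRKS generated with a cheminformatics toolkit, which is not shown here). The record layout and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_uncertainty_by_change(records):
    """Collect BAR uncertainties per (from, to) change and summarize each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["from"], rec["to"])].append(rec["bar_uncertainty"])
    return {
        change: {"n": len(values), "mean": mean(values), "spread": pstdev(values)}
        for change, values in groups.items()
    }


# Toy records only; real labels and uncertainties would come from the analysis.
records = [
    {"from": "H", "to": "CH3", "bar_uncertainty": 0.5},
    {"from": "H", "to": "CH3", "bar_uncertainty": 0.7},
    {"from": "H", "to": "F", "bar_uncertainty": 0.2},
]
for change, stats in group_uncertainty_by_change(records).items():
    print(change, stats)
```

A small `spread` within a group would suggest the BAR uncertainty is largely determined by the type of change, which is the property an ML model would need in order to be useful.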

jchodera commented 4 years ago

There is so much data we could make a ML model and learn the alchemical change vs BAR overlap or BAR uncertainty.

We can loop Gavin Crooks in on this. We're onboarding him right now. He's in Slack.

glass-w commented 4 years ago

Only just managed to catch up with all of this, thanks so much for the feedback. I will spend some time looking over all of the above!

glass-w commented 3 years ago

@hannahbrucemacdonald I've tidied these up a little now so it's clearer. Here are the plots for both backwards and forwards transformations with change in number of heavy atoms:

It looks like there is a negative correlation for both transformations.

With the number of unique atoms, I've split this up between the two reference ligands (ALP-POS-d2866bdf-1 and ALP-POS-c59291d4-2):

I'm not sure what to make of the correlation here with the number of unique atoms. To my eye it looks slightly periodic (at least for ALP-POS-c59291d4-2). Next steps are looking at the residuals as you suggested, and I'm currently investigating the number of rotatable bonds (see notebook).

Any thoughts or suggestions are welcome, thanks for the feedback already!