Open sonyahanson opened 9 years ago
Can you paste in the Shukla plot for comparison?
Not 100% comparable, not sure exactly why. Perhaps there is an intentional flip in the y axis, but this doesn't explain everything... Could also not be a problem on our side...
I just put the code for the first set of plots in the MSMs repository (scatterplot commented out ATM): https://github.com/choderalab/MSMs/tree/master/plots
If anyone wants to play with it or find any mistakes I might have made.
This may not be so bad. I think making the following changes might help:
Might also be useful to color the points by sequence identity.
May not be so bad in what context? Might help toward what?
The axes are already quite similar [-2,2] and [.3,1] in nanometers for the kdeplots. As far as I can tell the definitions for RMSD and the distance difference are the same, but it's a little hard to verify. Will keep looking into this.
Right now it seems like our equivalent to this long well around -10 Angstroms in the Shukla figure is actually at ~5 Angstroms (0.5 nm) in the ensembler models. Again, not sure if this is because of something different in the way the plots were made or because of a legitimate difference in what the structures say vs. what the shukla simulations say.
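One quick way to rule out the unit and sign explanations is to put our values on the same footing as the figure before plotting. A minimal sketch, assuming our distance differences are in nanometers (as in the kdeplots) and the figure is in Angstroms; the function name and the `flip` flag are hypothetical, and whether the figure's y axis is actually flipped is still a guess:

```python
NM_TO_ANGSTROM = 10.0

def to_figure_units(distance_difference_nm, flip=False):
    """Convert distance differences from nm to Angstroms, optionally flipping the sign."""
    sign = -1.0 if flip else 1.0
    return [sign * NM_TO_ANGSTROM * d for d in distance_difference_nm]

# Hypothetical example values, not taken from the ensembler models.
ours_nm = [0.5, -0.2, 0.0]
print(to_figure_units(ours_nm))
print(to_figure_units(ours_nm, flip=True))
```

If the well still sits in a different place after both conversions, the discrepancy is not just a plotting convention.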
Also @maxentile helped a bit with this, so maybe he'd be interested/ have input.
> May not be so bad in what context? Might help toward what?
You had said:
> Not 100% comparable, not sure exactly why. Perhaps there is an intentional flip in the y axis, but this doesn't explain everything... Could also not be a problem on our side...
Which suggests you suspected some significant discrepancies between your plots and the Shukla plot. I remarked that the difference between the two may not be so bad, but it was hard to compare the figures because of the differences in choice of axis ranges and units.
Or at least my brain has trouble comparing them due to the differences in axis ranges and units...
The only other big difference I've noticed is that, in Figure 2, the points are weighted by the free energies of the corresponding MSM states (see caption), which might complicate comparison to an unweighted density plot.
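To make the weighting point concrete: in a free-energy-weighted plot each point effectively carries a Boltzmann weight instead of a count of one. A minimal sketch, assuming free energies in kcal/mol and a temperature of 300 K; the function name and example values are hypothetical, and this is not necessarily how the Shukla figure was built:

```python
import math

# Gas constant in kcal/(mol K) times an assumed temperature of 300 K.
KT_KCAL_PER_MOL = 0.0019872041 * 300.0

def boltzmann_weights(free_energies):
    """Return normalized weights proportional to exp(-F / kT)."""
    f_min = min(free_energies)  # shift by the minimum for numerical stability
    unnormalized = [math.exp(-(f - f_min) / KT_KCAL_PER_MOL) for f in free_energies]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]

# Hypothetical state free energies in kcal/mol.
weights = boltzmann_weights([0.0, 1.0, 3.0])
print(weights)
```

An unweighted density plot corresponds to giving every point the same weight, so regions that are high in free energy but heavily sampled will look more populated than they would in the weighted version.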
Definitely a good point @maxentile .
> The only other big difference I've noticed is that, in Figure 2, the points are weighted by the free energies of the corresponding MSM states (see caption), which might complicate comparison to an unweighted density plot.
Of course. We're just hoping that our Ensembler seeded models cover an area that is a superset of the support in the Shukla figure.
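The superset question could be checked roughly by binning both point sets on the same grid and asking what fraction of the occupied reference bins we also occupy. A minimal sketch; the function names, the bin width, and the example points are all hypothetical, and we would need digitized points from the Shukla figure to use it for real:

```python
import math

def occupied_bins(points, bin_width):
    """Set of 2D grid bins that contain at least one point."""
    return {(math.floor(x / bin_width), math.floor(y / bin_width)) for x, y in points}

def coverage(ours, theirs, bin_width):
    """Fraction of bins occupied by `theirs` that are also occupied by `ours`."""
    target = occupied_bins(theirs, bin_width)
    if not target:
        return 1.0
    return len(target & occupied_bins(ours, bin_width)) / len(target)

# Hypothetical (RMSD, distance difference) points.
ours = [(0.1, 0.1), (1.2, 0.3), (2.5, 2.5)]
theirs = [(0.4, 0.2), (1.9, 0.9), (3.5, 0.5)]
print(coverage(ours, theirs, bin_width=1.0))
```

A coverage of 1.0 would mean our models reach every region the reference occupies at that bin resolution; the answer depends on the bin width chosen.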
Slightly modified plots for John's brain: Src Abl
If we really wanted to understand this better, we should take a look at the PDB endpoints from the Shukla paper, and use those to figure out if there is something these plots have got wrong/different.
Not sure whether continuing down this route (e.g. trying to replicate Shukla data) is more useful than figuring out our own version of interesting coordinates (e.g. DFG flip dihedral and something else).
Also, I find it sorta interesting that the Src and Abl plots do look slightly different, despite being made from the same set of structures.
> Also, I find it sorta interesting that the Src and Abl plots do look slightly different, despite being made from the same set of structures.
This is probably just a typo in the axis labeling, but the labels for both Src and Abl are the same residue numbers. Wouldn't the residue numbers be different for the corresponding E, R, and K residues in Src and Abl?
Will triple check that the residues are right, but indeed the residue numbers used to calculate this are different in Src vs. Abl. The axis label does not come from the actual residues used since the PDB numbering is meaningless.
Updated Abl plot with correct Abl UniProt residue numbers:
Hmmm... At least broadly it looks like these coordinates should be the same as in the Shukla paper. Plotting 2SRC (which they used as their inactive) and 1Y57 (which they used as their active) onto our kdeplots (and now overlaid scatterplot) of the ensembler models, we get the same L-shape that they do.
I wonder why we are completely missing stuff near the 2SRC point.
One issue could be that the volume of configuration space gets very tiny near 0 RMSD to any reference structure. It could be that anything we do at all (modeling, implicit solvent relaxation) will quickly move it a few Angstroms RMSD away.
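A toy illustration of the volume argument: if every atom is displaced by independent Gaussian noise, the resulting RMSD concentrates tightly around a nonzero value rather than spreading down toward zero. This is a sketch under simple assumptions (isotropic noise with standard deviation `sigma` on each coordinate, no superposition, 200 atoms); the function name and numbers are hypothetical and say nothing about the actual size of the MODELLER or refinement perturbations:

```python
import math
import random

def rmsd_after_noise(n_atoms, sigma, rng):
    """RMSD between a structure and a copy with Gaussian noise on every coordinate.

    The result does not depend on the reference coordinates, only on the
    displacements, so the displacements are sampled directly.
    """
    squared = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(3 * n_atoms))
    return math.sqrt(squared / n_atoms)

rng = random.Random(0)
samples = [rmsd_after_noise(n_atoms=200, sigma=1.0, rng=rng) for _ in range(200)]
print(min(samples), sum(samples) / len(samples), max(samples))
```

With these settings the RMSD sits near `sigma * sqrt(3)` with a narrow spread, so even a modest per-atom perturbation makes very small RMSD values essentially unreachable. Optimal superposition would lower the values slightly but would not change the picture.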
When you are comparing to 2SRC, do you use 2SRC itself, or the 2SRC-derived model? I would imagine there has to be a cluster of models near that 2SRC model...
I agree this is suspicious.
I'm using the 2SRC-derived model.
Below is another figure with 2H8H and 3U4W based models shown.
Looking at the activation loop in PyMOL, these models should be similar (Src models for 2SRC (green), 2H8H (magenta), and 3U4W (yellow) shown here, with the activation loop of 3U4W in orange).
Also worth noting, the above figures (except the one in this post) were made using all the atoms for the RMSD, which is not the same as Shukla where they just used heavy atoms. But when I changed my version to alpha carbons or backbones (the figure in this post is made with backbone atoms), I don't see any major changes.
It's definitely a good idea to use just heavy atoms here for RMSD calculation.
The models do look weird: there is some significant deviation in the activation loop.
My hypothesis is that either the MODELLER stage or the implicit/explicit refinement stages are causing some minimal structural deviation here. Is it possible to superimpose the model with its template in each of these cases, and then to highlight the activation loop?
Is the activation loop resolved in each of these templates?
Yeah, I didn't realize the default was all atoms...
Why do you think these models look weird? There is really not that much deviation in the activation loop, esp. compared to 2SRC vs. 1Y57 (in blue below).
Below are superpositions of 2SRC, 2H8H, and 3U4W Src model (these are implicit-refined) and the original PDB... 2SRC
2H8H
3U4W
Think I might have cracked this case. Seems like I was just using `atom_slice` incorrectly, which means I was plotting the overall RMSD instead of the RMSD of the activation loop. Will need to investigate a little further, but I think this is now the appropriate figure. I actually did this for both the implicit-refined and the original models (no hydrogens on the original models, since getting the xtc for original models with hydrogens is still a WIP, though it seems this is totally a non-issue). Note that the stars are still the implicit-refined Src models of the PDB in question.
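For anyone following along, the difference between the two quantities can be sketched without MDTraj. This is a toy illustration, assuming the frames are already superposed on the reference; `rmsd`, `loop_indices`, and the coordinates are hypothetical and this is not the code in the MSMs repository:

```python
import math

def rmsd(coords, reference, atom_indices=None):
    """RMSD over the given atom indices (all atoms if None); no superposition is done."""
    if atom_indices is None:
        atom_indices = range(len(coords))
    atom_indices = list(atom_indices)
    total = 0.0
    for i in atom_indices:
        total += sum((a - b) ** 2 for a, b in zip(coords[i], reference[i]))
    return math.sqrt(total / len(atom_indices))

# Four atoms; only the last two (the "loop") move away from the reference.
reference = [(0.0, 0.0, 0.0)] * 4
model = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
loop_indices = [2, 3]

overall = rmsd(model, reference)
loop_only = rmsd(model, reference, loop_indices)
print(overall, loop_only)
```

The overall value is diluted by the atoms that do not move, which is why the two plots can look so different. In MDTraj, as far as I know, `Trajectory.atom_slice` returns a new trajectory unless `inplace=True` is passed, and `md.rmsd` accepts an `atom_indices` argument, so either route can restrict the calculation to the loop.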
Implicit refined (`traj-refine_implicit_md.xtc`), correct `atom_slice` usage (or more correct, anyway):
Original models (`modelstraj.xtc`), correct `atom_slice` usage (for RMSD backbone is used now, so not having hydrogens should not matter; for the distance metric, not having hydrogens shouldn't matter either, since `compute_contacts` uses 'closest-heavy' as the default):
Thanks so much for figuring this out! This looks much more like the Shukla plot, though it suggests that the 2 Å RMSD inactive conformation is really observed in only a relatively small number of crystal structures, which is pretty cool.
I wouldn't worry about hydrogens here; we wouldn't want to include them in the computed RMSD anyway. Heavy-atom or backbone heavy-atom RMSD is sufficient.
Got a plot for new DDR1 models:
Note: not implying that this should be part of the manuscript, but wasn't quite sure where else to put this...
Was making some of these plots to make a bit more sense of our simulation data, and in super-imposing the ensembler models, found that just looking at the ensembler models was sort of interesting.
The first is just a scatter plot, the second is a seaborn kde plot of the same thing. The axes are inspired by figure 2 in the Shukla Nature Comms paper, as suggested to us by Markus.
Abl:
Src: