choderalab / ensembler-manuscripts

Manuscript for Ensembler v1

Some extra plots of ensembler models #45

Open sonyahanson opened 9 years ago

sonyahanson commented 9 years ago

Note: not implying that this should be part of the manuscript, but wasn't quite sure where else to put this...

Was making some of these plots to make a bit more sense of our simulation data, and in superimposing the ensembler models, found that just looking at the ensembler models was sort of interesting.

The first is just a scatter plot, the second is a seaborn kde plot of the same thing. The axes are inspired by figure 2 in the Shukla Nature Comms paper, as suggested to us by Markus.

Abl: plot_conf_abl_all_wmodels plot_conf_abl_ensembler_density

Src: plot_conf_src_all_wmodels plot_conf_src_ensembler_density
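For anyone wondering what the density version adds over the scatter plot, here is a minimal sketch of a 2D Gaussian kernel density estimate in plain NumPy. It is a stand-in for what a seaborn KDE plot computes, not the code from the MSMs repository; the function name `kde2d`, the fixed bandwidth, and the synthetic coordinates are all assumptions made for illustration.

```python
import numpy as np

def kde2d(x, y, grid_x, grid_y, bandwidth):
    """Isotropic Gaussian KDE of points (x, y), evaluated on a grid.

    Returns an array of shape (len(grid_x), len(grid_y)).
    """
    gx, gy = np.meshgrid(grid_x, grid_y, indexing="ij")
    # Squared distance from every grid node to every data point.
    d2 = (gx[..., None] - x) ** 2 + (gy[..., None] - y) ** 2
    kernel = np.exp(-d2 / (2.0 * bandwidth**2))
    # Normalise so that the density integrates to one.
    return kernel.sum(axis=-1) / (len(x) * 2.0 * np.pi * bandwidth**2)

# Synthetic stand-ins for the two plotted coordinates (hypothetical, in nm).
rng = np.random.default_rng(0)
coord_a = rng.normal(0.0, 0.5, size=200)
coord_b = rng.normal(0.6, 0.1, size=200)

spacing = 0.1
grid_a = np.arange(-4.0, 4.0, spacing)
grid_b = np.arange(-1.0, 2.5, spacing)
density = kde2d(coord_a, coord_b, grid_a, grid_b, bandwidth=0.2)

# Riemann-sum check that the estimate behaves like a probability density.
total = density.sum() * spacing * spacing
```

The resulting `density` array could be handed to any contour plotting routine; the scatter plot shows the same points without the smoothing.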

jchodera commented 9 years ago

Can you paste in the Shukla plot for comparison?

sonyahanson commented 9 years ago

Not 100% comparable, not sure exactly why. Perhaps there is an intentional flip in the y axis, but this doesn't explain everything... Could also not be a problem on our side...

shukla

sonyahanson commented 9 years ago

I just put the code for the first set of plots in the MSMs repository (scatterplot commented out ATM): https://github.com/choderalab/MSMs/tree/master/plots

If anyone wants to play with it or find any mistakes I might have made.

jchodera commented 9 years ago

This may not be so bad. I think making the following changes might help:

Might also be useful to color the points by sequence identity.

sonyahanson commented 9 years ago

May not be so bad in what context? Might help toward what?

The axes are already quite similar [-2,2] and [.3,1] in nanometers for the kdeplots. As far as I can tell the definitions for RMSD and the distance difference are the same, but it's a little hard to verify. Will keep looking into this.

Right now it seems like our equivalent to this long well around -10 Angstroms in the Shukla figure is actually at ~5 Angstroms (0.5 nm) in the ensembler models. Again, not sure if this is because of something different in the way the plots were made or because of a legitimate difference in what the structures say vs. what the Shukla simulations say.

Also @maxentile helped a bit with this, so maybe he'd be interested or have input.

jchodera commented 9 years ago

May not be so bad in what context? Might help toward what?

You had said:

Not 100% comparable, not sure exactly why. Perhaps there is an intentional flip in the y axis, but this doesn't explain everything... Could also not be a problem on our side...

Which suggests you suspected some significant discrepancies between your plots and the Shukla plot. I remarked that the difference between the two may not be so bad, but it was hard to compare the figures because of the differences in choice of axis ranges and units.

jchodera commented 9 years ago

Or at least my brain has trouble comparing them due to the differences in axis ranges and units...

maxentile commented 9 years ago

The only other big difference I've noticed is that, in Figure 2, the points are weighted by the free energies of the corresponding MSM states (see caption), which might complicate comparison to an unweighted density plot.
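To make the weighting point concrete, here is a minimal sketch of how per-point weights change a histogram-based free energy surface. It assumes each point carries a weight proportional to the population of its MSM state; the function name `free_energy_surface`, the two synthetic clusters, and the 10:1 weights are hypothetical and not taken from the Shukla data.

```python
import numpy as np

def free_energy_surface(x, y, weights=None, bins=2, extent=None):
    """Return -ln(p) per bin (in units of kT), shifted so the minimum is zero."""
    hist, x_edges, y_edges = np.histogram2d(
        x, y, bins=bins, range=extent, weights=weights
    )
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        free_energy = -np.log(prob)  # empty bins become +inf
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return free_energy, x_edges, y_edges

# Two synthetic clusters with the same number of points (hypothetical data).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.25, 0.02, 50), rng.normal(0.75, 0.02, 50)])
y = rng.normal(0.75, 0.02, 100)
# Assumed weights: the second cluster is ten times more populated.
weights = np.concatenate([np.full(50, 1.0), np.full(50, 10.0)])

extent = [[0.0, 1.0], [0.0, 1.0]]
f_unweighted, _, _ = free_energy_surface(x, y, extent=extent)
f_weighted, _, _ = free_energy_surface(x, y, weights=weights, extent=extent)
```

Unweighted, the two clusters look equally favorable; with weights, the first cluster sits ln(10) kT higher, even though the points themselves have not moved.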

sonyahanson commented 9 years ago

Definitely a good point @maxentile .

jchodera commented 9 years ago

The only other big difference I've noticed is that, in Figure 2, the points are weighted by the free energies of the corresponding MSM states (see caption), which might complicate comparison to an unweighted density plot.

Of course. We're just hoping that our Ensembler seeded models cover an area that is a superset of the support in the Shukla figure.
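One rough way to put a number on the "superset of the support" idea is to bin both data sets on the same grid and ask what fraction of the bins occupied by the reference are also occupied by the models. This is only a sketch: the function name `support_coverage`, the bin count, and the toy points are assumptions, and the answer depends on the bin size chosen.

```python
import numpy as np

def support_coverage(ref_xy, model_xy, bins, extent):
    """Fraction of bins occupied by the reference that the models also occupy."""
    ref_hist, _, _ = np.histogram2d(
        ref_xy[:, 0], ref_xy[:, 1], bins=bins, range=extent
    )
    model_hist, _, _ = np.histogram2d(
        model_xy[:, 0], model_xy[:, 1], bins=bins, range=extent
    )
    ref_occupied = ref_hist > 0
    covered = ref_occupied & (model_hist > 0)
    return covered.sum() / ref_occupied.sum()

# Toy points (hypothetical): the reference occupies four bins on the diagonal.
reference_points = np.array([[0.5, 0.5], [1.5, 1.5], [2.5, 2.5], [3.5, 3.5]])
# The models hit three of those bins plus one bin the reference never visits.
model_points = np.array([[0.5, 0.5], [1.5, 1.5], [2.5, 2.5], [0.5, 3.5]])

coverage = support_coverage(
    reference_points, model_points, bins=4, extent=[[0.0, 4.0], [0.0, 4.0]]
)
```

A coverage of 1.0 would mean every bin the reference visits is also seeded by at least one model at this resolution.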

sonyahanson commented 9 years ago

Slightly modified plots for John's brain:

Src: plot_conf_src_ensembler_density_units

Abl: plot_conf_abl_ensembler_density_units

sonyahanson commented 9 years ago

If we really wanted to understand this better, should take a look at the PDB endpoints from the Shukla paper, and use those to figure out if there is something these plots have got wrong/different.

Not sure whether continuing down this route (e.g. trying to replicate Shukla data) is more useful than figuring out our own version of interesting coordinates (e.g. DFG flip dihedral and something else).

sonyahanson commented 9 years ago

Also, I find it sorta interesting that the Src and Abl plots do look slightly different, despite being made from the same set of structures.

jchodera commented 9 years ago

Also, I find it sorta interesting that the Src and Abl plots do look slightly different, despite being made from the same set of structures.

This is probably just a typo in the axis labeling, but the labels for both Src and Abl are the same residue numbers. Wouldn't the residue numbers be different for the corresponding E, R, and K residues in Src and Abl?

sonyahanson commented 9 years ago

Will triple check that the residues are right, but indeed the residue numbers used to calculate this are different in Src vs. Abl. The axis label does not come from the actual residues used since the PDB numbering is meaningless.

sonyahanson commented 9 years ago

Updated Abl plot with correct Abl uniprot residue numbers: plot_conf_abl_ensembler_density_units

sonyahanson commented 9 years ago

Hmmm... At least broadly it looks like these coordinates should be the same as the Shukla paper. Plotting 2SRC (which they used as their inactive) and 1Y57 (which they used as their active) onto our kdeplots (and now overlaid scatterplot) of the ensembler models, we get the same L-shape that they do.

plot_conf_src_ensembler_density_units_strucs

plot_conf_abl_ensembler_density_units_strucs

jchodera commented 9 years ago

I wonder why we are completely missing stuff near the 2SRC point.

One issue could be that the volume of configuration space gets very tiny near 0 RMSD to any reference structure. It could be that anything we do at all (modeling, implicit solvent relaxation) will quickly move it a few Angstroms RMSD away.

When you are comparing to 2SRC, do you use 2SRC itself, or the 2SRC-derived model? I would imagine there has to be a cluster of models near that 2SRC model...
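The configuration-space volume argument can be illustrated with a small numerical experiment: perturb every coordinate of a structure with independent Gaussian noise and look at the resulting RMSD distribution. This is a sketch under strong assumptions (independent noise, no re-superposition, hypothetical atom count and noise scale), not a model of what MODELLER or the refinement stages actually do.

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms = 250     # hypothetical number of atoms in the selection
sigma = 0.05      # hypothetical per-coordinate displacement in nm (0.5 Angstroms)
n_samples = 1000

# Independent Gaussian displacement of every coordinate of every atom.
displacements = rng.normal(0.0, sigma, size=(n_samples, n_atoms, 3))
# RMSD to the unperturbed reference, without re-superposition for simplicity.
rmsd = np.sqrt((displacements**2).sum(axis=2).mean(axis=1))

# With many atoms the RMSD concentrates near sigma * sqrt(3).
expected = sigma * np.sqrt(3.0)
print(f"expected ~{expected:.3f} nm, min {rmsd.min():.3f}, max {rmsd.max():.3f}")
```

Even though every displacement is centered on zero, none of the perturbed structures ends up anywhere near 0 RMSD, which is the sense in which the region around a reference structure is hard to populate.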

sonyahanson commented 9 years ago

I agree this is suspicious.

I'm using the 2SRC-derived model.

Below is another figure with 2H8H- and 3U4W-based models shown.

plot_conf_src_ensembler_density_units_strucs_more

Looking at the Activation Loop in PyMOL, these models should be similar (Src models for 2SRC (green), 2H8H (magenta), and 3U4W (yellow) shown here, with the activation loop of 3U4W in orange).

2src_aloop

Also worth noting, the above figures (except the one in this post) were made using all the atoms for the RMSD, which is not the same as Shukla where they just used heavy atoms. But when I changed my version to alpha carbons or backbones (the figure in this post is made with backbone atoms), I don't see any major changes.

jchodera commented 9 years ago

It's definitely a good idea to use just heavy atoms here for RMSD calculation.

The models do look weird: there is some significant deviation in the activation loop.
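For reference, the atom subsets being compared here (all atoms, heavy atoms, backbone, alpha carbons) can be sketched as simple index lists. The atom records below are a hypothetical alanine and the backbone definition (N, CA, C, O) is an assumption; in practice the indices would come from the topology of the actual models.

```python
# Hypothetical atom records for one alanine residue: (atom name, element).
atoms = [
    ("N", "N"), ("H", "H"), ("CA", "C"), ("HA", "H"),
    ("CB", "C"), ("HB1", "H"), ("HB2", "H"), ("HB3", "H"),
    ("C", "C"), ("O", "O"),
]

BACKBONE_NAMES = {"N", "CA", "C", "O"}  # assumed backbone definition

all_atoms = list(range(len(atoms)))
heavy = [i for i, (_, element) in enumerate(atoms) if element != "H"]
backbone = [i for i, (name, _) in enumerate(atoms) if name in BACKBONE_NAMES]
alpha_carbons = [i for i, (name, _) in enumerate(atoms) if name == "CA"]
```

Each list is a strict subset of the previous one, so an RMSD over `heavy` ignores hydrogen placement and an RMSD over `backbone` additionally ignores side chains.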

jchodera commented 9 years ago

My hypothesis is that either the MODELLER stage or the implicit/explicit refinement stages are causing some minimal structural deviation here. Is it possible to superimpose the model with its template in each of these cases, and then to highlight the activation loop?

Is the activation loop resolved in each of these templates?

sonyahanson commented 9 years ago

Yeah, I didn't realize the default was all atoms...

Why do you think these models look weird? There is really not that much deviation in the activation loop, esp. compared to 2SRC vs. 1Y57 (in blue below).

2src_aloop_1y57

Below are superpositions of 2SRC, 2H8H, and 3U4W Src model (these are implicit-refined) and the original PDB...

2SRC 2src_pdb_model

2H8H 2h8h_pdb_model

3U4W 3u4w_pdb_model

sonyahanson commented 9 years ago

Think I might have cracked this case. Seems like I was just using atom_slice stupidly, which basically means I was plotting overall RMSD instead of RMSD of the Activation Loop. Will need to investigate a little further, but I think this is now the appropriate figure. I actually did this for both the implicit-refined and the original models. There are no hydrogens on the original models, since getting the xtc for original models with hydrogens is still a WIP, though it seems this is totally a non-issue. Note that the stars are still the implicit-refined Src models of the PDB in question.
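The difference between "overall RMSD" and "RMSD of the Activation Loop" can be sketched in plain NumPy. This is not a reconstruction of the original bug or of the plotting script: the function name `superposed_rmsd`, the synthetic coordinates, and the choice to superpose on the same atoms used for the RMSD are assumptions. In MDTraj the analogous restriction would normally be passed through the `atom_indices` argument of `md.rmsd`.

```python
import numpy as np

def superposed_rmsd(mobile, reference, atom_indices=None):
    """RMSD after optimal superposition (Kabsch), restricted to atom_indices."""
    if atom_indices is not None:
        mobile = mobile[atom_indices]
        reference = reference[atom_indices]
    P = mobile - mobile.mean(axis=0)
    Q = reference - reference.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    if np.linalg.det(U @ Vt) < 0:  # exclude improper rotations (reflections)
        S[-1] = -S[-1]
    msd = ((P**2).sum() + (Q**2).sum() - 2.0 * S.sum()) / len(P)
    return float(np.sqrt(max(msd, 0.0)))

# Synthetic structure (hypothetical): 200 "core" atoms plus a 20-atom "loop".
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(220, 3))
loop = np.arange(200, 220)

# The model is a rotated and translated copy whose loop atoms are displaced.
theta = 0.7
rotation = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
model = reference @ rotation.T + np.array([1.0, -2.0, 0.5])
model[loop] += rng.normal(0.0, 0.3, size=(len(loop), 3))

overall = superposed_rmsd(model, reference)
loop_only = superposed_rmsd(model, reference, atom_indices=loop)
rigid = superposed_rmsd(reference @ rotation.T, reference)
```

Because the displaced loop is a small fraction of the atoms, `overall` stays small while `loop_only` is several times larger, so plotting the overall value can hide loop differences between models.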

Implicit refined (traj-refine_implicit_md.xtc), correct atom_slice usage (or more correct, anyway):

plot_conf_src_ensembler_density_units_strucs_more_fix

Original models (modelstraj.xtc), correct atom_slice usage. For RMSD, backbone is used now, so not having hydrogens should not matter. For the distance metric, not having hydrogens shouldn't matter either, since compute_contacts uses 'closest-heavy' as the default:

plot_conf_src_ensembler_density_units_strucs_modtraj
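Here is a minimal sketch of why a closest-heavy distance does not care about hydrogens, and of how a distance-difference coordinate can be built from two residue pairs. The helper name `closest_heavy_distance`, the toy coordinates, and in particular the sign convention of the difference (K-E minus E-R) are assumptions, not taken from the plotting script or the Shukla paper.

```python
import numpy as np

def closest_heavy_distance(coords_a, elements_a, coords_b, elements_b):
    """Smallest distance between any heavy atom of residue A and any of residue B."""
    heavy_a = coords_a[np.array(elements_a) != "H"]
    heavy_b = coords_b[np.array(elements_b) != "H"]
    diffs = heavy_a[:, None, :] - heavy_b[None, :, :]
    return float(np.sqrt((diffs**2).sum(axis=-1)).min())

# Toy residues (hypothetical coordinates in nm), standing in for K, E and R.
lys_xyz = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.1]])
lys_elements = ["N", "H"]
glu_xyz = np.array([[0.3, 0.0, 0.0], [0.25, 0.0, 0.0]])
glu_elements = ["O", "H"]
arg_xyz = np.array([[0.3, 0.4, 0.0]])
arg_elements = ["N"]

d_lys_glu = closest_heavy_distance(lys_xyz, lys_elements, glu_xyz, glu_elements)
d_glu_arg = closest_heavy_distance(glu_xyz, glu_elements, arg_xyz, arg_elements)

# Assumed sign convention; flipping it mirrors the axis.
distance_difference = d_lys_glu - d_glu_arg
```

The hydrogen on the toy glutamate is closer to the lysine than the oxygen is, but it is ignored, so stripping hydrogens from the models would give the same distances.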

jchodera commented 9 years ago

Thanks so much for figuring this out! This looks much more like the Shukla plot, though it suggests that the 2 Å RMSD inactive conformation is really observed in only a relatively small number of crystal structures, which is pretty cool.

I wouldn't worry about hydrogens here; we wouldn't want to include them in the computed RMSD anyway. Heavy-atom or backbone heavy-atom RMSD is sufficient.

sonyahanson commented 9 years ago

Got a plot for new DDR1 models:

plotting_shukla_fig2_ddr1-colors

Code here.