danielparton closed 9 years ago
I like this!
In addition, can you try superimposing models selected from each of the seqid bins you used for the other figure? That way we can also visualize the ensembles as a function of seqid classes.
Same figure with the colormap I mentioned. I think I have a slight preference for this compared to the all-black representation. (@jchodera - I'm also working on the figure you suggested)
Is the colormap only grey to blue? Maybe red to blue would be good?
Blue to red version (white in the middle):
In this version, models are picked randomly (without replacement) from each of three sequence identity classes: 0-35%, 35-55%, 55-100%
Three models from each class, for a total of nine.
Coloring/transparency is based on sequence identity as before.
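For reference, the random selection described above could be sketched roughly as follows. This is only an illustration, not the script used for the figure: the function name, the `seqids` dictionary layout, and the choice to put a boundary value such as 35% into the upper class are all assumptions.

```python
import random

# Class edges in percent, taken from the three classes described above.
# Which class a boundary value (35 or 55) falls into is an assumption.
SEQID_BINS = [(0.0, 35.0), (35.0, 55.0), (55.0, 100.0)]


def pick_models_per_class(seqids, n_per_class=3, seed=None):
    """Pick n_per_class model indices at random, without replacement, per class.

    seqids: dict mapping model index -> sequence identity (%).
    The last class includes its upper edge so that 100% models are kept.
    """
    rng = random.Random(seed)
    selected = []
    for i, (lo, hi) in enumerate(SEQID_BINS):
        is_last = i == len(SEQID_BINS) - 1
        members = sorted(
            idx for idx, s in seqids.items()
            if lo <= s < hi or (is_last and s == hi)
        )
        # random.sample draws without replacement; cap at the class size.
        selected.extend(rng.sample(members, min(n_per_class, len(members))))
    return sorted(selected)
```

Passing a fixed `seed` makes the pick reproducible, which would help if the figure needs to be regenerated later.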
I think I slightly prefer this version, as it shows a similar amount of model variation with fewer models (9 vs 14).
For whatever reason, I kind of prefer the black and white version of this... (2 cents)
Here's a version with models picked from k-medoids clustering on RMSD. It does a nice job of picking out models over a range of sequence identities. However, the visual appearance of the figure is rather sensitive to the number of clusters. I kind of prefer the look of the previous figure (9 models picked from three sequence identity classes), but maybe we should choose the figure based on the method used, not the way it looks.
Number of models selected: 9
47 99.2
141 68.1
307 45.7
443 41.7
896 36.3
1113 32.3
2254 26.0
2753 26.0
3591 23.6
Number of models selected: 8
6 100.0
81 99.2
175 67.7
328 44.1
443 41.7
2308 26.0
3591 23.6
4102 22.3
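For anyone following along, here is a minimal sketch of k-medoids on a precomputed pairwise RMSD matrix. It is a simple alternating (assign, then update) variant written for illustration, not necessarily the implementation used for the lists above; the function name and the `dist` layout are assumptions.

```python
import random


def k_medoids(dist, k, seed=0, max_iter=100):
    """Simple alternating k-medoids on a precomputed symmetric distance matrix.

    dist: list of lists, dist[i][j] = pairwise RMSD between models i and j.
    Returns (medoid indices, cluster label for each model).
    """
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)  # random initial medoids

    def assign(current):
        # Each model joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)
```

Because the result depends on both `k` and the random initialization, rerunning with a few seeds is one way to see how sensitive the selected models are.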
Is it possible to have 3 subfigures, one for each seqid class?
Three subfigures, one for each seqid class (transparency removed):
Love it!
Are the structures within each subplot selected randomly, equally spaced in seqid, or clustered?
Randomly. These are the model indices and seqids:
39 99.2
125 68.1
160 67.7
471 41.3
806 37.4
828 36.6
978 34.6
1826 27.2
4185 20.9
Would it be a pain to try to cluster the conformations in each seqid class?
No, should be simple enough. I'll give it a try.
Ok, here's a version made using 3 clusters per seqid class. Cluster centroids are shown.
Number of models selected: 9
6 100.0
131 68.1
175 67.7
328 44.1
400 42.4
560 40.6
1751 28.0
1760 27.7
3579 23.6
Looks nice!
How did you choose the magic number of 3 centroids per plot?
Just so there would be 9 models...
What would be a better way of choosing the number of clusters per seqid class?
9 models is arbitrary, yes? Or was there a reason for that?
Choosing the number of clusters is, in general, a difficult problem. Often, a measure of intracluster variance (e.g. the sum of intracluster variances) can help.
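As a rough sketch of that idea, one could compute a variance-like score for each candidate number of clusters and look for the point where adding clusters stops helping much. The function below is an assumption-laden proxy: it uses the mean squared RMSD from each member to its cluster medoid in place of a true variance, and the names `dist`, `medoids`, and `labels` are hypothetical.

```python
def sum_intracluster_variance(dist, medoids, labels):
    """Sum over clusters of the mean squared distance from members to their medoid.

    dist: precomputed pairwise distance (e.g. RMSD) matrix as a list of lists.
    medoids: medoids[c] is the model index representing cluster c.
    labels: labels[i] is the cluster that model i belongs to.
    """
    total = 0.0
    for c, m in enumerate(medoids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            total += sum(dist[i][m] ** 2 for i in members) / len(members)
    return total
```

Plotting this score against the number of clusters and picking the "elbow" is a common heuristic, though it is not guaranteed to give a clear answer.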
For our purposes, I think we just want to give an idea of what the diversity looks like. 3 is pretty uncluttered, and may be optimal for that. But it's possible 4 or 5 might still be useful to look at.
I'd say we leave things as is for now and move on to the other things, like computing the interatomic distances associated with kinase activation for Src and Abl.
Yes, 9 models is arbitrary. The only reasoning was that it looked fairly good (i.e. useful information vs. clutter) when plotting 9 superimposed models. I'll move on for now.
Turns out I'd done the seqid colormaps incorrectly for this figure too. Corrected in this figure, and put all models in one superposition following discussion with John. Abl1 superposition is also included here.
To recap: clustering (k=3) is performed on the models within each of three seqid ranges (0-35, 35-55, 55-100). The nine centroid models are rendered, with sequence identity mapped to both color and transparency.
These are the model indices and sequence identities:
Src:
11 100.0
21 99.2
117 68.1
286 44.1
348 42.4
751 36.6
1186 29.2
1593 28.0
3375 24.0
Abl1:
11 100.0
21 100.0
47 99.6
180 48.8
388 43.9
802 39.2
2438 25.5
3090 23.9
3143 23.9
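The mapping from sequence identity to color and transparency could look something like the sketch below. It assumes the blue-to-red colormap with white in the middle mentioned earlier in the thread, with blue at the low-identity end (the direction is a guess), and opacity scaling linearly so that 100% identity is opaque and 0% is fully transparent. The function name is hypothetical.

```python
def seqid_to_rgba(seqid):
    """Map sequence identity (0-100 %) to an (r, g, b, a) tuple, components in 0-1."""
    t = max(0.0, min(1.0, seqid / 100.0))  # clamp to [0, 1]
    if t < 0.5:
        # Blue (0, 0, 1) fading to white (1, 1, 1). Assumed direction.
        u = t / 0.5
        r, g, b = u, u, 1.0
    else:
        # White (1, 1, 1) fading to red (1, 0, 0).
        u = (t - 0.5) / 0.5
        r, g, b = 1.0, 1.0 - u, 1.0 - u
    # Alpha equals the normalized identity: 100% is opaque, 0% is transparent.
    return (r, g, b, t)
```

The returned tuple would then need to be translated into whatever color and transparency settings the rendering program expects.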
Looks good! The clustering procedure might still be weird to describe, but let's stick with it for now.
This is a first pass attempt at a superposition of models for Src. I'll also do Abl once we have settled upon a style for this figure.
14 models are shown, selected at (roughly) regular intervals along the sequence identity distribution. Models are colored and given a transparency based on sequence identity (100% seq identity = opaque; 0% seq identity = completely transparent). The model indices and sequence identities are as follows:
The models are chosen by taking seq identity values at regular intervals from 0-100, and finding the models with the closest seq identity. Duplicates are removed, hence the uneven distribution of values above. (There's probably a better way of doing this.)
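A minimal sketch of that selection procedure, under some assumptions: `seqids` is a hypothetical dict from model index to sequence identity, `n_targets` (the number of evenly spaced target values) is a free parameter not stated above, and ties are broken by the lower model index.

```python
def pick_at_regular_intervals(seqids, n_targets):
    """Pick the model nearest in seqid to each of n_targets evenly spaced values.

    seqids: dict mapping model index -> sequence identity (%).
    Targets run from 0 to 100 inclusive; duplicate picks are dropped, so the
    result can contain fewer than n_targets models.
    """
    step = 100.0 / (n_targets - 1)
    targets = [i * step for i in range(n_targets)]
    chosen = []
    for t in targets:
        # Nearest model by seqid; ties go to the lower model index.
        nearest = min(seqids, key=lambda idx: (abs(seqids[idx] - t), idx))
        if nearest not in chosen:
            chosen.append(nearest)
    return chosen
```

Since few models sit at very low sequence identity, several low targets tend to collapse onto the same model, which would explain an uneven spread after deduplication.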
I've been trying to get a colormap type thing working, so models are also colored based on seq identity. That should help distinguish the models even more.
You can see quite high variance in the activation loop (right-hand side between the two lobes), and what looks to be a fairly reasonable degree of variance in the other areas.