pgrinaway closed this issue 9 years ago.
Also, might it be helpful to have an explicit plot of sequence identity vs. RMSD of model? I see that Fig. 4 stratifies models, which is interesting as well.
Thanks, @pgrinaway! I like all of these, with the following exceptions:
344 - MD improves model quality: agreed, but is this a given? Do we have data to suggest what would happen without MD refinement? Or perhaps, since the MODELLER step itself uses some kind of forcefield, this is already assumed.
MODELLER uses some kind of forcefield (modified from the CHARMM19 polar hydrogen forcefield, or a precursor thereof). The main point of MD refinement is to ensure that the models carried to the next stage (large-scale simulation) will not be so unreasonable that the simulations quickly become unstable and explode.
614 - how do we know that high-sequence-identity templates are more likely to represent the target's metastable states?
I think we need to clearly state our assumption that structures derived from experiments on high sequence identity constructs are more likely to cover relevant representative metastable states of the target sequences than structures of remote-identity constructs.
Also, might it be helpful to have an explicit plot of sequence identity vs. RMSD of model? I see that Fig. 4 stratifies models, which is interesting as well.
This may be a good idea. Perhaps an image plot showing the RMSD distribution within each narrow seqid stratum could be useful?
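A minimal sketch of the data behind such an image plot, assuming we have, for each model, the percent sequence identity of its template and the model's RMSD (in Å) to some reference structure. The function names, the 10% seqid bins, the 1 Å RMSD bins and the 10 Å cap are all hypothetical illustration choices, not anything in Ensembler:

```python
def seqid_rmsd_grid(seqids, rmsds, seqid_bin=10.0, rmsd_bin=1.0, max_rmsd=10.0):
    """Count models in each (sequence-identity bin, RMSD bin) cell.

    Hypothetical helper for illustration only.
    seqids: percent sequence identity of each model's template (0-100).
    rmsds: RMSD of each model to a reference structure, in angstroms.
    Returns one row per seqid bin (low to high), each a list of counts
    per RMSD bin; RMSDs >= max_rmsd are lumped into the last bin.
    """
    n_seq = int(100 // seqid_bin)
    n_rmsd = int(max_rmsd // rmsd_bin)
    grid = [[0] * n_rmsd for _ in range(n_seq)]
    for s, r in zip(seqids, rmsds):
        i = min(int(s // seqid_bin), n_seq - 1)  # 100% seqid goes in the top bin
        j = min(int(r // rmsd_bin), n_rmsd - 1)  # overflow RMSDs go in the last bin
        grid[i][j] += 1
    return grid


def normalize_rows(grid):
    """Convert counts to per-stratum fractions (empty strata stay all zero)."""
    normalized = []
    for row in grid:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Usage with made-up numbers (not results from the paper):
grid = seqid_rmsd_grid([95, 92, 35, 100], [0.5, 1.5, 6.2, 12.0])
fractions = normalize_rows(grid)
```

Normalizing each row keeps sparsely populated low-seqid strata visible next to the crowded high-seqid ones; the resulting matrix could then be passed to something like matplotlib's `imshow`, with seqid on the y-axis and RMSD on the x-axis.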
I think we need to clearly state our assumption that structures derived from experiments on high sequence identity constructs are more likely to cover relevant representative metastable states of the target sequences than structures of remote-identity constructs.
Agreed. It might also help to point to Fig. 8 there, since the high-seqid models end up near the inactive and active states?
If the target is PLOS Computational Biology, other papers there don't seem to include much command-line code, even the software papers: http://www.ploscollections.org/article/browseIssue.action?issue=info:doi/10.1371/issue.pcol.v03.i10
This one puts them in special "boxes": http://www.ploscollections.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pcbi.1004008&representation=PDF
That could be cool.
I think these are addressed! Reopen if needed.
Hi all,
I have some comments on the manuscript:
General
Do we want to explain why pairwise alignment was chosen? We could note that a multiple sequence alignment would also work, or that Ensembler is modular, so users can substitute their own alignment algorithm.
Do we want to explain how we ensure (or why we do not ensure) that systems receive the same topology?
I agree with Kyle that so many command-line examples may not be necessary.
For future directions, do we want to hint at Ensembler 2? If we are going to mention the quantity of data that is produced, we can note that the architecture of the pipeline lends itself well to modern distributed computing paradigms.
Specific
lines 66-70: could bring the field of computational biophysics (structural biology) nearer to the high-throughput techniques of genomics?
127 - the sequences the user is interested in generating simulation-ready structural models for -> for which?
140 - gather_targets is used
212 - matching residues then extracted -> matching residues are then extracted?
243 - Maybe mention the increased survival of models as a result of the prebuilding?
263 - residues span -> residue spans
328 - while long simulations are ineffective, even short ones are effective?
344 - MD improves model quality: agreed, but is this a given? Do we have data to suggest what would happen without MD refinement? Or perhaps, since the MODELLER step itself uses some kind of forcefield, this is already assumed.
540 - do we need to discuss data quantity?
614 - how do we know that high-sequence-identity templates are more likely to represent the target's metastable states?
664 - they could be repeated, but I think the inputs used to generate the results in the paper are already in the methods sections