Some Manuscript Comments

pgrinaway commented 9 years ago

Hi all,

I have some comments on the manuscript:

General

Do we want to mention why pairwise alignment was chosen? We could note that a multiple sequence alignment could also work, or that Ensembler is modular, so users can substitute their own alignment algorithm.

Do we want to explain how (or why not) we ensure that systems receive the same topology?

I agree with Kyle that we may not need so many command-line examples.

For future directions, do we want to hint at Ensembler 2? If we are going to mention the quantity of data that is produced, we can note that the architecture of the pipeline lends itself well to modern distributed computing paradigms.

Specific

lines 66-70: could bring the field of computational biophysics (structural biology) nearer to the high-throughput techniques of genomics?

127 - the sequences the user is interested in generating simulation-ready structural models for —> for which?

140 - gather_targets is used

212 - matching residues then extracted -> matching residues are then extracted?

243 - Maybe mention the increased survival of models as a result of the prebuilding?

263 - residues span —> residue spans

328 - while long simulations are ineffective, even short ones are effective?

344 - MD improves model quality: agreed, but is this a given? Do we have data to suggest what would happen without MD refinement? Or perhaps since the MODELLER bit itself uses some kind of forcefield, it is already assumed.

540 - do we need to discuss data quantity?

614 - how do we know that high sequence identities are more likely to represent metastable states?

664 - they could be repeated, but I think the inputs used to generate the results in the paper are already in the methods sections

pgrinaway commented 9 years ago

Also, might it be helpful to have an explicit plot of sequence identity vs. RMSD of model? I see that Fig. 4 stratifies models, which is interesting as well.

jchodera commented 9 years ago

Thanks, @pgrinaway! I like all of these, with the following exceptions:

Command-line arguments. I think it's good to have a tight mapping between what can be done and how it can be done. This could go in the Appendix, but if it doesn't cause too much trouble or awkwardness to have it in the main body text, I'm not averse to keeping them.

> 344 - MD improves model quality: agreed, but is this a given? Do we have data to suggest what would happen without MD refinement? Or perhaps since the MODELLER bit itself uses some kind of forcefield, it is already assumed.

MODELLER uses some kind of forcefield (modified from the CHARMM19 polar hydrogen forcefield, or a precursor thereof). The main point of MD refinement is to ensure that the models carried to the next stage (large-scale simulation) will not be so unreasonable that the simulations quickly become unstable and explode.

> 614 - how do we know that high sequence identities are more likely to represent metastable states?

I think we need to clearly state our assumption that structures derived from experiments on high sequence identity constructs are more likely to cover relevant representative metastable states of the target sequences than structures of remote-identity constructs.

> Also, might it be helpful to have an explicit plot of sequence identity vs. RMSD of model? I see that Fig. 4 stratifies models, which is interesting as well.

This may be a good idea. Perhaps an image plot capable of showing the distribution of RMSD in each small seqid stratification could be useful?
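To make the idea concrete, here is a minimal sketch of how the data for such an image plot could be assembled. Everything in it is an assumption for illustration only: the function name, the `(seqid, rmsd)` input format, and the bin edges are hypothetical and not taken from Ensembler or the manuscript. Each row of the grid is one seqid stratum, normalized so that it shows the RMSD distribution within that stratum.

```python
from bisect import bisect_right


def _bin_index(edges, value):
    """Index of the bin containing value; the top edge belongs to the last bin."""
    if value == edges[-1]:
        return len(edges) - 2
    return bisect_right(edges, value) - 1


def rmsd_distribution_by_seqid(models, seqid_edges, rmsd_edges):
    """Return grid[i][j]: fraction of models in seqid bin i that fall in RMSD bin j.

    models: iterable of (seqid_percent, rmsd) pairs (hypothetical input format).
    seqid_edges, rmsd_edges: ascending bin edges; values outside them are skipped.
    """
    n_seqid = len(seqid_edges) - 1
    n_rmsd = len(rmsd_edges) - 1
    counts = [[0] * n_rmsd for _ in range(n_seqid)]
    for seqid, rmsd in models:
        i = _bin_index(seqid_edges, seqid)
        j = _bin_index(rmsd_edges, rmsd)
        if 0 <= i < n_seqid and 0 <= j < n_rmsd:
            counts[i][j] += 1
    # Normalize each seqid row so strata with many models do not dominate.
    grid = []
    for row in counts:
        total = sum(row)
        grid.append([c / total if total else 0.0 for c in row])
    return grid


if __name__ == "__main__":
    # Placeholder values, not real results.
    example = [(90, 1.0), (95, 1.5), (92, 5.0), (30, 8.0), (40, 3.0)]
    for row in rmsd_distribution_by_seqid(example, [0, 50, 100], [0, 2, 4, 10]):
        print(row)
```

The resulting grid could then be passed to an image-plotting call such as matplotlib's `imshow`, with seqid bins on one axis and RMSD bins on the other.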

pgrinaway commented 9 years ago

> I think we need to clearly state our assumption that structures derived from experiments on high sequence identity constructs are more likely to cover relevant representative metastable states of the target sequences than structures of remote-identity constructs.

Agreed. Actually, pointing to Fig. 8 could also be useful there, since high-seqid models end up near the inactive and active states?

sonyahanson commented 9 years ago

If the target is PLOS Comp Biol, other papers there don't seem to include much command-line code, even the software ones: http://www.ploscollections.org/article/browseIssue.action?issue=info:doi/10.1371/issue.pcol.v03.i10

jchodera commented 9 years ago

This one puts them in special "boxes": http://www.ploscollections.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pcbi.1004008&representation=PDF

sonyahanson commented 9 years ago

That could be cool.

jchodera commented 9 years ago

I think these are addressed! Reopen if needed.

choderalab / ensembler-manuscripts

Some Manuscript Comments #30
