dmzuckerman / Sampling-Uncertainty

Best Practices article intended for LiveCoMS

Global sampling section #33

Closed mangiapasta closed 6 years ago

mangiapasta commented 6 years ago

I figured it makes sense to create issue threads for each section. Hopefully this will keep some of the discussion a bit more self-contained. A few opening thoughts on this section:

1) First paragraph, the text, "sampling quality, which can be framed most simply in the context of single-trajectory data: ``Among the very large number of simulation frames (snapshots), how many are statistically independent?'' From a dynamical perspective, which also applies to Monte Carlo data, how long must one wait before the system completely loses memory of its prior configuration?"

This just sounds like an autocorrelation analysis to me, but reading more of the text, it seems that the method is subtly distinct. Can the original authors say something that would better distinguish this from autocorrelation analysis, or is it just the latter in disguise? I guess I'm asking for a definition of what "quality" means.

2) Key caveat subsection, the sentence, "The discussion here will focus largely on biomolecular systems, or more precisely, on systems for which it is straightforward to define a meaningful scalar distance between configurations." I guess I'm lost as to what these configurations actually are. As someone who works more in materials science, I would find it helpful to have some idea of how these configurations are defined. Are the distances just phase-space distances? Those are easy to define even for material systems.

3) The decorrelation analysis in Section 6.2 seems like it's more about assessing whether phase space has been well sampled. I left a few related comments in the text. Am I misunderstanding something here?

I left other questions scattered throughout the section. Partly this may just be my lack of knowledge about biomolecular sims.

dmzuckerman commented 6 years ago

@mangiapasta thanks for those. I have tried to address your comments in the text. But briefly:

  1. Yes, it's basically an autocorrelation analysis, but on the global configuration-space distribution.
  2. Configs are the full set of x, y, z coordinates (now in the text). See if you want to say something or amend it to better include materials science.
  3. I guess you automatically assess sampling (self-consistently, based on the parts of space you've seen so far) via the implied effective sample size (total time / decorrelation time). But the direct outcome is a timescale ... or the conclusion that the trajectory isn't long enough to see statistical independence.
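To make the "total time / decorrelation time" estimate in point 3 concrete, here is a minimal sketch. It is a simplification, not the method in the text: it uses an ordinary autocorrelation analysis of a single scalar time series rather than the global configuration-space distribution, and the function names and the truncation rule (stop summing at the first non-positive autocorrelation value) are assumptions made for illustration.

```python
import numpy as np

def autocorrelation(x):
    """Normalized autocorrelation function of a 1-D series (value 1 at lag 0)."""
    x = np.asarray(x, dtype=float)
    dx = x - x.mean()
    n = len(dx)
    # Full linear correlation; keep only the non-negative lags.
    acf = np.correlate(dx, dx, mode="full")[n - 1:]
    # Divide by the number of pairs contributing at each lag.
    acf /= np.arange(n, 0, -1)
    return acf / acf[0]

def decorrelation_time(x):
    """Integrated autocorrelation time, in frames.

    Assumed truncation rule: sum C(t) only up to the first lag where
    C(t) <= 0, so noise in the tail is not accumulated.
    """
    acf = autocorrelation(x)
    nonpos = np.where(acf <= 0)[0]
    cutoff = nonpos[0] if len(nonpos) else len(acf)
    return 1.0 + 2.0 * acf[1:cutoff].sum()

def effective_sample_size(x):
    """Implied number of independent samples: total frames / decorrelation time."""
    return len(x) / decorrelation_time(x)
```

If the estimated decorrelation time comes out comparable to the trajectory length, the effective sample size is of order one, which corresponds to the "trajectory isn't long enough" conclusion rather than to a usable timescale.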

dmzuckerman commented 6 years ago

@agrossfield would you please take a look at Paul Patrone's question regarding 'block covariance' in Sec. 6.1? It should be quick to deal with. As for me, in addition to addressing Paul's question on my method in the same section, I'm going to try to rework the intro to Sec. 6 a bit more so that (a) materials science folks can benefit, but (b) we don't get so abstract that we lose the non-expert bio MD folks.

BTW, feel free to edit the intro or conclusions, as Dan S, Paul and I think they're nearly final.