dmzuckerman / Sampling-Uncertainty

Best Practices article intended for LiveCoMS

Pre-simulation sanity checks section #25

Closed dmzuckerman closed 6 years ago

dmzuckerman commented 6 years ago

@agrossfield @ajschult @drroe I would appreciate your edits and comments on this brief section. Thank you.

drroe commented 6 years ago

OK - I'll be traveling until Friday, I'll try to get to it then.

dmzuckerman commented 6 years ago

@ajschult would you please review this? Thank you.

dmzuckerman commented 6 years ago

@agrossfield @ajschult @drroe if you haven't yet reviewed this section, would you please try to do it in the next day or two? (And thank you if you have reviewed!)

dmzuckerman commented 6 years ago

@mangiapasta I put a query (in red) in the sanity-check section on autocorrelation that I think pertains to something you wrote.

Also, I removed a 'linearly' that you put in front of correlation in that section. Let me justify that. It's certainly true that we only need the absence of linear correlation to get the nice sum-of-variances result on which a lot of UQ is based. However, that's only because the community has made the convenient but, I think, arbitrary choice to base UQ PRIMARILY ON VARIANCE/SECOND MOMENT. Perhaps this is implicitly tied to hoped-for normality. However, if we cared to look at higher moments, we would find that we need independence beyond the linear kind. So I think the fundamental requirement is true independence (factorizability of the joint distribution), but because we use second-moment UQ, linear independence is sufficient for the tests we tend to do. I guess this is an obscure point, but I thought it would be one you'd enjoy discussing.
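To make the distinction concrete, here is a minimal sketch using a toy example that is not from the article: X is uniform on {-1, 0, 1} and Y = X², so Y is completely determined by X. All names below are illustrative. The computation is exact (rational arithmetic) and checks three things: the covariance vanishes, the variances still add, and yet the joint distribution does not factorize and the third central moments do not add.

```python
from fractions import Fraction
from itertools import product

# Toy model (an assumption for illustration): X uniform on {-1, 0, 1}, Y = X**2.
p_x = {x: Fraction(1, 3) for x in (-1, 0, 1)}
joint = {(x, x * x): p for x, p in p_x.items()}


def expect(f):
    """Exact expectation of f(x, y) under the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)

# Second moments: variances and the covariance.
var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))
var_sum = expect(lambda x, y: (x + y - mean_x - mean_y) ** 2)

# Third central moments: these add for independent variables.
third_x = expect(lambda x, y: (x - mean_x) ** 3)
third_y = expect(lambda x, y: (y - mean_y) ** 3)
third_sum = expect(lambda x, y: (x + y - mean_x - mean_y) ** 3)

# Marginal of Y, then test whether P(x, y) == P(x) * P(y) for every pair.
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, Fraction(0)) + p
factorizes = all(
    joint.get((x, y), Fraction(0)) == p_x[x] * p_y[y]
    for x, y in product(p_x, p_y)
)

print("covariance:", cov_xy)
print("Var(X+Y) == Var(X) + Var(Y):", var_sum == var_x + var_y)
print("joint distribution factorizes:", factorizes)
print("third moments add:", third_sum == third_x + third_y)
```

In this toy case a second-moment analysis cannot distinguish X and Y from independent variables, while the third moment can.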

mangiapasta commented 6 years ago

Thanks. I'll take a look at the query and other comments. I'm still combing through sections slowly and will probably add a few comments / questions here and there.

On the issue of linear correlation and UQ based on second moments, I think you are correct, to a certain extent, that the choice has some level of arbitrariness and convenience to it. That being said, for a lot of the estimators we are constructing (e.g., simple arithmetic means), variations on the Central Limit Theorem dictate that we should expect the estimates to be normally distributed. My (incomplete) understanding is that this applies even if the data are correlated. So, in that light, second moments are "asymptotically sufficient" to characterize uncertainty in many estimators of interest. With the autocorrelation analysis, I would actually argue that we can extend the justification of second-moment estimates to correlated data, provided there is an applicable central limit theorem (see CLT under weak dependence: https://en.wikipedia.org/wiki/Central_limit_theorem#Dependent_processes).
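A rough numerical sketch of this point, under assumptions that are mine rather than the article's: the data come from an AR(1) process x_t = phi * x_{t-1} + eps_t with deliberately skewed (centered exponential) noise, and all parameter values are hypothetical. For such a process the variance of the sample mean is inflated over the naive Var(x)/N by roughly (1 + phi)/(1 - phi) at large N, while the distribution of the means should be far less skewed than the noise itself.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

phi = 0.5          # AR(1) coefficient (hypothetical choice)
n_steps = 500      # samples averaged per replica
n_burn = 50        # discarded steps so each replica forgets x = 0
n_replicas = 1000  # independent replicas used to observe the spread of means


def replica_mean():
    """Time average of one AR(1) trajectory driven by skewed noise."""
    x = 0.0
    total = 0.0
    for t in range(n_burn + n_steps):
        # Centered exponential noise: mean 0, variance 1, skewness 2.
        x = phi * x + (rng.expovariate(1.0) - 1.0)
        if t >= n_burn:
            total += x
    return total / n_steps


means = [replica_mean() for _ in range(n_replicas)]

# Naive variance of the mean, pretending the samples are uncorrelated.
var_x = 1.0 / (1.0 - phi**2)  # stationary variance for unit-variance noise
naive_var = var_x / n_steps

# Observed variance of the mean and its inflation over the naive value.
observed_var = statistics.variance(means)
inflation = observed_var / naive_var
predicted_inflation = (1 + phi) / (1 - phi)  # large-N AR(1) result

# Sample skewness of the replica means (the driving noise has skewness 2).
m = statistics.fmean(means)
s = statistics.stdev(means)
skewness = statistics.fmean(((v - m) / s) ** 3 for v in means)

print(f"variance inflation: observed {inflation:.2f}, predicted {predicted_inflation:.2f}")
print(f"skewness of the means: {skewness:.2f}")
```

If the weak-dependence CLT applies as expected, the observed inflation should land near the predicted factor and the skewness of the means should be small, which is the sense in which a second-moment description plus a correlation-time correction can suffice for the mean.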

What are your thoughts?

At any rate, I'm fine with writing or omitting "linearly" in front of correlations. For better or worse, "correlation" is one word where I've been lazy with my consistency. I do agree with your general observation, however, that (linearly) uncorrelated does not imply independence. The latter is a stronger condition.

mangiapasta commented 6 years ago

In this vein, I think that the theory of fluctuations in statistical mechanics essentially banks on the Central-Limit-Theorem, although somewhat implicitly.

mangiapasta commented 6 years ago

Okay, hopefully I addressed the issues you raised. Let me know if I missed something.