dmzuckerman / Sampling-Uncertainty

Best Practices article intended for LiveCoMS

Terminology / NIST internal review #9

Closed mangiapasta closed 6 years ago

mangiapasta commented 6 years ago

When it comes to language about uncertainties, I think it would be helpful if we can settle on a common terminology and definitions. From the NIST perspective, this may be a necessity. The document will have to go through an internal review that takes ~1 month before it can be published with NIST authors. Our reviewers will check language, units, etc. to make sure that they comply with our standards (we are, after all, the National Institute of Standards and Technology....). I will look into what the requirements are for a document of this type.

As an example of something I'd like to flag, near the beginning of the document on specific observables uncertainty, I found the following line:

"The standard error is the standard deviation of the distribution of the results that would be obtained by repeating the simulation."

I've seen standard error used a few different ways in various communities. I've always taken it to be the square root of the sample variance divided by the number of samples, $s/\sqrt{N}$, which characterizes the spread of the sample mean about the true mean. (I'm pretty sure this is consistent with the NIST definition, but I'll double check.)

I'm not sure the best way to handle this, but perhaps as we go along, folks can flag various terms in this Issue thread and we can settle on common definitions...?

dwsideriusNIST commented 6 years ago

Paul, I have to second your comment here. As far as which requirements are appropriate, I think we just need to follow NIST SP-811.

We should confirm that all of the quantitative descriptors are consistent and unambiguous. I'm freed up from end-of-FY stuff now, so I'll give the document a read-through and generate a list of statistical descriptors for which to confirm definitions. It might be useful to have a glossary section or appendix, just so that it's really clear what is meant by the really important statistical terms.

mangiapasta commented 6 years ago

Great, thanks! I just talked to my division chief. He pointed me to http://www.bipm.org/en/publications/guides/vim.html which he says is an appropriate resource for use of terminology. I'm told that NIST played a role in development of these standards, so it makes sense for us to use them.

(I just looked at the website now; it's not clear to me that it has definitions of terms for uncertainty quantification.)

dwsideriusNIST commented 6 years ago

Totally forgot about the VIM; let's use it to set vocabulary and then reference the GUM (another BIPM document, see links in https://www.nist.gov/information-technology-laboratory/sed/topic-areas/measurement-uncertainty) if clarification is necessary. We could also point readers to the statistics handbook (https://www.nist.gov/programs-projects/nistsematech-engineering-statistics-handbook) if needed.

Also, if we state up front that we're following the VIM, then a glossary section is unneeded.

dmzuckerman commented 6 years ago

@mangiapasta and @dwsideriusNIST who even knew there was such a thing as a VIM?!? Beautiful. Let me look through as soon as I can and reach out to the group after. I appreciate the points above, including the "S" in NIST. This will be a good thing. We can always offer 'translation' if the official terminology differs from community-specific lingo.

dmzuckerman commented 6 years ago

I did a quick review of the VIM, which I added to our repo (including some highlighting which may help folks navigate this 100+ page pdf; in Adobe, click on Comments to see where these occur).

Given that all this is standardized (and required at NIST), I think we should indeed go along with it. We should also provide some helpful translation (which we can remind readers of in key places), which can be done up front in our definitions section. For instance, we can explain to everyone why 'standard error of the mean' is deprecated ... which I just learned!

For those not wanting to wade through the pdf, the short version is that in modern metrology, "error" (as in std err of the mean) is defined relative to the absolute truth (the true value) ... and hence is intrinsically unknowable. (Very philosophical, I know!) Thus we mortals must content ourselves with "uncertainty": roughly speaking, our best estimate of the error.

I suggest that @mangiapasta or @dwsideriusNIST take a crack at updating definitions in the next round, as you guys seem to be the very models of modern metrologists, so to speak.

mangiapasta commented 6 years ago

@dwsideriusNIST Dan, can you verify these definitions? I think I've been using wrong terminology myself, and I want to make sure we're all in agreement on what we're actually saying. (Screenshots are straight from GUM)

Standard uncertainty: uncertainty in a result as expressed in terms of a standard deviation. (Note: which method we use to compute the standard deviation depends on what uncertainty we are trying to express).

Arithmetic mean: An estimate of the true mean of a random quantity. The arithmetic mean is given by the formula $\bar{Y} = \frac{1}{n}\sum_{k=1}^{n} Y_k$, where Y_k is a realization of the random variable. (I have been calling this the sample mean, but GUM calls it the arithmetic mean.)

Experimental standard deviation: an estimate of the standard deviation of a random variable, given by the formula $s(q_k) = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}(q_j - \bar{q})^2}$, where q_k is a realization of the random quantity and \bar q is the arithmetic mean. (I've been calling this the sample standard deviation, but GUM uses the above language.)

Experimental standard deviation of the mean: an estimate of the standard deviation of the arithmetic mean relative to the true mean. It is given by the formula $s(\bar{q}) = s(q_k)/\sqrt{n}$, where quantities have the same meaning as in the previous definition. (I've incorrectly been calling this standard error. Oops!)
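A minimal Python sketch of the three quantities defined above, assuming uncorrelated observations and using only the standard library (`statistics.fmean`, and `statistics.stdev`, which applies the n - 1 divisor). The function name `gum_type_a_summary` and the sample data are hypothetical, for illustration only.

```python
import math
import statistics


def gum_type_a_summary(q):
    """Return (arithmetic mean, experimental std dev, experimental std dev of the mean).

    Assumes the observations q_k are uncorrelated; the last quantity is not
    valid for correlated data.
    """
    n = len(q)
    if n < 2:
        raise ValueError("need at least two observations")
    q_bar = statistics.fmean(q)            # arithmetic mean: (1/n) * sum(q_k)
    s_q = statistics.stdev(q, xbar=q_bar)  # sqrt(sum((q_j - q_bar)^2) / (n - 1))
    s_q_bar = s_q / math.sqrt(n)           # experimental std dev of the mean
    return q_bar, s_q, s_q_bar


# Hypothetical observations, e.g. five independent simulation estimates.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
q_bar, s_q, s_q_bar = gum_type_a_summary(data)
print(f"{q_bar:.4f} {s_q:.4f} {s_q_bar:.4f}")  # prints "3.0000 1.5811 0.7071"
```

The same data yield two different standard deviations; which one to report depends on whether the object of uncertainty is a single observation or the expectation value.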

agrossfield commented 6 years ago

Every text I’ve seen has referred to the last quantity as the standard error. At the very least, we need to explain not just what the correct terminology is, but also the common misuses of them (if this is actually wrong), because we’re not going to be able to change people’s usage (at least not in the short term).

Alan



Dr. Alan Grossfield Associate Professor Department of Biochemistry and Biophysics University of Rochester Medical Center 610 Elmwood Ave, Box 712 Rochester, NY 14642 Phone: 585 276 4193 http://membrane.urmc.rochester.edu

dwsideriusNIST commented 6 years ago

Paul, first, all of your personal definitions of these quantities are typical terms used in engineering statistics, but alas the GUM rules supreme. I used those terms myself in a chapter on engineering statistics. Second, here is a slight revision to your definitions:

Standard uncertainty: uncertainty in a result as expressed in terms of a standard deviation. (Note: which method we use to compute the standard deviation depends on what uncertainty we are trying to express). (DWS note: are you getting at the difference between type A and B uncertainty?)

Arithmetic mean: An estimate of the expectation value of a random quantity. The arithmetic mean is given by the formula $\bar{Y} = \frac{1}{n}\sum_{k=1}^{n} Y_k$, where Y_k is an experimental realization of the random variable.

Experimental standard deviation: an estimate of the standard deviation of a random variable, given by the formula $s(q_k) = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}(q_j - \bar{q})^2}$, where q_k is an experimental realization of the random quantity and \bar q is the arithmetic mean.

Experimental standard deviation of the mean: an estimate of the standard deviation of the distribution of the arithmetic mean. It is given by the formula $s(\bar{q}) = s(q_k)/\sqrt{n}$ and is used to characterize the dispersion of the arithmetic mean relative to the expectation value of the same quantity.
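One way to see what the experimental standard deviation of the mean characterizes is a quick Monte Carlo check: repeat a hypothetical experiment many times and compare the spread of the resulting arithmetic means with the s(q̄) that a single run would report. The Gaussian model, the parameter values, and the helper `spread_of_means` are illustrative assumptions, not part of the GUM.

```python
import math
import random
import statistics


def spread_of_means(mu, sigma, n, n_repeats, seed=0):
    """Repeat a hypothetical experiment of n independent Gaussian draws.

    Returns (std dev of the arithmetic means across repeats,
             average experimental std dev of the mean reported by single runs).
    """
    rng = random.Random(seed)
    means = []
    s_of_means = []
    for _ in range(n_repeats):
        q = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(q))
        s_of_means.append(statistics.stdev(q) / math.sqrt(n))
    return statistics.stdev(means), statistics.fmean(s_of_means)


observed, estimated = spread_of_means(mu=10.0, sigma=2.0, n=25, n_repeats=4000)
print(f"spread of arithmetic means:        {observed:.3f}")
print(f"average s(q_bar) from single runs: {estimated:.3f}")
print(f"sigma / sqrt(n) (known only here): {2.0 / math.sqrt(25):.3f}")
```

Both estimates should come out close to sigma/sqrt(n) = 0.4; the single-run average tends to be slightly smaller because the square root of an unbiased variance estimate is itself biased low.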

dwsideriusNIST commented 6 years ago

Alan and Paul, I think a footnote or endnote giving "equivalency" definitions may be helpful. The point of the GUM is to establish definitions with total lack of ambiguity in an international context, so it does have to depart from colloquial usages. The downside is that it becomes excessively pedantic.

agrossfield commented 6 years ago

I agree, but I think we have to go a step farther than that. I see junior investigators as the largest audience for this work, and we have to warn them how they’re going to see the words used, even if they’re technically incorrect. I’ve got the same thing going in the free energy profile document, since many people (most people, I think) use the term potential of mean force incorrectly. I want our readers to use the words correctly, but also to be prepared to understand typical papers, which are much sloppier.

Alan


mangiapasta commented 6 years ago

@dwsideriusNIST Re: your comment, "are you getting at the difference between type A and B uncertainty?"

I'm more thinking about the discussion ongoing in the Specific Observables thread. Dan Zuckerman asked a question about what we should report as "reasonable" error bars. It seems that there are a variety of things one is uncertain about: (i) the output of a simulation; (ii) the expected value of the output of a simulation, etc. The first is characterized by the experimental standard deviation, while the second is characterized by the experimental standard deviation of the mean. So, mostly, I want to make sure that "standard uncertainty" isn't identified as corresponding to one particular type of standard deviation. [In that vein, I felt that the error bars (1-sigma, 2-sigma, or 3, etc.) on a plot or as used to assess statistical significance in a comparison should take into account the type of uncertainty we are actually computing.]

It's true though that type A and type B uncertainties then enter the discussion as additional considerations when computing uncertainties.

Dan, I like your changes, btw. Also, @agrossfield I think you are correct in that junior readers should be made aware of how terminology has been used in the past.

mangiapasta commented 6 years ago

In other words, we are always uncertain about something. The means of computing and representing that uncertainty (e.g. on a plot) cannot be divorced from the thing we are uncertain about.

So, I understand the definition "standard uncertainty" as only specifying that the uncertainty is expressed in terms of a standard deviation. It does not specify what the object of uncertainty is, nor does it specify the method of computing uncertainty (beyond that it be expressible as a standard deviation of some kind).

dwsideriusNIST commented 6 years ago

@agrossfield Re: warnings for junior readers: I anticipated that the paper would have a paragraph at the start laying out the definition of statistical quantities, which should satisfy the need for a warning. Then point to either an appendix or footnote to give the common, yet technically inadequate, terms.

dwsideriusNIST commented 6 years ago

@mangiapasta re: "standard uncertainty"

OK, thanks for the clarification. As I read the GUM, "standard uncertainty" is an umbrella term, then you have to choose the right statistical descriptor.

Regarding error bars, I am a bit agnostic on the specific choice. My preference is that the author state clearly and unambiguously what the error bars indicate; then it doesn't really matter to me what they actually are. If an author is trying to communicate their best estimate of the expectation value (population mean, in old terminology), then the error bar should indicate the confidence limits based on 1) the experimental standard deviation of the mean and 2) a stated confidence level (e.g., 95 %).
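A sketch of the kind of error bar described above, assuming uncorrelated samples: confidence limits built from the experimental standard deviation of the mean times a coverage factor k for a stated confidence level. For simplicity k is taken from the normal distribution (`statistics.NormalDist().inv_cdf`), which is a large-sample approximation; for small n a Student-t factor with n - 1 degrees of freedom is generally more appropriate. The helper name `confidence_limits` and the data are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def confidence_limits(q, level=0.95):
    """Approximate two-sided confidence limits for the expectation value.

    Assumes uncorrelated samples and n large enough for a normal coverage factor.
    Returns (lower limit, upper limit, coverage factor k).
    """
    n = len(q)
    if n < 2:
        raise ValueError("need at least two observations")
    q_bar = statistics.fmean(q)
    u = statistics.stdev(q, xbar=q_bar) / math.sqrt(n)  # experimental std dev of the mean
    k = NormalDist().inv_cdf(0.5 + level / 2.0)         # about 1.96 for a 95 % level
    return q_bar - k * u, q_bar + k * u, k


# Hypothetical data: 100 uncorrelated observations.
rng = random.Random(1)
samples = [rng.gauss(10.0, 2.0) for _ in range(100)]
lower, upper, k = confidence_limits(samples)
print(f"95 % limits: [{lower:.2f}, {upper:.2f}] (k = {k:.2f})")
```

Whatever choice is made, the caption should state both the quantity (here s(q̄)) and the confidence level, so the error bar is unambiguous.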

This is starting to cross-talk with the Specific Observables section, sorry.

agrossfield commented 6 years ago

Agreed. The key is to be clear what you’re doing, and to be consistent.



dmzuckerman commented 6 years ago

From today's discussion

dmzuckerman commented 6 years ago

@dwsideriusNIST and @mangiapasta - thanks so much for putting together definitions. Very helpful. I have a number of questions and suggestions:

mangiapasta commented 6 years ago

@dwsideriusNIST and @dmzuckerman

1) I have a slight preference to not alphabetize if there is a natural ordering of concepts. I think there is such an ordering but have yet to settle on what it is. One possibility would be to start with more general concepts and end with more specific ones, and also put "idealized" concepts (like true value) upfront and more practical things that we can compute further down. This highlights the idea that we have in mind a "statistical truth" that we want to compute, but in reality can only approximate this truth. So, a possible re-ordering could be

1. True value
2. Accuracy
3. Precision
4. Raw data
5. Derived observables
6. Uncorrelated observables*
7. Correlated observables
8. Correlation time
9. Standard uncertainty
10. Confidence interval
11. Arithmetic mean
12. Experimental standard deviation
13. Experimental standard deviation of the mean
14. Standard deviation
15. Standard error of the mean

This ordering would let us introduce the idea of correlations before discussion of standard deviation of the mean, for which Dan is right to point out that the data should be uncorrelated. More generally speaking, I think the concepts lower down have dependence on those higher up, but not vice versa.

If this ordering is too weird, or if it's too much work to reorder the list, then I'm also fine with alphabetical order.

2) I'm fine including standard error, etc. in the list. Also fine to remark on use of the word "experimental." In fact, should we actually remove that word altogether, or at least put it in parentheses? The language seems too tailored towards experimentalists.

3) I agree it's important to point out that experimental standard deviation of the mean refers to uncorrelated samples. (Here I think lack of correlations, more so than independence, is what is needed). I believe that Eq. (2) also requires samples to be uncorrelated, although not Eq. (1).

4) Perhaps we use q_j and q_j' as opposed to q_j and q_k?

5) Dan Z, you're right to point out that we've conflated independence and uncorrelated. That's my fault. That being said, my understanding is that Eq. (3) is a consequence of lack of correlations, not independence, since one only requires that the cross terms $\mathrm{E}[(x_j - \mu)(x_k - \mu)] = 0$ for $j \neq k$ (with $\mu$ the true mean) to show that the estimators are unbiased. My preference here would be to rename the definition "Uncorrelated Observables" and otherwise leave the definition as is. Then, we can put a remark indicating that uncorrelated is not the same as independent.

6) Perhaps we should have a remark that points the reader to references that show the relevant calculations for correlated random variables?

7) Technically speaking, our estimates of the exp. std. dev. and exp. std. dev. of the mean are biased. That is, the corresponding estimates for the variance are unbiased, so that in expectation our definition returns the true variance. But taking the square root of an expectation value is not the same as taking the expectation of the square root: by Jensen's inequality (the square root is concave), the square root of an unbiased variance estimate underestimates the standard deviation on average. Despite all of this, everyone uses the equations we've written all the time. Is it worth pointing out any of the issues I've just mentioned?
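To illustrate items 3, 5, and 6: with correlated samples, s(q_k)/sqrt(n) can badly underestimate the uncertainty of the mean. The sketch below generates a hypothetical correlated series (an AR(1) process, chosen purely for illustration) and compares the naive estimate with a block-averaging estimate, one common remedy that treats the means of long blocks as approximately uncorrelated. The helper names, phi, and the block count are assumptions; this is not presented as the method the document prescribes.

```python
import math
import random
import statistics


def ar1_series(n, phi, seed=0):
    """Hypothetical correlated data: AR(1) process x_t = phi * x_{t-1} + unit Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def naive_sem(x):
    """s(q_k)/sqrt(n): only appropriate for uncorrelated samples."""
    return statistics.stdev(x) / math.sqrt(len(x))


def block_sem(x, n_blocks):
    """Block-averaging estimate: std dev of block means divided by sqrt(n_blocks)."""
    size = len(x) // n_blocks
    if size < 1:
        raise ValueError("too many blocks for the series length")
    means = [statistics.fmean(x[i * size:(i + 1) * size]) for i in range(n_blocks)]
    return statistics.stdev(means) / math.sqrt(n_blocks)


x = ar1_series(n=100_000, phi=0.9)
print(f"naive s/sqrt(n):          {naive_sem(x):.4f}")
print(f"block-averaged (20 blks): {block_sem(x, 20):.4f}")
```

For phi = 0.9 the block-averaged value should be roughly four times the naive one (sqrt((1 + phi)/(1 - phi)) ≈ 4.4 in theory). The number of blocks is a tuning choice; blocks must be much longer than the correlation time for the estimate to be meaningful.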

dwsideriusNIST commented 6 years ago

@mangiapasta @dmzuckerman

I agree with Paul that, unless it is infeasible, we should not alphabetize the glossary of statistical terms but put them in a 'natural' order. Let's work on a revision with a natural order and see how it works.

@dmzuckerman We're also going to include an 'equivalency' section that lays out the commonly used terms vis-à-vis the VIM terminology. We also experimented with an internal linking mechanism that takes a reader from a usage of the VIM terminology back to the glossary and/or the equivalency table/paragraph/whatever.

agrossfield commented 6 years ago

Putting the glossary in a "logical" order only makes sense if you assume people will read it from start to finish. However, if readers see a section labelled "glossary", I suspect they'll skip it and come back to it when they hit a term they're not sure of, in which case alphabetical ordering would make more sense.

Just a thought (and I haven't looked at the doc recently, so I may be totally in left field)

dmzuckerman commented 6 years ago

I'm closing this as glossary concerns have migrated to issue 24 (use of random in glossary)