dmzuckerman / Sampling-Uncertainty

Best Practices article intended for LiveCoMS

Vocabulary: 'Confidence Interval' should be changed. #62

Open dwsideriusNIST opened 2 years ago

dwsideriusNIST commented 2 years ago

Learn something every day...

Per the VIM and GUM, the term 'confidence interval' (see VIM, definition 2.36 [coverage interval], Note 2; or GUM 6.2.2) should not be used. This would involve a lot of revisions to the paper.

Substitutions: 'coverage interval', 'expanded measurement uncertainty', or (not in GUM, but suggested by a NIST statistician) 'uncertainty interval'.

dmzuckerman commented 2 years ago

Wow, and ouch! Certainly, 'confidence interval' is used very widely. But I guess we did want to stick to VIM. Do we have to do more than search and replace? Add it to our glossary section?

agrossfield commented 2 years ago

At a minimum, if we’re not going to use the term 'confidence interval', we absolutely must define it, if only to give the preferred term. Otherwise, we’re abdicating our teaching role.

Dr. Alan Grossfield Department of Biochemistry and Biophysics University of Rochester Medical Center


dwsideriusNIST commented 2 years ago

I think we have to deal with this in multiple ways:

  1. update the vocabulary, where we can also give 'confidence interval' as a colloquial usage while recommending against it [this should address Alan's concern about education]
  2. revise its usage throughout the paper - direct substitutions are probably OK, but let's check each

I'll try to carve out time for this in October/November.

Additionally, should we consider a general revision to the paper? There have been some issues that we've left lingering (me included) and it's been three years since publication.

SmithUoG commented 2 years ago

I think "confidence interval", as used in the current scientific literature, is a dangerous, misleading, or at best incomplete term.

I offer the following draft as a basis for a treatment of "confidence intervals". (I also think it should be accompanied by examples.) This is essentially the way I teach the topic within the field of molecular simulation to my graduate students.

Any comments are welcome.


A confidence interval (CI) is the specification of the range of values (or of a region, in the case of multiple parameters) within which the predicted or measured value of interest lies with some specified probability (95%, 99%, or whatever you choose). It thereby measures the "precision" of the predicted or measured value. Only if you assume that the theory or measurement apparatus is predicting/measuring the "true" value of the quantity does the CI give a 95% (or whatever) interval in which the "true value" lies.
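
To make the last point concrete, here is a minimal Python sketch with made-up numbers (the true value, bias, and noise level are hypothetical, not taken from any measurement). It assumes the readings of a biased apparatus are normally distributed around the biased mean: the 95% interval correctly describes where the readings fall, yet it excludes the true value because of the systematic error.

```python
from statistics import NormalDist


def central_interval(dist: NormalDist, level: float) -> tuple[float, float]:
    """Central interval holding `level` probability of a normal distribution."""
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Hypothetical numbers, for illustration only.
true_value = 10.0   # the quantity's true value (unknown in practice)
bias = 0.5          # systematic error of the apparatus
noise_sd = 0.1      # random (repeatability) error of a single reading

# Distribution of the readings the biased apparatus actually produces.
readings = NormalDist(mu=true_value + bias, sigma=noise_sd)
lo, hi = central_interval(readings, 0.95)

print(f"95% interval of readings: [{lo:.3f}, {hi:.3f}]")
print("contains the true value?", lo <= true_value <= hi)
```

This prints an interval of about [10.304, 10.696], which is a perfectly good statement about the precision of the readings but does not contain the true value of 10.0.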

The term "confidence interval" (CI) is devoid of meaning unless both the value of the "confidence level" and the underlying probability distribution to which it refers are specified.
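
A small illustration of why both the level and the distribution have to be stated, as a Python sketch using only the standard library. It compares the half-width of the central interval for a normal and a uniform distribution that share the same standard deviation; the helper names are hypothetical, and the two distributions are simply convenient examples.

```python
import math
from statistics import NormalDist


def normal_half_width(sigma: float, level: float) -> float:
    """Half-width of the central `level` interval of a normal distribution."""
    return NormalDist().inv_cdf(0.5 + level / 2.0) * sigma


def uniform_half_width(sigma: float, level: float) -> float:
    """Same, for a uniform distribution on [-a, a] with standard deviation sigma.

    For this uniform case sigma = a / sqrt(3), and P(|X| <= h) = h / a.
    """
    a = sigma * math.sqrt(3.0)
    return level * a


for level in (0.90, 0.95, 0.99):
    print(f"level {level:.2f}: normal ±{normal_half_width(1.0, level):.3f}, "
          f"uniform ±{uniform_half_width(1.0, level):.3f}")
```

With the same standard deviation and the same 95% level, the normal assumption gives a half-width of about 1.960 sigma and the uniform one about 1.645 sigma, so "a 95% CI" alone does not pin down the interval.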

The underlying problem is that in experiments or theory, the quantity of interest is not typically obtained directly (in which case, one would simply perform replicate experiments and analyze their distribution), but is obtained indirectly from a set of measurements on different quantities and their use in a particular set of "model equations". The underlying probability distribution of the measurements (or input quantities in the case of theory) is propagated through these equations to the probability distribution of the output quantity (or quantities) of interest. The CI of this latter distribution is the practical quantity of interest.

(An aside and a "pet peeve") Note that only in the special case where the measured input quantities follow a normal distribution AND the propagating model is linear in the model parameters do the output values also follow a normal distribution. This case has been studied, taught, and used by statisticians for decades. IMHO, that is because the early days of the discipline were dominated by models that are linear in the parameters: in those pre-computer days, nonlinear models were rarely used. In my experience, it is still very rare for the statistics of nonlinear models to be taught to undergraduate students.
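
A quick Monte Carlo check of this aside, as a hedged Python sketch: a normally distributed input (hypothetical mean 1.0, standard deviation 0.3) pushed through a linear model keeps essentially zero skewness, while the same input pushed through exp(x), an arbitrary nonlinear example chosen for illustration, produces a visibly skewed (log-normal) output.

```python
import math
import random
import statistics


def sample_skewness(xs: list[float]) -> float:
    """Sample skewness g1 = m3 / m2**1.5, using population-style central moments."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean((x - m) ** 2 for x in xs)
    m3 = statistics.fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5


rng = random.Random(42)
# Hypothetical normally distributed input quantity x ~ N(1.0, 0.3^2).
xs = [rng.gauss(1.0, 0.3) for _ in range(100_000)]

linear = [2.0 * x + 1.0 for x in xs]     # linear model: output stays normal
nonlinear = [math.exp(x) for x in xs]    # nonlinear model: output is log-normal

skew_lin = sample_skewness(linear)
skew_nl = sample_skewness(nonlinear)
print(f"skewness, linear model:    {skew_lin:.3f}")
print(f"skewness, nonlinear model: {skew_nl:.3f}")
```

For a log-normal output with this input spread, the theoretical skewness is roughly 0.95, so a symmetric "mean plus or minus k standard deviations" interval would misplace the tails.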

One reasonable way to proceed in general is to assume some probability distribution function for the measured quantities (a multivariate normal distribution might be appropriate, but you could use any distribution suited to the particular measured quantities). Then one would draw multiple samples of the set of measured quantities from this distribution and plug each sample into the relevant equations to obtain an output value. One would do this, say, 1000 times, and then build a histogram of the output variable(s) of interest. In this way, one "propagates" the probability distribution of the measured quantities to the output quantity (or quantities). The final reported value would be an interval (a CI of 95%, 99%, or whatever), or a "confidence region" in the case of multiple quantities. If the distribution is symmetric about the mean, then the mean value could also be reported as the final result.
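
The procedure above might look like the following Python sketch. The model equation `model(x1, x2) = x1 / x2`, the input means and standard deviations, and the helper names are all hypothetical; the inputs are drawn as independent normals (a multivariate normal with diagonal covariance) for simplicity. Rather than plotting a histogram, the interval is read off the sorted draws, which represent the same empirical distribution.

```python
import math
import random
import statistics


def model(x1: float, x2: float) -> float:
    """Hypothetical model equation relating two measured inputs to the output."""
    return x1 / x2


# Hypothetical input estimates: (mean, standard deviation) of each measured quantity.
INPUTS = {"x1": (5.0, 0.2), "x2": (2.0, 0.3)}


def propagate(n_draws: int = 10_000, seed: int = 1) -> list[float]:
    """Monte Carlo propagation: sample the inputs, push each draw through the model."""
    rng = random.Random(seed)
    (m1, s1), (m2, s2) = INPUTS["x1"], INPUTS["x2"]
    # Assumption: inputs are independent normals (diagonal covariance).
    return [model(rng.gauss(m1, s1), rng.gauss(m2, s2)) for _ in range(n_draws)]


def percentile_interval(samples: list[float], level: float = 0.95) -> tuple[float, float]:
    """Central interval holding roughly `level` of the samples (nearest-rank percentiles)."""
    s = sorted(samples)
    tail = (1.0 - level) / 2.0
    last = len(s) - 1
    return s[math.floor(tail * last)], s[math.ceil((1.0 - tail) * last)]


ys = propagate()
lo, hi = percentile_interval(ys, 0.95)
med = statistics.median(ys)
print(f"95% interval: [{lo:.3f}, {hi:.3f}]")
print(f"median {med:.3f}, mean {statistics.fmean(ys):.3f}")
```

With this hypothetical ratio model the interval is noticeably asymmetric about the median, which is the nonlinear-propagation situation described in the aside above. The sketch uses 10,000 draws rather than 1000 only to steady the tail percentiles.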

Since the most important summary statistics of ANY distribution are its mean and standard deviation, one simple way to proceed (for a single quantity) is to report both the mean and the standard deviation of the distribution and leave it at that.
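
If only the mean and standard deviation are reported, it helps to say so explicitly and give the sample count, so readers know what the ± refers to. A minimal sketch of such a summary follows (the `summarize` helper is hypothetical); note that `statistics.stdev` computes the n-1 sample standard deviation.

```python
import statistics


def summarize(samples: list[float]) -> str:
    """Report the mean and the (n-1) sample standard deviation, with the sample count."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)  # requires at least two samples
    return f"{mean:.3f} ± {sd:.3f} (mean ± standard deviation, n = {len(samples)})"


print(summarize([1.0, 2.0, 3.0, 4.0]))
# → 2.500 ± 1.291 (mean ± standard deviation, n = 4)
```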

This is what we usually do in our papers, for example: Kelly, B.; Smith, W. R. "Alchemical hydration free-energy calculations using molecular dynamics with explicit polarization and induced polarity decoupling: an On-the-Fly polarization approach." J. Chem. Theory Comput. 2020, 16, 1146-1161.

Best regards,

William R. Smith, PhD, PEng University Professor Emeritus, Dept. of Mathematics and Statistics

Adjunct Professor, Chemical Engineering, Un. of Waterloo, Waterloo ON N2L 3G1, Canada

http://mathstat.uoguelph.ca/people/smith

http://www.uoguelph.ca/carboncapture



dmzuckerman commented 2 years ago

Thanks, @SmithUoG, I take your point, and it is important. @dwsideriusNIST I'm pretty busy the rest of this month too. If you remember, please ping me when your schedule frees up. Maybe we can handle this in one sitting?