OHDSI / CohortDiagnostics

An R package for performing various cohort diagnostics.
https://ohdsi.github.io/CohortDiagnostics

Is the standard deviation of proportion correct? #1068

Closed javier-gracia-tabuenca-tuni closed 10 months ago

javier-gracia-tabuenca-tuni commented 10 months ago

Is there a reason why the standard deviation of a proportion is calculated as:

SD = SQRT(p(1-p))

Line: https://github.com/OHDSI/CohortDiagnostics/blob/73120fe7e7736b3e3c1391a2607f4e17261cb5c5/R/CohortCharacterizationDiagnostics.R#L99C21-L99C23

Previous issue: https://github.com/OHDSI/CohortDiagnostics/issues/322

Everywhere I have looked, it is defined as:

SD = SQRT(p(1-p)/n)

For example https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Introductory_Statistics_(Shafer_and_Zhang)/06%3A_Sampling_Distributions/6.03%3A_The_Sample_Proportion

azimov commented 10 months ago

@schuemie made this change. I would defer to him on how to compute the standard deviation of a sample, but reading the reference, it looks like you're correct.

schuemie commented 10 months ago

The first formula computes the standard deviation (SD); the second computes the standard error (SE) of the sample proportion. It is a subtle difference that isn't explained well in most textbooks that cover the standardized difference of means (SDM). We need the SD, not the SE. The SE expresses the uncertainty around the population proportion as estimated from the sample, and it shrinks as the sample size goes up. If we standardized by the SE, then with a large enough sample any difference would be considered an imbalance. Instead, the SDM simply uses the SD (the amount of variation in a variable) as a way to standardize the difference in means.

In other words, it is correct the way it is now.
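
To make the distinction concrete, here is a minimal Python sketch (not the package's R code). It assumes the commonly used form of the standardized difference for a binary covariate, (p1 - p2) / sqrt((s1^2 + s2^2) / 2), and the function names and the proportions 0.30 and 0.31 are made up for illustration. It compares what happens when the spread terms s1 and s2 are the SD versus the SE.

```python
import math


def sd_proportion(p):
    """Standard deviation of a binary variable with proportion p."""
    return math.sqrt(p * (1 - p))


def se_proportion(p, n):
    """Standard error of the sample proportion for a sample of size n."""
    return math.sqrt(p * (1 - p) / n)


def std_diff(p1, p2, spread1, spread2):
    """Difference in proportions divided by the pooled spread.

    Assumed form: (p1 - p2) / sqrt((spread1^2 + spread2^2) / 2).
    """
    return (p1 - p2) / math.sqrt((spread1 ** 2 + spread2 ** 2) / 2)


# Hypothetical proportions in two cohorts; the true difference is tiny.
p1, p2 = 0.30, 0.31

results = {}
for n in (100, 10_000, 1_000_000):
    with_sd = std_diff(p1, p2, sd_proportion(p1), sd_proportion(p2))
    with_se = std_diff(p1, p2, se_proportion(p1, n), se_proportion(p2, n))
    results[n] = (with_sd, with_se)
    print(f"n={n:>9}: using SD = {with_sd:.4f}, using SE = {with_se:.4f}")
```

Under these assumptions the SD-based value does not depend on n and stays around 0.02 in absolute value, while the SE-based value is the same quantity multiplied by sqrt(n), so it grows without bound as the sample gets larger. That is the behavior described above: standardizing by the SE would flag any difference as an imbalance once the sample is big enough.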