Closed javier-gracia-tabuenca-tuni closed 10 months ago
@schuemie made this change. I would defer to him on how to compute the standard deviation of a sample but, reading the reference, it looks like you're correct.
The first computes the standard deviation (SD), the latter the standard error (SE) of the sample proportion. It is a subtle difference that isn't explained well in most textbooks on the standardized difference of means (SDM). We need the SD, not the SE. The SE expresses the uncertainty around the population proportion as estimated from the sample, and it shrinks as the sample size goes up, which would mean that with a large enough sample any difference would be considered an imbalance. Instead, the SDM simply uses the SD (the amount of variation in a variable) as a way to standardize the difference in means.
In other words, it is correct the way it is now.
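To make the SD-versus-SE point concrete, here is a minimal sketch in Python (not the package's R code). The function names are hypothetical, and the pooled form of the standardized difference used here, with the average of the two variances in the denominator, is an assumption chosen for illustration; the package may combine the two SDs differently.

```python
import math


def sd_proportion(p):
    """SD of a binary variable with proportion p: sqrt(p * (1 - p))."""
    return math.sqrt(p * (1 - p))


def se_proportion(p, n):
    """SE of a sample proportion from n observations: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def std_diff_sd(p1, p2):
    """Standardized difference scaled by the SD (assumed pooled form)."""
    pooled = math.sqrt((sd_proportion(p1) ** 2 + sd_proportion(p2) ** 2) / 2)
    return (p1 - p2) / pooled


def std_diff_se(p1, p2, n1, n2):
    """Same ratio, but scaled by the SE instead (shown only for contrast)."""
    pooled = math.sqrt((se_proportion(p1, n1) ** 2 + se_proportion(p2, n2) ** 2) / 2)
    return (p1 - p2) / pooled


# The same one-percentage-point difference at increasing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, std_diff_sd(0.51, 0.50), std_diff_se(0.51, 0.50, n, n))
```

With equal group sizes, the SE-scaled value is the SD-scaled value multiplied by sqrt(n), so it grows without bound as n increases, while the SD-scaled value does not depend on n at all. That is the behaviour described above: with the SE, any fixed difference eventually looks like an imbalance.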
Is there a reason why the standard deviation of a proportion is calculated as:
sd = SQRT(p(1-p))
Line: https://github.com/OHDSI/CohortDiagnostics/blob/73120fe7e7736b3e3c1391a2607f4e17261cb5c5/R/CohortCharacterizationDiagnostics.R#L99C21-L99C23
Previous issue: https://github.com/OHDSI/CohortDiagnostics/issues/322
Everywhere I have looked, it is defined as:
sd = SQRT(p(1-p)/n)
For example https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Introductory_Statistics_(Shafer_and_Zhang)/06%3A_Sampling_Distributions/6.03%3A_The_Sample_Proportion