OHDSI / DbDiagnostics

Package to profile a database and execute data diagnostics based on individual analysis settings
https://ohdsi.github.io/DbDiagnostics/
Apache License 2.0

Struggling with `minSampleSize` #6

Closed MaximMoinat closed 1 year ago

MaximMoinat commented 1 year ago

If I understand correctly, the `minSampleSize` in the Shiny app is calculated by multiplying all the individual proportions together and then multiplying by the total population size. The name suggests that this is the minimum number of people I would get when running the study/cohort. I think that `expectedSampleSize` or `estimatedSampleSize` would be more appropriate.

For example, suppose we define a study profile with a gender and a condition occurrence that does not (typically) occur in that gender. The `minSampleSize` might then report a largish number even though the combination does not occur at all.
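A minimal sketch of the calculation as I understand it, to make the concern concrete. This is not the package's actual code: the function name `independence_estimate` and all numbers are made up for illustration.

```python
def independence_estimate(total_persons, proportions):
    """Multiply the population size by each criterion's proportion.

    Hypothetical helper: it treats every criterion as independent of
    the others, which is the assumption under discussion here.
    """
    estimate = float(total_persons)
    for p in proportions:
        estimate *= p
    return estimate


# Made-up database: 1,000,000 persons, 50% female,
# 1% with a prostate cancer condition occurrence.
total_persons = 1_000_000
prop_female = 0.5
prop_prostate_cancer = 0.01

estimate = independence_estimate(total_persons, [prop_female, prop_prostate_cancer])
print(round(estimate))  # 5000
```

The product comes out at about 5000, while the real number of persons meeting both criteria would be (close to) zero. So the value behaves like an estimate under independence rather than a lower bound.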

clairblacketer commented 1 year ago

@MaximMoinat, that is true, but we have to stick to using these independent statistics to try to understand whether all the pieces for a study exist in a database at all. We are using `minSampleSize` to try to estimate a potential sample size from these independent statistics, but it assumes that the values are normally distributed. It is not perfect, but I think anything else would require a larger number of analyses that sites may not be willing to share.

I also hope that no one is trying to study prostate cancer in women :)

MaximMoinat commented 1 year ago

That is clear. My comment was just about the name of the variable. The actual count can be lower than the `minSampleSize`, yet the name gives the impression that actually running the cohort/study will yield at least this number. But I am probably being pedantic, so I will close the issue :-)