Closed rmflight closed 6 months ago
OK, so either I'm not understanding something, or maybe the idea is a bit flawed.
There is a problem here. By definition, the median is the middle of the remaining distribution, so on average we expect the non-missing values to split 50 / 50 above and below the median of the non-missing values.
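To make the point concrete, here is a minimal Python sketch (not the package's R code). The log-normal draws, the detection limit of 0.5, and the helper name `fraction_below_median` are all made up for illustration. It left-censors some simulated values and then counts how many of the survivors fall below their own median:

```python
import random
import statistics

def fraction_below_median(values):
    """Fraction of values strictly below their own median."""
    med = statistics.median(values)
    return sum(v < med for v in values) / len(values)

random.seed(1)
# Hypothetical intensities: log-normal draws, then left-censored at an
# arbitrary detection limit (anything below it is treated as missing).
raw = [random.lognormvariate(0, 1) for _ in range(10001)]
observed = [v for v in raw if v >= 0.5]

frac = fraction_below_median(observed)
print(f"{len(observed)} observed, fraction below their median: {frac:.4f}")
```

However hard the data are censored, the fraction stays at (or just under) one half, because the median is recomputed from whatever values remain.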
For example, for the Yeast dataset, I see this binomial result:
number of successes = 19017, number of trials = 38760, p-value = 0.0002308
95 percent confidence interval:
0.4856458 0.4956249
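For anyone who wants to check these numbers outside of R, here is a rough Python sketch of a two-sided exact binomial test against p = 0.5. This is not the code linked below; the function names are mine, and it assumes the reported figures come from an exact two-sided test, in which case the p-value should come out close to the one quoted above.

```python
import math

def binom_log_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def binom_two_sided_p(k, n):
    """Two-sided exact p-value for H0: p = 0.5.

    With a symmetric null, doubling the smaller tail is equivalent to
    summing all outcomes that are no more likely than the observed one.
    """
    lower = min(k, n - k)
    tail = math.fsum(math.exp(binom_log_pmf(i, n, 0.5))
                     for i in range(lower + 1))
    return min(1.0, 2.0 * tail)

successes, trials = 19017, 38760  # counts reported in this comment
estimate = successes / trials
p_value = binom_two_sided_p(successes, trials)
print(f"estimate = {estimate:.4f}, two-sided p-value = {p_value:.3g}")
```

Note the estimate is only about 0.49, so the significance comes from a deviation of roughly one percentage point from 0.5.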
The only reason the result is statistically significant is the huge number of values we are aggregating over. I need to test, but I suspect that if I took smaller subsets of the success / failure vector, the result would no longer be statistically significant.
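A quick way to get a feel for this before running it on the real data: hold the observed proportion fixed and ask what the p-value would be at smaller sample sizes. The sketch below is Python with a normal approximation rather than the exact test, and it is hypothetical in that real subsets would each have their own proportion rather than exactly 19017 / 38760.

```python
import math

def two_sided_p_normal(prop, n):
    """Two-sided p-value for H0: proportion = 0.5 (normal approximation)."""
    z = (prop - 0.5) / math.sqrt(0.25 / n)
    return math.erfc(abs(z) / math.sqrt(2))

observed_prop = 19017 / 38760  # proportion from the yeast result above

# Same proportion, different (hypothetical) numbers of values.
p_by_n = {n: two_sided_p_normal(observed_prop, n)
          for n in (500, 2000, 10000, 38760)}
for n, p in p_by_n.items():
    print(f"n = {n:>6}: p = {p:.4f}")
```

Under this approximation the same proportion is nowhere near significant at a few hundred or a few thousand values, and only crosses 0.05 once n is somewhere above ten thousand.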
The code I'm using for this test is here: https://github.com/MoseleyBioinformaticsLab/visualizationQualityControl/blob/31-statistical-test-of-left-censorship/R/left_censorship.R#L13
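Just a reminder, here is the plot of the yeast dataset for number of present vs median minimum value: Figure_2-lod-1.png https://github.com/MoseleyBioinformaticsLab/visualizationQualityControl/assets/1509626/c2c259aa-fd79-4fa1-b787-e14b54b04eaa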
Two caveats here: 1) The test needs to be one-sided. If a success is a non-missing value below the median, then the alternative hypothesis is that the fraction of successes is greater than 0.5.
2) We can add an odds-ratio cutoff to ensure that a certain level of left censorship is present.
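One possible reading of these two caveats, sketched in Python: compute a one-sided exact p-value for "more than half below the median", and additionally require the observed odds of below vs. above to exceed a cutoff. The function names, the `alpha` of 0.05, and the `min_odds` of 1.5 are placeholders I picked for illustration, not values agreed on in this thread.

```python
import math

def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): one-sided exact p-value."""
    def log_pmf(i):
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log(1 - p))
    return min(1.0, math.fsum(math.exp(log_pmf(i)) for i in range(k, n + 1)))

def left_censorship_call(n_below, n_total, alpha=0.05, min_odds=1.5):
    """Flag left censorship only if significant AND the effect is large enough."""
    n_above = n_total - n_below
    # Odds of a value being below vs. above the median (null odds are 1).
    odds = n_below / n_above if n_above > 0 else float("inf")
    p_value = binom_upper_tail(n_below, n_total)
    return {"p_value": p_value, "odds": odds,
            "flag": p_value < alpha and odds >= min_odds}

yeast = left_censorship_call(19017, 38760)   # counts from the result above
strong = left_censorship_call(650, 1000)     # made-up strong excess
small = left_censorship_call(5200, 10000)    # made-up significant but small excess
print(yeast["flag"], strong["flag"], small["flag"])
```

With the yeast counts as reported, the successes are below half, so the one-sided test is not significant at all. The third case shows what the cutoff is for: a large n can make a small excess significant without it being a meaningful level of left censorship.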
We ended up moving this over to the ICIKendallTau package, so guess we don't need it here anymore ...
We can add a statistical test of whether missingness is due to left censorship (we think) using this idea: