UW-GAC / GENESIS

GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness
https://bioconductor.org/packages/GENESIS
34 stars 13 forks source link

Filter extreme p-values using `.Machine$double.xmin` #110

Open amstilp opened 8 months ago

amstilp commented 8 months ago

In https://github.com/UW-GAC/analysis_pipeline assoc_plots, instead of hardcoding the threshold for p-value to 38.5, use .Machine$double.xmin to calculate the threshold that can be used. The current code is:

if ("stat" %in% names(assoc)) {
    extreme <- abs(assoc$stat) > 38.5
    if (any(extreme)) assoc$pval[extreme] <- 5e-324
}

5e-324 was probably chosen based on this information from ?.Machine:

     Note that on most platforms smaller positive values than
     ‘.Machine$double.xmin’ can occur.  On a typical R platform the
     smallest positive double is about ‘5e-324’.

but it's most reliable to use the value in .Machine. The code would then change to:

if ("stat" %in% names(assoc)) {
    extreme <- assoc$stat^2 > qchisq(.Machine$double.xmin, df=1, lower.tail=F)
    if (any(extreme)) assoc$pval[extreme] <- .Machine$double.xmin
}

A p-value of 5e-324 and eg 2.225074e-308 (from .Machine$double.xmin on my machine) are basically the same, anyway.

amstilp commented 8 months ago

Instead of making this change in analysis_pipeline, better to make it in GENESIS. Any function calling pchisq should then filter extremely low p-values using the above code.