IgDAWG / BIGDAWG


Inconsistent confidence intervals and P-values #3

Open BenSolomon opened 2 years ago

BenSolomon commented 2 years ago

Greetings and thank you for an excellent tool!

In a recent analysis, I noticed that several alleles flagged as disease-associated by a significant p-value (< 0.05) also have odds ratio confidence intervals suggesting that they are not significant (the interval includes 1).

I've attached an example data set (hla_table.csv), which I use with the following code:

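library(dplyr)
library(BIGDAWG)
# check.names = FALSE keeps the column names exactly as they appear in the file
hla_table <- read.csv("hla_table.csv", check.names = FALSE)
# Run only the locus-level tests ("L") and return the results in memory instead of writing files
bd_df <- BIGDAWG(as.data.frame(hla_table), Return = TRUE, Output = FALSE, Trim = TRUE, Res = 2, Run.Tests = "L")
# Flatten the odds ratio table and keep only the rows flagged as significant
as_tibble(bd_df$L$Set1$OR) %>% 
  mutate_all(unlist) %>% 
  filter(sig == "*") 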

The output I get from this is

Locus | Allele | OR | CI.lower | CI.upper | p.value | sig
-- | -- | -- | -- | -- | -- | --
A | 24:02 | 3.5 | 0.91 | 16.2 | 0.033732 | *
C | 03:04 | 0.14 | 0 | 1.02 | 0.03272 | *
C | 07:01 | 0.22 | 0.02 | 1.01 | 0.033208 | *
DPA1 | 03:01 | 0.45 | 0.2 | 0.97 | 0.028747 | *
DQA1 | 03:03 | 0.37 | 0.11 | 1.04 | 0.043133 | *

As you can see, A*24:02, for example, has a p-value of 0.0337, consistent with significance, but its odds ratio confidence interval ranges from 0.91 to 16.2, which would be considered not significant. The only allele in this example where the CI and p-value agree is DPA1*03:01.
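
To make the mismatch explicit, here is a small sketch that re-enters the five rows reported above by hand and flags any allele where the p-value and the confidence interval point in different directions. The helper column names (`p_sig`, `ci_sig`, `discordant`) are my own and are not part of the BIGDAWG output.

```r
# The five rows reported above, typed in by hand (not read from BIGDAWG)
res <- data.frame(
  Locus    = c("A", "C", "C", "DPA1", "DQA1"),
  Allele   = c("24:02", "03:04", "07:01", "03:01", "03:03"),
  OR       = c(3.5, 0.14, 0.22, 0.45, 0.37),
  CI.lower = c(0.91, 0, 0.02, 0.2, 0.11),
  CI.upper = c(16.2, 1.02, 1.01, 0.97, 1.04),
  p.value  = c(0.033732, 0.03272, 0.033208, 0.028747, 0.043133),
  stringsAsFactors = FALSE
)

# Significant by p-value at the 0.05 level
res$p_sig <- res$p.value < 0.05
# Significant by CI: the interval lies entirely above or entirely below 1
res$ci_sig <- res$CI.lower > 1 | res$CI.upper < 1
# Rows where the two criteria disagree
res$discordant <- res$p_sig != res$ci_sig

print(res[, c("Locus", "Allele", "p_sig", "ci_sig", "discordant")])
```

The same three derived columns could be added to the unlisted `bd_df$L$Set1$OR` table to screen a full result set, assuming the numeric columns have been converted from character first.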

I plan to look further into how the function calculates the CIs and p-values, but wanted to check first in case there is an obvious answer.
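
One possibility I want to rule out is that the p-value and the CI come from two different procedures. I have not confirmed what BIGDAWG does internally, so the following is only a sketch under that assumption: it uses made-up counts for a hypothetical 2x2 allele table and compares an uncorrected Pearson chi-square p-value with a Wald 95% CI on the log odds ratio. With sparse cells the two can disagree.

```r
# Hypothetical counts, NOT taken from hla_table.csv:
# rows = cases / controls, columns = allele present / allele absent
tab <- matrix(c(6, 14,
                1, 19),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("case", "control"), c("present", "absent")))

# Pearson chi-square test without continuity correction;
# R warns about small expected counts, which is suppressed here
p_chisq <- suppressWarnings(chisq.test(tab, correct = FALSE)$p.value)

# Odds ratio and Wald 95% CI computed on the log scale
or     <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
se_log <- sqrt(sum(1 / tab))
ci     <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log)

cat("chi-square p-value:", signif(p_chisq, 3), "\n")
cat("OR:", round(or, 2), " Wald 95% CI:", round(ci[1], 2), "-", round(ci[2], 2), "\n")
```

For these counts the chi-square p-value falls below 0.05 while the Wald interval still includes 1, which is the same pattern as in my output. Whether this is what actually happens inside BIGDAWG is something I still need to check.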

Thanks!