cumc / xqtl-protocol

Molecular QTL analysis protocol developed by ADSP Functional Genomics Consortium
https://cumc.github.io/xqtl-protocol/
MIT License
41 stars 43 forks source link

Inconsistency between sesame and minify output #455

Open yuqimiao opened 2 years ago

yuqimiao commented 2 years ago

Code for consistency check on cg00000957 measure

sesame = read_delim("/mnt/vast/hpc/csg/xqtl_workflow_testing/methyl/sesame_2/ROSMAP_arrayMethylation_covariates.sesame.methyl.beta.bed.gz")
orig = read_delim("/mnt/mfs/ctcn/datasets/rosmap/methylation/dlpfcTissue/batch0_1/ill450kMeth_all_740_imputed.txt","\t")

co = orig[orig$TargetID == "cg00000957",2:ncol(orig)]
cs = sesame[sesame$ID == "cg00000957",5:ncol(sesame)]
common_sample = intersect(colnames(cs), colnames(co))

x = unlist(co[, common_sample])
y = unlist(cs[,common_sample])
fit = lm(x~y)

> summary(fit)

Call:
lm(formula = x ~ y)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.153084 -0.036755  0.003753  0.038915  0.115014 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.75461    0.02936  25.698   <2e-16 ***
y            0.03046    0.03652   0.834    0.404    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05347 on 733 degrees of freedom
Multiple R-squared:  0.0009483, Adjusted R-squared:  -0.0004147 
F-statistic: 0.6958 on 1 and 733 DF,  p-value: 0.4045

**The minify cpg site beta measure is greatly different from the sesame cpg site beta measure**

I choose cg00000957 since it is a cpg cite with low-pv mQTL in the original Ng2017xQTL paper.

> ng2017 = read.table("mQTLs.txt",sep = "\t", header = T)
> ng2017 = as_tibble(ng2017) %>% 
+   select(snp = SNPid, cpg = featureName, chrom = featureChromosome, pv = pValue) %>% 
+   filter(chrom == 1)
> ng2017 %>%
+   filter(cpg == "cg00000957")
# A tibble: 24 x 4
   snp        cpg        chrom       pv
   <chr>      <chr>      <int>    <dbl>
 1 rs472651   cg00000957     1 5.40e-10
 2 rs905469   cg00000957     1 4.46e- 9
 3 rs905468   cg00000957     1 4.52e-10
 4 rs868162   cg00000957     1 0       
 5 rs868163   cg00000957     1 0       
 6 rs1287638  cg00000957     1 0       
 7 rs1287637  cg00000957     1 1.73e-27
 8 rs963030   cg00000957     1 9.29e-26
 9 rs959240   cg00000957     1 1.53e-10
10 rs74051398 cg00000957     1 1.55e-19
# ... with 14 more rows
# i Use `print(n = ...)` to see more rows

Also tried other CpG sites that are extremely significant in Ng2017xQTL,

pv0_cpg = ng2017 %>%
  filter(pv == 0) %>% 
  pull(cpg) %>% 
  unique()

pv_ls = lapply(1:100, function(i){
  cpg_cur = pv0_cpg[i]
  if(sum(sesame$ID == cpg_cur)>0 & sum(orig$TargetID == cpg_cur)>0){
    co = orig[orig$TargetID == cpg_cur,2:ncol(orig)]
    cs = sesame[sesame$ID == cpg_cur,5:ncol(sesame)]
    common_sample = intersect(colnames(cs), colnames(co))
    length(common_sample)
    x = unlist(co[, common_sample])
    y = unlist(cs[,common_sample])
    fit = lm(x~y)
    return(summary(fit)$coefficient[2,4])}
  })

> unlist(pv_ls)
 [1] 0.40448496 0.69315317 0.59829265 0.34724045 0.69855036 0.92839758
 [7] 0.50203576 0.10305671 0.45365417 0.77470680 0.85518216 0.58873877
[13] 0.68891363 0.06954018 0.78708769 0.06614904 0.75490330 0.06181439
[19] 0.31558706 0.11438481 0.84752092 0.96277210 0.76460106 0.65199153
[25] 0.21135437 0.90013705 0.56915821 0.03339486 0.25461526 0.82932243
[31] 0.10573486 0.85928725 0.33020577 0.57576934 0.10030733 0.31164285
[37] 0.48533727 0.86046229 0.31500568 0.04570502 0.59895831 0.30760698
[43] 0.65928816 0.33586481 0.85167833 0.21475559 0.23909611 0.36575975
[49] 0.14205368 0.57101327 0.32817938 0.40220618 0.08928881 0.34263965
[55] 0.53213743 0.73360923 0.99265958 0.73450408 0.04820177 0.28810319
[61] 0.34407583 0.81380035 0.28648707 0.92648308 0.38581996 0.75895828
[67] 0.34588438 0.18918972 0.29515740 0.23112783 0.74010242 0.52439654
[73] 0.67292825 0.10333567 0.47408378 0.16103651 0.09179468 0.27536787
[79] 0.39442347 0.05377568 0.91078583 0.14570576 0.60953592 0.65440993
[85] 0.93828940 0.78125167 0.84156471 0.36071270 0.33963062 0.81857826
[91] 0.07721415 0.39818933
gaow commented 2 years ago

@yuqimiao thank you! I was under the impression from @hsun3163 investigation here that minfi and sesame output are very similar, see cell 219 on this notebook https://github.com/cumc/brain-xqtl-analysis/blob/main/analysis/Wang_Columbia/mqtl/sesame_vs_minfi.ipynb

I will take a closer look tomorrow after I am done with some teaching duties but in the meantime @hsun3163 could you double check? Your notebook doesn't have enough narrative so I am not sure if I interpreted your message correctly. Thanks!