FINNGEN / autoreporting

MIT License

incorrect credible variants #115

Closed: Fedja closed this issue 4 years ago

Fedja commented 4 years ago

I ran autoreporting and am looking at this report: gs://fg-cromwell/autoreporting/ede8c84e-b71c-45af-8d03-5fa2b9078e08/call-report/shard-8/glob-d7a6a932aaa9bb3e161d9330b7c3c2fd/H7_AMD.top.out

For region chr1:195202087-198202087 there are two credible sets (CSs), and the report says one of the loci is led by chr1_196677257_T_C, giving it a PIP of 0.00902.

However, the credible set contains much better variants, i.e. ones with far higher PIPs. The highest-PIP variant of each credible set in the SuSiE output:

gsutil cat gs://r4_data_west1/demopheno/finemap_6_5_20_no_purity/H7_AMD.SUSIE.snp.bgz $f | zcat | tail -n +2 |
  sort -k 1,1 -k 2,2 -grk 15,15 |
  awk 'BEGIN{ added[1]=1} (! ($1$2$NF in added))&&$NF!="-1"{ print $0; added[$1$2$NF]=1}' |
  awk 'BEGIN {OFS="\t"}{ print $1$2$16,$0}' |
  sort -b -k 1,1
H7_AMDchr10:120952080-1239520801        H7_AMD  chr10:120952080-123952080       10:122452080:C:T        chr10_122452080_C_T     chr10   122452080       C       T       0.2435  0.835   0.0361536       5.094e-118      0.0685497839134475   0.22921588875075        0.0822498471803728      1
H7_AMDchr1:195202087-1982020871 H7_AMD  chr1:195202087-198202087        1:196702087:G:A chr1_196702087_G_A      chr1    196702087       G       A       0.4375  -0.613  0.0298217       6.876e-94       -0.0491397219587706 0.17659912000134 0.0719991471757099      1
H7_AMDchr1:195202087-1982020872 H7_AMD  chr1:195202087-198202087        1:196651787:C:T chr1_196651787_C_T      chr1    196651787       C       T       0.1957  -0.0222 0.0369255       0.5477  0.212443059950299       0.106899360910301    0.81384830188343        2
H7_AMDchr4:108240713-1112407131 H7_AMD  chr4:108240713-111240713        4:109740713:T:A chr4_109740713_T_A      chr4    109740713       T       A       0.01136 0.886   0.154741        1.03e-08        0.734152610880591   0.360553832117216        0.830301070453376       1
H7_AMDchr7:103453266-1064532661 H7_AMD  chr7:103453266-106453266        7:104953266:A:T chr7_104953266_A_T      chr7    104953266       A       T       0.3595  0.1702  0.0304449       2.265e-08       0.0570596578038637  0.0821387179720904       0.335932944257313       1
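For readers less used to awk, the pipeline above keeps, for every (trait, region, credible set) combination, the variant with the highest PIP, skipping rows whose CS column is -1 (presumably variants outside any credible set). A rough Python equivalent is sketched below. It assumes, as the awk code does, whitespace-separated columns with the PIP in column 15 (1-based) and the CS index in the last column; `top_pip_per_cs` and `report_top_pips` are hypothetical helper names, not part of autoreporting.

```python
import gzip
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str, str]  # (trait, region, cs)


def top_pip_per_cs(lines: Iterable[str]) -> Dict[Key, List[str]]:
    """Keep the highest-PIP row for each (trait, region, cs) key.

    Assumes whitespace-separated columns (awk's default field splitting),
    the PIP in column 15 (index 14) and the credible-set index in the last
    column. Rows with cs == "-1" are skipped, as in the awk filter.
    """
    best: Dict[Key, List[str]] = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[-1] == "-1":
            continue
        key = (fields[0], fields[1], fields[-1])
        # Replace the stored row only if this one has a strictly higher PIP.
        if key not in best or float(fields[14]) > float(best[key][14]):
            best[key] = fields
    return best


def report_top_pips(path: str) -> None:
    """Print the top-PIP row per credible set from a (b)gzipped SuSiE file."""
    # bgzip output is multi-member gzip, which the gzip module can read.
    with gzip.open(path, "rt") as fh:
        next(fh, None)  # skip the header line, like `tail -n +2`
        best = top_pip_per_cs(fh)
    for key in sorted(best):
        print("\t".join(best[key]))
```

After copying the file locally (for example with `gsutil cp`), `report_top_pips("H7_AMD.SUSIE.snp.bgz")` prints one row per credible set, ordered by trait, region and CS index as a tuple rather than by the concatenated `$1$2$16` key the shell version sorts on.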
Lipastomies commented 4 years ago

It seems that the true top variant of the credible set, chr1_196651787_C_T, has a p-value of 0.5477, so it is filtered out during credible-set grouping (because of the 0.01 alternative significance, or "alt sign", threshold). That is why it is missing from the reported credible set. In fact, the report shows the credible set ID as chr1_196651787_C_T_2, which is the correct lead-PIP variant of the second CS in that region.

This should be fixed once #103 is merged to master, as it removes the filtering when credible-set grouping is used. However, I think simple/LD grouping would also benefit from having all credible-set variants in the results, so that still needs to be fixed.
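A minimal sketch of the mechanism described above, assuming the grouping step simply drops variants whose p-value is not below the alternative threshold before picking the lead (highest-PIP) variant. `CsVariant` and `credset_lead` are hypothetical names rather than autoreporting's code, and the variants are toy values loosely modelled on this region (a true lead with PIP around 0.8 but p around 0.55). With the filter, a low-PIP variant becomes the "lead"; without it, the true lead is recovered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CsVariant:
    """One variant of a credible set (toy record, not autoreporting's type)."""
    name: str
    pval: float
    pip: float


def credset_lead(variants: List[CsVariant],
                 p_threshold: Optional[float] = None) -> Optional[CsVariant]:
    """Return the highest-PIP variant of a credible set.

    If p_threshold is given, variants with pval >= p_threshold are dropped
    first, mimicking a significance filter applied before grouping.
    With p_threshold=None every credible-set variant is considered.
    """
    kept = [v for v in variants
            if p_threshold is None or v.pval < p_threshold]
    if not kept:
        return None
    return max(kept, key=lambda v: v.pip)


# Toy credible set: the true lead is not marginally significant.
cs = [
    CsVariant("var_A", pval=0.55, pip=0.81),    # true lead, p > 0.01
    CsVariant("var_B", pval=0.003, pip=0.009),  # only variant passing 0.01
    CsVariant("var_C", pval=0.20, pip=0.10),
]

print(credset_lead(cs, p_threshold=0.01).name)  # filtered: var_B
print(credset_lead(cs).name)                    # unfiltered: var_A
```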

Fedja commented 4 years ago

Ah yes, indeed. This is one of those cases where a variant is not significant until it is conditioned on other variants. It can be fixed by setting a proper threshold...
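To illustrate "not significant until conditioned on other variants", here is a toy simulation (not FinnGen data, and not part of autoreporting): two correlated variants with opposite-signed effects, chosen so that x2's marginal association cancels out while its effect given x1 is strong. The parameter values (r=0.7, b1=-0.5, b2=0.35, n=5000) are arbitrary assumptions picked so that b1*r + b2 = 0, and p-values use a normal approximation via `math.erfc`.

```python
import math
import random


def simulate(n=5000, r=0.7, b1=-0.5, b2=0.35, seed=1):
    """Toy genotypes: x1 and x2 have correlation ~r and both affect y."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = r * a + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        x1.append(a)
        x2.append(b)
        y.append(b1 * a + b2 * b + rng.gauss(0.0, 1.0))
    return x1, x2, y


def center(v):
    m = sum(v) / len(v)
    return [e - m for e in v]


def two_sided_p(z):
    # Normal approximation to the t distribution, fine for large n.
    return math.erfc(abs(z) / math.sqrt(2.0))


def marginal_z(x, y):
    """z-statistic of x in y ~ x (intercept handled by centering)."""
    x, y = center(x), center(y)
    sxx = sum(a * a for a in x)
    beta = sum(a * c for a, c in zip(x, y)) / sxx
    rss = sum((c - beta * a) ** 2 for a, c in zip(x, y))
    return beta / math.sqrt(rss / (len(y) - 2) / sxx)


def conditional_z(x1, x2, y):
    """z-statistic of x2 in y ~ x1 + x2, i.e. x2 conditioned on x1."""
    x1, x2, y = center(x1), center(x2), center(y)
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # Solve the 2x2 normal equations (X'X) beta = X'y.
    beta1 = (s22 * s1y - s12 * s2y) / det
    beta2 = (s11 * s2y - s12 * s1y) / det
    rss = sum((c - beta1 * a - beta2 * b) ** 2 for a, b, c in zip(x1, x2, y))
    se2 = math.sqrt(rss / (len(y) - 3) * s11 / det)
    return beta2 / se2


x1, x2, y = simulate()
print("x2 alone:   p =", two_sided_p(marginal_z(x2, y)))
print("x2 given x1: p =", two_sided_p(conditional_z(x1, x2, y)))
```

In this setup the marginal test on x2 is a true null (its expected slope is b1*r + b2 = 0), so its p-value is typically unremarkable, while the conditional test is overwhelmingly significant. A fixed marginal p-value cutoff applied before fine-mapping would discard x2 even though it carries a real effect.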