JonJala / mama

Extreme p-values after meta-analysis #24

Closed joshchiou closed 3 years ago

joshchiou commented 3 years ago

I'm getting some strange results with my own UK biobank + Biobank Japan data (the tutorial worked fine), where there are variants with extremely low p-values. I'm not sure what's causing this - most of the variants have non-significant p-values in the marginal associations for each ancestry. Have you guys seen this before?

Not sure if these details will be relevant, but I'll include them in case they help. I used 1000 Genomes EUR/EAS to calculate LD scores using the same MAF filter (MAF>0.01). I didn't filter out the MHC or other long range LD regions. The GWAS are from a quantitative trait that (to my knowledge) was analyzed similarly between UK biobank and Biobank Japan.

Log file: mama.log (https://github.com/JonJala/mama/files/6634669/mama.log)

head of mama meta-analysis (EUR) after sorting by P

 SNP    CHR BP  A1  A2  FREQ    BETA    SE  Z   P   N_EFF   N_ORIG
rs11920725:C:T  3   113169205   T   C   0.039073    0.0026272058223671953   2.8306354616582172e-05  92.81328726194045   2.2984518277790884e-1873    13879708755.599121  322854
rs60828608:C:T  4   147788479   T   C   0.039812    0.005536056709544496    1.3840689886383705e-05  399.9841593872281   3.112318925539048e-34744    57489113470.94145   322854
rs60105943:A:T  4   147789894   A   T   0.960199    -0.005454598343932839   5.1060629656856645e-05  -106.82591226527055 6.881045351620396e-2481 4224473861.119365   322854
rs9459926:T:C   6   167635433   C   T   0.225287    0.00629245368705621 4.007035763216888e-05   157.0351266843839   7.050654537402111e-5358 2325357144.697124   322854
rs9459927:T:C   6   167635435   C   T   0.225217    0.006299160153178401    6.34515560587602e-06    992.7510914539237   2.5987438991367307e-214014  92756802578.69264   322854
rs78135964:G:T  11  116971677   T   G   0.056117    -0.009479357125024004   9.675941301241555e-05   -97.96831987610004  5.981669619944018e-2087 895383029.8216858   322854
rs148233183:G:A 11  116972227   A   G   0.056125    -0.009472372508468708   7.982635520376873e-05   -118.66221981811718 1.7281217983229011e-3060    1315348884.6803434  322854
rs76942203:G:A  11  116973247   A   G   0.056116    -0.00952743372223136    4.4064783591274316e-05  -216.21424061907743 1.7298392861772989e-10154   4316612838.127621   322854
rs143844152:T:G 18  56099197    G   T   0.034055    0.021455846693521154    0.00017363231754417502  123.57058292481999  1.0943789339185637e-3318    415978339.80597264  322854

UK biobank (EUR)

SNPID   CHR POS REF ALT AF  BETA    SE  PVALUE  N
rs11920725:C:T  3   113169205   C   T   0.039073    0.0106869   0.00572375  0.0618851   322854
rs60828608:C:T  4   147788479   C   T   0.039812    -0.00308798 0.00569077  0.587386    322854
rs60105943:A:T  4   147789894   A   T   0.039801    -0.00307031 0.00569111  0.589548    322854
rs9459926:T:C   6   167635433   T   C   0.225287    -0.00224245 0.0026581   0.398877    322854
rs9459927:T:C   6   167635435   T   C   0.225217    -0.00225549 0.00265863  0.396234    322854
rs78135964:G:T  11  116971677   G   T   0.056117    -0.0144076  0.00482048  0.00280064  322854
rs148233183:G:A 11  116972227   G   A   0.056125    -0.0144611  0.00482005  0.00269829  322854
rs76942203:G:A  11  116973247   G   A   0.056116    -0.014382   0.00482 0.00284697  322854
rs143844152:T:G 18  56099197    T   G   0.034055    0.0697977   0.00614004  6.14159e-30 322854

Biobank Japan (EAS)

SNPID   CHR POS REF ALT AF  BETA    SE  PVALUE  N
rs11920725:C:T  3   113169205   C   T   0.097399    0.0053049   0.00584244  3.6E-01 133471
rs60828608:C:T  4   147788479   C   T   0.097696    0.00847886  0.00584239  1.5E-01 133471
rs60105943:A:T  4   147789894   A   T   0.097658    0.00847176  0.00584273  1.5E-01 133471
rs9459926:T:C   6   167635433   T   C   0.097921    0.0107882   0.00585579  6.5E-02 133471
rs9459927:T:C   6   167635435   T   C   0.097906    0.0107961   0.00585551  6.5E-02 133471
rs78135964:G:T  11  116971677   G   T   0.097623    -0.0175361  0.00584605  2.7E-03 133471
rs148233183:G:A 11  116972227   G   A   0.097625    -0.0175292  0.00584555  2.7E-03 133471
rs76942203:G:A  11  116973247   G   A   0.097809    -0.0174265  0.0058448   2.9E-03 133471
rs143844152:T:G 18  56099197    T   G   0.098376    0.0604318   0.00584412  4.6E-25 133471
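
As a rough reference point for the tables above, a plain fixed-effect inverse-variance-weighted (IVW) combination of the two marginal estimates can be computed by hand. This is only a sketch: it is not MAMA's estimator, it assumes both cohorts report BETA for the same allele, and `ivw_meta` is a hypothetical helper name. For rs9459927:T:C it gives a standard error slightly below the UK Biobank one, whereas the MAMA output reports an SE roughly 400 times smaller and an N_EFF in the tens of billions against an N_ORIG of 322854.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one SNP.

    Hypothetical helper for a sanity check; this is NOT how MAMA combines data.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# rs9459927:T:C marginal estimates copied from the tables above (EUR, then EAS).
# Assumption: both BETAs refer to the same effect allele.
beta, se, z = ivw_meta([-0.00225549, 0.0107961], [0.00265863, 0.00585551])
print(f"beta={beta:.3g} se={se:.3g} z={z:.3g}")
```

If the combined Z from a check like this is unremarkable while the MAMA Z is in the hundreds, the problem is more likely in the estimated weights than in the input summary statistics.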

paturley commented 3 years ago

This is a bit strange. Looking at your log file, I think that there is some instability in a few of the parameters in the LD score regression step because the BBJ sample is small. We saw this as well in the UKB/BBJ applications in our paper. Can you try running your code with the following flags: --reg-se2-zero --reg-int-diag. These flags put a bit of structure on the Sigma matrix and effectively assume that there is no sample overlap between the EUR and EAS samples, that the sample size is constant across SNPs within each ancestry, and that the biases due to population stratification are not correlated across your EUR and EAS samples. In your case, all those assumptions are pretty reasonable, I think.

joshchiou commented 3 years ago

I tried running it with --reg-se2-zero --reg-int-diag, but now most of the variants (~4.8M in common) are filtered out due to non-positive-(semi)-definiteness of omega / sigma. There are only 53 variants left in the meta-analysis.
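
For context on what a positive (semi-)definiteness filter means for a two-ancestry run, here is a minimal sketch of such a check on a symmetric 2x2 matrix. It is an illustration only: MAMA's own check is not shown in this thread and may be implemented differently, `is_psd_2x2` is a hypothetical name, and the example matrices are made up rather than taken from any log.

```python
def is_psd_2x2(a, b, d, tol=1e-12):
    """Check the symmetric matrix [[a, b], [b, d]] for positive semi-definiteness.

    A symmetric 2x2 matrix is PSD exactly when both diagonal entries and the
    determinant are non-negative (up to a small numerical tolerance).
    """
    return a >= -tol and d >= -tol and (a * d - b * b) >= -tol


# Made-up (EUR, EAS) examples, not values from a MAMA run.
examples = {
    "valid": (1.0, 0.5, 1.0),
    "implied correlation above 1": (1.0, 1.5, 1.0),
    "negative variance": (-0.1, 0.0, 1.0),
}
for name, (a, b, d) in examples.items():
    print(name, is_psd_2x2(a, b, d))
```

A negative estimated variance or an off-diagonal term too large for the diagonals is enough to fail a check like this, which is one way an unstable regression estimate could end up dropping almost every SNP.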

Log file: mama2.log

Do you mind sharing the LD score file that you used, summary statistics files, and the commands that you used? It could help me figure out whether the issue is due to improper formatting or something else on my end. Thanks!

ggoldman1 commented 3 years ago

Hi Josh,

I think it's very unlikely there's a formatting error as your data is being read in correctly and the script runs without error. Could you confirm your venv is set up prior to running? Also, here is the specification I've been using:

  python "$1" --sumstats $2 \
              --snp-list $3 \
              --ld-scores "$4" \
              --reg-int-zero \
              --input-sep "\t" \
              --out-harmonized \
              --reg-ld-set-corr 1.0 \
              --use-standardized-units \
              --replace-se-col-match "SE" \
              --add-a1-col-match "EA" \
              --add-a2-col-match "OA" \
              --out $5 | tee $6

Under the assumption that the issue is instability during LDSC regression, it might be worth trying to match your specification to mine by setting the intercept to zero (--reg-int-zero, instead of --reg-int-diag) and allowing the standard error coefficient to be freely estimated (no --reg-se2-zero). No guarantees that this will work, but I haven't run into this issue with the above flags.

paturley commented 3 years ago

It looks to me like your GWAS have very little power. Can you calculate the mean chi2 statistic for your summary statistics for each ancestry?
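
One way to compute the mean chi2 directly from a summary statistics file is to average (BETA / SE)^2 over all SNPs. The sketch below assumes tab-separated input with `BETA` and `SE` columns, as in the tables earlier in this thread; `mean_chi2` is a hypothetical helper, and the two sample rows only demonstrate the call, they are not a genome-wide mean.

```python
import csv
import io


def mean_chi2(handle, beta_col="BETA", se_col="SE", sep="\t"):
    """Return the mean of (BETA / SE)^2 over rows with a positive SE."""
    reader = csv.DictReader(handle, delimiter=sep)
    total, count = 0.0, 0
    for row in reader:
        se = float(row[se_col])
        if se <= 0:
            continue  # skip rows where the chi2 statistic is undefined
        total += (float(row[beta_col]) / se) ** 2
        count += 1
    return total / count if count else float("nan")


# Two UK Biobank rows from this thread, used only to show the call.
sample = io.StringIO(
    "SNPID\tBETA\tSE\n"
    "rs11920725:C:T\t0.0106869\t0.00572375\n"
    "rs143844152:T:G\t0.0697977\t0.00614004\n"
)
print(mean_chi2(sample))
```

For a real file, pass an open file handle instead of the `io.StringIO` sample.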

Grant's recommendation is also good if you want to try that.

joshchiou commented 3 years ago

Thanks for your help guys, @ggoldman1's suggestion seemed to do the trick. I'll go ahead and mark this as closed. @paturley the mean chi2 statistics (from the mama log file) are EAS=1.9340263512199412 and EUR=2.118730717528137. It's a pretty well-powered GWAS of a quantitative trait (along the lines of BMI).