Closed joshchiou closed 3 years ago
This is a bit strange. Looking at your log file, I think there is some instability in a few of the parameters in the LD score regression step because the BBJ sample is small. We saw this as well in the UKB/BBJ applications in our paper. Can you try running your code with the following flags: --reg-se2-zero --reg-int-diag? These flags put a bit of structure on the Sigma matrix and effectively assume that there is no sample overlap between the EUR and EAS samples, that the sample size is constant across SNPs within each ancestry, and that the biases due to population stratification are not correlated across your EUR and EAS samples. In your case, all of those assumptions are pretty reasonable, I think.
On Thu, Jun 10, 2021 at 6:10 PM Josh Chiou @.***> wrote:
I'm getting some strange results with my own UK biobank + Biobank Japan data (the tutorial worked fine), where there are variants with extremely low p-values. I'm not sure what's causing this - most of the variants have non-significant p-values in the marginal associations for each ancestry. Have you guys seen this before?
Not sure if these details will be relevant, but I'll include them in case they help. I used 1000 Genomes EUR/EAS to calculate LD scores using the same MAF filter (MAF>0.01). I didn't filter out the MHC or other long range LD regions. The GWAS are from a quantitative trait that (to my knowledge) was analyzed similarly between UK biobank and Biobank Japan.
Log file: mama.log https://github.com/JonJala/mama/files/6634669/mama.log
head of mama meta-analysis (EUR) after sorting by P
SNP CHR BP A1 A2 FREQ BETA SE Z P N_EFF N_ORIG
rs11920725:C:T 3 113169205 T C 0.039073 0.0026272058223671953 2.8306354616582172e-05 92.81328726194045 2.2984518277790884e-1873 13879708755.599121 322854
rs60828608:C:T 4 147788479 T C 0.039812 0.005536056709544496 1.3840689886383705e-05 399.9841593872281 3.112318925539048e-34744 57489113470.94145 322854
rs60105943:A:T 4 147789894 A T 0.960199 -0.005454598343932839 5.1060629656856645e-05 -106.82591226527055 6.881045351620396e-2481 4224473861.119365 322854
rs9459926:T:C 6 167635433 C T 0.225287 0.00629245368705621 4.007035763216888e-05 157.0351266843839 7.050654537402111e-5358 2325357144.697124 322854
rs9459927:T:C 6 167635435 C T 0.225217 0.006299160153178401 6.34515560587602e-06 992.7510914539237 2.5987438991367307e-214014 92756802578.69264 322854
rs78135964:G:T 11 116971677 T G 0.056117 -0.009479357125024004 9.675941301241555e-05 -97.96831987610004 5.981669619944018e-2087 895383029.8216858 322854
rs148233183:G:A 11 116972227 A G 0.056125 -0.009472372508468708 7.982635520376873e-05 -118.66221981811718 1.7281217983229011e-3060 1315348884.6803434 322854
rs76942203:G:A 11 116973247 A G 0.056116 -0.00952743372223136 4.4064783591274316e-05 -216.21424061907743 1.7298392861772989e-10154 4316612838.127621 322854
rs143844152:T:G 18 56099197 G T 0.034055 0.021455846693521154 0.00017363231754417502 123.57058292481999 1.0943789339185637e-3318 415978339.80597264 322854
UK biobank (EUR)
SNPID CHR POS REF ALT AF BETA SE PVALUE N
rs11920725:C:T 3 113169205 C T 0.039073 0.0106869 0.00572375 0.0618851 322854
rs60828608:C:T 4 147788479 C T 0.039812 -0.00308798 0.00569077 0.587386 322854
rs60105943:A:T 4 147789894 A T 0.039801 -0.00307031 0.00569111 0.589548 322854
rs9459926:T:C 6 167635433 T C 0.225287 -0.00224245 0.0026581 0.398877 322854
rs9459927:T:C 6 167635435 T C 0.225217 -0.00225549 0.00265863 0.396234 322854
rs78135964:G:T 11 116971677 G T 0.056117 -0.0144076 0.00482048 0.00280064 322854
rs148233183:G:A 11 116972227 G A 0.056125 -0.0144611 0.00482005 0.00269829 322854
rs76942203:G:A 11 116973247 G A 0.056116 -0.014382 0.00482 0.00284697 322854
rs143844152:T:G 18 56099197 T G 0.034055 0.0697977 0.00614004 6.14159e-30 322854
Biobank Japan (EAS)
SNPID CHR POS REF ALT AF BETA SE PVALUE N
rs11920725:C:T 3 113169205 C T 0.097399 0.0053049 0.00584244 3.6E-01 133471
rs60828608:C:T 4 147788479 C T 0.097696 0.00847886 0.00584239 1.5E-01 133471
rs60105943:A:T 4 147789894 A T 0.097658 0.00847176 0.00584273 1.5E-01 133471
rs9459926:T:C 6 167635433 T C 0.097921 0.0107882 0.00585579 6.5E-02 133471
rs9459927:T:C 6 167635435 T C 0.097906 0.0107961 0.00585551 6.5E-02 133471
rs78135964:G:T 11 116971677 G T 0.097623 -0.0175361 0.00584605 2.7E-03 133471
rs148233183:G:A 11 116972227 G A 0.097625 -0.0175292 0.00584555 2.7E-03 133471
rs76942203:G:A 11 116973247 G A 0.097809 -0.0174265 0.0058448 2.9E-03 133471
rs143844152:T:G 18 56099197 T G 0.098376 0.0604318 0.00584412 4.6E-25 133471
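One rough way to see how far off these results are is to compare a MAMA Z-score against a plain fixed-effect inverse-variance-weighted (IVW) meta-analysis of the two marginal estimates. This is only a sanity check, not what MAMA computes: IVW assumes a shared effect across ancestries and no sample overlap, and the function name `ivw_meta` is made up for this sketch. The numbers are the rs11920725 values from the tables in this post.

```python
import math

def ivw_meta(estimates):
    """Fixed-effect inverse-variance-weighted meta-analysis of (beta, se) pairs.

    Not MAMA's estimator; used here only as a rough plausibility bound.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se

# rs11920725: (BETA, SE) from the UK Biobank (EUR) and Biobank Japan (EAS) tables.
marginals = [(0.0106869, 0.00572375), (0.0053049, 0.00584244)]
ivw_beta, ivw_se, ivw_z = ivw_meta(marginals)

# Z and SE reported for the same SNP in the MAMA (EUR) output.
mama_z = 92.81328726194045
mama_se = 2.8306354616582172e-05

print(f"IVW:  beta={ivw_beta:.5f} se={ivw_se:.5f} z={ivw_z:.2f}")
print(f"MAMA: z={mama_z:.2f}, SE is {ivw_se / mama_se:.0f}x smaller than the IVW SE")
```

An output SE far below the pooled IVW SE, like an N_EFF far above N_ORIG, suggests the problem lies in the estimated weights, not in the input summary statistics.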
I tried running it with --reg-se2-zero --reg-int-diag, but now most of the variants (~4.8M in common) are filtered out due to non-positive-(semi)-definiteness of omega / sigma. There are only 53 variants left in the meta-analysis.
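For context, here is a minimal sketch of what a positive semi-definiteness check means for the two-ancestry case, where each per-SNP matrix is a symmetric 2x2. It is not MAMA's actual implementation, the function name `is_psd_2x2` is made up for illustration, and the example values are hypothetical. A symmetric 2x2 matrix is PSD exactly when both diagonal entries and the determinant are non-negative.

```python
def is_psd_2x2(a, b, d, tol=0.0):
    """Return True if the symmetric matrix [[a, b], [b, d]] is PSD.

    For a symmetric 2x2 matrix this is equivalent to all principal
    minors being non-negative: a >= 0, d >= 0 and a*d - b*b >= 0.
    """
    return a >= -tol and d >= -tol and a * d - b * b >= -tol

# Hypothetical per-SNP matrices (not values from the log file).
examples = {
    "valid covariance": (1.0, 0.5, 1.0),
    "implied correlation above 1": (1.0, 1.5, 1.0),
    "negative variance": (-0.1, 0.0, 1.0),
}

for label, (a, b, d) in examples.items():
    print(f"{label}: PSD={is_psd_2x2(a, b, d)}")
```

If the regression coefficients imply a negative variance or a cross-ancestry correlation beyond 1 for nearly every SNP, a filter of this kind would drop nearly every SNP, which would be consistent with only 53 variants surviving.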
Log file: mama2.log
Do you mind sharing the LD score file that you used, summary statistics files, and the commands that you used? It could help me figure out whether the issue is due to improper formatting or something else on my end. Thanks!
Hi Josh,
I think it's very unlikely there's a formatting error as your data is being read in correctly and the script runs without error. Could you confirm your venv is set up prior to running? Also, here is the specification I've been using:
python "$1" --sumstats $2 \
--snp-list $3 \
--ld-scores "$4" \
--reg-int-zero \
--input-sep "\t" \
--out-harmonized \
--reg-ld-set-corr 1.0 \
--use-standardized-units \
--replace-se-col-match "SE" \
--add-a1-col-match "EA" \
--add-a2-col-match "OA" \
--out $5 | tee $6
Under the assumption that the issue is instability during LDSC regression, it might be worth trying to match your specification to mine by setting the intercept to zero (--reg-int-zero, instead of --reg-int-diag) and allowing the standard error coefficient to be freely estimated (no --reg-se2-zero). No guarantees that this will work but might be worth trying as I haven't run into this issue with the above flags.
It looks to me like your GWAS have very little power. Can you calculate the mean chi2 statistic for your summary statistics for each ancestry?
Grant's recommendation is also good if you want to try that.
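In case it helps, the mean chi2 can be computed directly from the BETA and SE columns as the average of (BETA/SE)^2 over all SNPs. The sketch below uses three UK Biobank rows from earlier in this thread purely to show the arithmetic; a real check should run over every SNP in each summary statistics file, and the function name `mean_chi2` is made up for this example.

```python
def mean_chi2(betas, ses):
    """Mean of the per-SNP chi2 statistics, where chi2 = (beta / se) ** 2."""
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and the same length")
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return sum(chi2) / len(chi2)

# BETA and SE for rs11920725, rs60828608 and rs78135964 (UK Biobank rows above).
# Three SNPs picked from a P-sorted head are not representative of the genome.
eur_betas = [0.0106869, -0.00308798, -0.0144076]
eur_ses = [0.00572375, 0.00569077, 0.00482048]

eur_mean_chi2 = mean_chi2(eur_betas, eur_ses)
print(f"mean chi2 over {len(eur_betas)} SNPs: {eur_mean_chi2:.3f}")
```

A genome-wide mean close to 1 would point to low power, while values well above 1 indicate polygenic signal, confounding, or both.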
Thanks for your help guys, @ggoldman1's suggestion seemed to do the trick. I'll go ahead and mark this as closed. @paturley the mean chi2 statistics (from the mama log file) are EAS=1.9340263512199412 and EUR=2.118730717528137. It's a pretty well-powered GWAS of a quantitative trait (along the lines of BMI).