Different results each run

igarcia17 commented 7 months ago

Dear Zikun CARMA is becoming a fundamental tool in our genomic analyses at our lab, and I really want to thank you for both the software and the support procided via this wiki. We are planning on conducting functional studies for one of our candidate genes, and we rely on the CARMA analyses to help as focus on the most probable causal variants. In this regard, we are having some issues regarding the results: each time I run CARMA with the same parameters, I get a slightly different result. I expect the PIP values to slightly vary between runs, but in the current case there are also variations regarding the number of credible sets. These are the results for the first and second run. The Z score is the same in both cases. I find strange that credible set 3 from the first run and credible set 2 on the second run have both a PIP sum above 1, I don't know if this is ordinary or not.

SNP	P value	Z	PIP first run	CS first run	Z second run	PIP second run	CS second run
rs1329400	0.008571	-2.62707878	0.87207114	3	-2.62707878	0	0
rs62574788	0.01408	-2.4524887	0.30812058	3	-2.4524887	0	0
rs7851527	0.009533	2.59565551	0.05484275	2	2.59565551	0	0
rs62574791	0.009221	2.61227596	0.06536755	2	2.61227596	0	0
rs62574794	0.009691	2.58476904	0.87589703	2	2.58476904	0	0
rs558612	0.0002454	-3.6575607	0.99998262	1	-3.6575607	0	0
rs693370	0.01482	-2.44219577	0	0	-2.44219577	0.755355066	2
rs2671622	0.01329	-2.48056483	0	0	-2.48056483	0.626216535	2
rs655497	0.0002258	3.69588428	0	0	3.69588428	0.999999964	1

I also can tell you that rs558612 (CS 1 in first run) and rs655497 (CS 1 in second run) are in very high LD, and we think that rs655497 is indeed the causal SNP. I am using data from ADHD PGC GWAS from 2017 for the summary statistics, and the LD matrix is computed with plink 1.9 using -r for raw inter-variant allele count correlations based on the reference panel of Europeans in 1000genomes (didn't discard the finish). CARMA is run with outlier.switch= T.

Do you have any suggestion as to what can be going on? Thank you in advance. Kind regards Inés

ZikunY commented 6 months ago

Hi Ines,

Thank you for using CARMA. The problem here is associated with the data discrepancies between Z/LD. For checking if there is discrepancy, I would recommend to check the exact LD value between rs558612 and rs655497, see if the LD value is positive or negative? if it is positive, then there is the "allele-flip" problem.

I guess one of the quick fixes here is checking how to generate LD by plink, such as the command line

./plink --bfile bed.file --a1-allele SNP.txt 2 1 --r --matrix --out LD

In here "SNP.txt 2 1", using the summary statistics file of your study, and specifiy the SNP id and A1 columns of the summary statistics files, to make sure that the ref alleles are consistent between the summ and LD.

I think overall, there would be a little variation across different runs of CARMA, but the number of credible sets should remain the same and the leading variants that formulated the sets should also remain largely the same.

Please let me know if this could help. Thank you for reaching out.

Best, Zikun

igarcia17 commented 5 months ago

Hello Zikun

Thank you for your answer! Indeed it was an allele-flip case: we were using the same LD matrix with different summary statistics and the A1 and A2 doesn't match in all of the cases. Our ADHD dataset has some reference alleles that match with the reference panel but others that don't. This makes me question how is plink handling the data, as it retrieves the LD correlation of all the SNPs in the reference panel, regardless if they are present in the SNP.txt file or not. However, we are able to replicate the results in other datasets. SNPs with a high PIP value replicate with similar PIPs, and there is the same number of CS, which is great news! Our next issue regards a single SNP, that is getting a PIP of 0. This concerns us because this SNP in particular was reported as the causal SNP in the locus by using Finemap and Susie in previous studies. We are looking deeper into this. Thank you again for your answer! Inés

ZikunY commented 5 months ago

Hi Inés,

Glad that I could help here. For the part of about making sure ref/alt alleles are in the same order, please check this part of the manual on the github page of CARMA,

"Generating LD matrix using PLINK PLINK is a popular and useful toolset used for generating LD matrix from either VCF or BED file. It is important to notice that the command ``--r'' should be specified to generate r LD, instead of r^2 LD as the r^2 LD does not account for the directions (sign) of the summary statistics. Additionally, maintaining consistency in the coding order of testing alleles between summary statistics and the LD from the reference panels is also crucial for fine-mapping analysis. The following command is an example of generating LD matrix with PLINK while ensuring the coding order of testing alleles remain consistent.

./plink --bfile bed.file --a1-allele SNP.txt 2 1 '#' --r --matrix --out LD The ``SNP.txt'' file includes the information of SNP names and ref alleles from the GWAS summary statistics. Please check plink manuals for LD calculations https://www.cog-genomics.org/plink/1.9/ld#r"

I think in general, the fine-mapping analysis should be conducted on the SNPs are both presented in gwas results and reference panels to avoid poetnetial mismatches.

If a SNP is getting 0 PIP, exactly 0, it is most likely been removed as an outlier, therefore, it indicates there are outliers associated with this SNP. One possible way of moving forward is to check the outlier part provided by CARMA's result, see if there is anything weired.

Please let me know if there is anything. Cheers.

Best, Zikun

ZikunY / CARMA

Different results each run #21