getian107 / SuSiEx

Cross-population fine-mapping
MIT License

Covariates in fine-mapping #18

Closed Jesson-mark closed 3 weeks ago

Jesson-mark commented 3 weeks ago

Hi Kai,

thanks for your tool on cross-ancestry fine-mapping!

I'm currently trying to do cross-ancestry fine-mapping for two ancestries, and both GWASs were performed with the same set of covariates. I noticed that when running fine-mapping with SuSiE, the genotypes and phenotype need to be adjusted for covariates.

What I want to confirm is whether the same step (i.e. adjusting for covariates) is required for cross-ancestry fine-mapping with SuSiEx.

Thanks!

Jie Wang

yorkklause commented 3 weeks ago

Hi Jie,

Thank you for using our software!

You can adjust the covariates while running GWAS, and there's no need to adjust them again in SuSiEx.

My best

Kai Yuan

Jesson-mark commented 3 weeks ago

Thanks for your quick reply!

Sorry, I should have stated my question more clearly.

I noticed that in the SuSiE example the LD matrix is calculated from covariate-adjusted genotypes. Do I also need to adjust for covariates before calculating the in-sample LD matrix when using SuSiEx? If so, do you know of any way to calculate covariate-adjusted LD using plink? The method provided by SuSiE does the adjustment in R.

Best regards,

Jie Wang

yorkklause commented 3 weeks ago

Hi Jie,

In susieR, genotype and phenotype data are used as input, so covariate adjustment is necessary during the analysis. Additionally, I don’t think LD calculation is performed in susieR.

In SuSiEx, we use GWAS summary statistics and LD as input, assuming that covariate adjustments have already been made in the GWAS. Therefore, you don’t need to adjust covariates during LD calculation.

To answer your question: no, you don’t need to adjust covariates in the LD calculation.

My best

Kai Yuan

Jesson-mark commented 3 weeks ago

Hi Kai,

thanks for your quick reply again! I agree with you that the GWAS summary statistics are already adjusted for covariates in the GWAS.

However, I read an eQTL paper based on the 1KGP dataset that ran SuSiE fine-mapping on eQTL associations (similar to GWAS summary statistics), with LD estimated from covariate-adjusted genotypes.

Specifically, the procedure they used (copied from the Supplementary Information of that paper) is as follows:

We then remove the effects of the eQTL mapping covariates (sex, top 5 genotype PCs, 60 PEER factors) from the inverse normal transformed TMM values and genotypes, using the procedure described in this article: https://stephenslab.github.io/susieR/articles/finemapping.html#anote-on-covariate-adjustment. Finally, we run the susie_rss function on the Z-scores from the FastQTL nominal pass, using an in-sample LD matrix calculated from the covariate adjusted genotypes and gene expression variance estimated from the covariate-adjusted expression values.

So in the case of eQTL studies, calculating LD from covariate-adjusted genotypes may be required.

However, I noticed that some GWAS papers on complex traits used LD without adjusting for covariates. I'm not sure whether this difference between GWAS and eQTL studies is due to sample size, since GWASs have larger sample sizes (on the order of hundreds of thousands) than eQTL studies (on the order of thousands).

Looking forward to your opinion!

Best regards,

Jie Wang

yorkklause commented 3 weeks ago

Hi Jie,

Thank you for sharing this research! I took a quick look through their paper; correct me if I'm wrong. It appears that when they calculated eQTLs, they combined all the data and used the top five PCs to account for global ancestry. This approach makes PC adjustment necessary for the genotype data.

In your analysis, if your dataset is from a single ancestry, calculating LD without adjusting for PCs should be fine. If it includes multiple ancestries, I recommend performing eQTL analyses and calculating LD separately for each ancestry, followed by fine-mapping using SuSiEx.
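As a rough illustration of "calculating LD separately for each ancestry", here is a minimal sketch in Python. It assumes the genotypes are already loaded as a samples-by-variants dosage matrix with one ancestry label per sample; the function name `ld_by_group`, the labels, and the simulated data are all hypothetical and not part of SuSiEx. The toy data also shows why pooling ancestries with different allele frequencies can distort LD.

```python
import numpy as np

def ld_by_group(X, labels):
    """Return one LD (Pearson correlation) matrix per ancestry label.

    X      : (n_samples, n_variants) genotype dosage matrix
    labels : length-n_samples array of ancestry labels
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return {
        g: np.corrcoef(X[labels == g], rowvar=False)
        for g in np.unique(labels).tolist()
    }

# Toy data (an assumption for illustration only): variants are independent
# within each group, but allele frequencies differ between the two groups.
rng = np.random.default_rng(1)
n_per_group, n_variants = 300, 3
X = np.vstack([
    rng.binomial(2, 0.2, size=(n_per_group, n_variants)),  # group "A"
    rng.binomial(2, 0.8, size=(n_per_group, n_variants)),  # group "B"
])
labels = np.array(["A"] * n_per_group + ["B"] * n_per_group)

ld = ld_by_group(X, labels)                # one matrix per ancestry
ld_pooled = np.corrcoef(X, rowvar=False)   # all samples mixed together

print("within-group r(1,2):", ld["A"][0, 1], ld["B"][0, 1])
print("pooled r(1,2):      ", ld_pooled[0, 1])
```

In this simulation the within-group correlations stay close to zero, while the pooled correlation is clearly positive purely because of the frequency difference between groups.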

@getian107, do you have any thoughts on this?

My best

Kai Yuan

getian107 commented 3 weeks ago

I think in theory we should probably adjust for covariates when calculating LD. The susie/x model was derived without covariates; in the presence of covariates the model should be equivalent to regressing out covariates from both phenotypes and genotypes. That said, adjusting for covariates in LD calculation can be annoying in practice (I'm not aware of any software tools that have implemented this), and it's not always possible to get covariates even if one has in-sample LD. In most statistical genetics analyses that use LD reference panels, the common practice seems to be not to adjust. While I think the impact of adjusting for covariates may be limited, it might be helpful to do some benchmark comparisons.
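A minimal sketch of what "regressing out covariates from genotypes" before computing LD could look like, written in Python with NumPy. This is not part of SuSiEx or susieR: the helper names `residualize` and `ld_matrix` and the simulated data are assumptions for illustration. It fits an ordinary least-squares regression of each variant on the covariates (plus an intercept), keeps the residuals, and takes their correlation matrix as the adjusted LD; the last lines give a crude adjusted-versus-unadjusted comparison.

```python
import numpy as np

def residualize(M, Z):
    """Remove the linear effect of covariates Z (n x k) from each column of M (n x p)."""
    M = np.asarray(M, dtype=float)
    Z = np.asarray(Z, dtype=float)
    Z1 = np.column_stack([np.ones(Z.shape[0]), Z])   # add an intercept column
    coef, *_ = np.linalg.lstsq(Z1, M, rcond=None)    # OLS fit, all columns at once
    return M - Z1 @ coef                             # residuals

def ld_matrix(X):
    """Pearson correlation between the columns (variants) of X."""
    return np.corrcoef(X, rowvar=False)

# Toy data (an assumption for illustration only): allele frequencies of all
# variants depend on the first covariate, mimicking population structure.
rng = np.random.default_rng(0)
n, p = 500, 4
pc = rng.normal(size=(n, 2))                         # stand-in for genotype PCs
freq = 0.2 + 0.6 / (1.0 + np.exp(-pc[:, [0]]))       # per-sample frequency in (0.2, 0.8)
X = rng.binomial(2, np.broadcast_to(freq, (n, p)))   # genotype dosages 0/1/2

R_raw = ld_matrix(X)                                 # unadjusted LD
R_adj = ld_matrix(residualize(X, pc))                # covariate-adjusted LD
max_diff = np.abs(R_raw - R_adj).max()
print("largest absolute change in r after adjustment:", max_diff)
```

The same `residualize` helper could be applied to the phenotype so that both sides of the model are adjusted with the same covariates. How much the adjustment matters on real data is exactly the kind of benchmark mentioned above; the toy comparison here says nothing about that.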

Jesson-mark commented 3 weeks ago

Hi Kai,

sorry for the late reply! Thanks for your thoughts on this question!

It seems that covariates need to be adjusted for whether the study involves multiple ancestries or a single ancestry, because the 1KGP eQTL study used many covariates (including sex, PCs and PEER factors).

@getian107, thanks for your input on this question! As I understand it, the ideal approach is to calculate LD adjusted for covariates whenever possible, whether in the SuSiE or the SuSiEx model. I'll give this a try.

@yorkklause @getian107 thanks again for your helpful reply. I appreciate your support very much.

Best regards,

Jie Wang