FinucaneLab / fine-mapping-inf

Fine-mapping with infinitesimal effects

Possibility to apply SUSIE-inf on individual data #8

Open WeiCSong opened 10 months ago

WeiCSong commented 10 months ago

Hi, congratulations on the great work! I would like to know whether it is possible to apply SUSIE-inf to individual-level data, since the original SUSIE has both individual-level and summary-statistics versions available. Will this be available in the near future? Thanks for your help!

cuiran commented 9 months ago

Thanks for the question! Adding this functionality is not near the top of our current TODO list, since GWAS summary statistics plus LD are the more common inputs to fine-mapping. If the situation changes, we will reconsider.

To convert individual level data to summary statistics, we recommend using BOLT-LMM, SAIGE, REGENIE or PLINK2 with appropriate covariates. For LD computation, we recommend LDstore2. To get the most rigorous results, please project out the same set of covariates from the LD matrix.

snashraf commented 9 months ago

Hi Cuiran,

I didn't understand this part: "To get the most rigorous results, please project out the same set of covariates from the LD matrix."

Can you please explain how to accomplish this?

Another question: does the summary statistics file need to be in a specific format, or will the BOLT-LMM summary file be OK?

Regards, Najeeb

cuiran commented 8 months ago

RE covariate adjustment: when performing linear regression with covariates (e.g. obtaining the GWAS result for a given SNP j), one can either add the covariates to the model and fit it jointly or, equivalently, project the covariates out of both x and y and then perform a simple linear regression of y_adj on x_adj (x_adj denotes the genotype matrix x after adjusting for covariates, and similarly for y_adj). When fine-mapping, we assume the GWAS results come from x_adj and y_adj, so if we want to incorporate LD, we need to compute it from x_adj. This would be the most rigorous way to do it.

In a fairly homogeneous cohort, such as the UKBB White-British cohort, using either x or x_adj to compute LD gave very similar results. With multiple cohorts or ancestries it makes a bigger difference, and in that case we recommend using x_adj.
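To make the projection concrete, here is a minimal NumPy sketch on simulated data. It is not part of the SuSiE-inf code; the helper names `project_out` and `ld_from_genotypes`, the simulated covariates, and the sample sizes are all assumptions made for illustration. It residualizes x and y on the covariates (plus an intercept), checks that the simple regression on the adjusted data reproduces the effect estimate from the joint model, and computes LD from x_adj as well as from the raw x for comparison.

```python
import numpy as np


def project_out(covariates, m):
    """Residualize m (vector or matrix) on the covariates plus an intercept.

    Hypothetical helper for illustration only.
    """
    n = covariates.shape[0]
    c = np.column_stack([np.ones(n), covariates])
    # Least-squares fit of m on c, then subtract the fitted values.
    coef, *_ = np.linalg.lstsq(c, m, rcond=None)
    return m - c @ coef


def ld_from_genotypes(x):
    """Pearson correlation matrix between the SNP columns of x."""
    return np.corrcoef(x, rowvar=False)


# Simulated data (assumption): 500 samples, 5 SNPs, 2 covariates such as PCs.
rng = np.random.default_rng(0)
n, p = 500, 5
cov = rng.normal(size=(n, 2))
# Genotype-like columns that are correlated with the covariates.
x = rng.normal(size=(n, p)) + cov @ rng.normal(size=(2, p))
y = 0.5 * x[:, 0] + cov @ np.array([1.0, -1.0]) + rng.normal(size=n)

x_adj = project_out(cov, x)
y_adj = project_out(cov, y)

# Effect of SNP j from the joint model y ~ 1 + x_j + covariates.
j = 0
design = np.column_stack([np.ones(n), x[:, j], cov])
beta_joint = np.linalg.lstsq(design, y, rcond=None)[0][1]

# Effect of SNP j from the simple regression of y_adj on x_adj[:, j].
beta_adj = (x_adj[:, j] @ y_adj) / (x_adj[:, j] @ x_adj[:, j])

# LD computed from adjusted and from raw genotypes.
ld_adj = ld_from_genotypes(x_adj)
ld_raw = ld_from_genotypes(x)

print("joint vs adjusted beta agree:", np.isclose(beta_joint, beta_adj))
print("max |ld_adj - ld_raw|:", np.abs(ld_adj - ld_raw).max())
```

In this toy setup the genotypes are deliberately correlated with the covariates, so `ld_adj` and `ld_raw` differ noticeably. For real data, `ld_adj` (the correlation matrix of x_adj) would be the matrix to pair with covariate-adjusted summary statistics.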

RE input format: see the wiki tutorial for summary statistics format and LD format.