PejLab / aFCn

Apache License 2.0

Correcting for covariates #1

Closed dtaylo95 closed 1 year ago

dtaylo95 commented 1 year ago

Hello! I am interested in calculating aFC for genes with multiple causal variants and was pointed to this tool by Stephane Castel.

This seems like a really great extension of the aFC tool! I was curious if you had recommendations for how to handle covariates (e.g. PEER factors, genotyping PCs, etc.) when using this tool. I believe in the original aFC tool, covariates that were found to be associated with expression were regressed out of the expression values and then those corrected expression counts were used when calculating aFC.

Would something similar work here (passing the covariate-corrected expression values to aFCn)?

navaehsan commented 1 year ago

Hi Dylan,

Thanks for reaching out, and thanks for your comment. To use the aFCn tool on your own data, you can input normalized, log-transformed, covariate-corrected expression read counts. The tool's log-transform and normalization flags can take care of those two steps. Please note, however, that the tool does not perform the covariate correction itself, so that has to be done beforehand.

For the effect sizes we calculated on GTEx eQTL data with the aFCn tool, expression values were corrected for significant linear effects of the following covariates: confounding factors identified using PEER, the top 5 genotype-based principal components, sequencing platform (Illumina HiSeq 2000 or HiSeq X), sequencing protocol (PCR-based or PCR-free), and sex. The correction was done in two steps. First, we regressed the expression vector of each gene against the covariates and selected those with nominally significant coefficients (p-value < 0.01). Then we regressed the expression vector on the selected covariates and used the residuals as the corrected expression vector for the aFCn calculation. These data are now available here: https://github.com/PejLab/aFCs/tree/main/aFC-n_Oct2022
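The two-step correction described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the GTEx analysis: the function names (`ols_pvalues`, `correct_expression`), the inclusion of an intercept in both regressions, and the use of a two-sided t-test on each coefficient are all assumptions made for the example.

```python
import numpy as np
from scipy import stats


def ols_pvalues(y, X):
    """Fit y ~ X by ordinary least squares (X must already contain an
    intercept column). Returns coefficients and two-sided t-test p-values."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - k
    sigma2 = resid @ resid / dof                  # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    p = 2 * stats.t.sf(np.abs(beta / se), dof)    # two-sided p-values
    return beta, p


def correct_expression(y, covariates, alpha=0.01):
    """Hypothetical helper: two-step covariate correction for one gene.

    y          : (n_samples,) log-transformed expression vector
    covariates : (n_samples, n_covariates) numeric covariate matrix
    alpha      : nominal significance threshold for keeping a covariate
    """
    intercept = np.ones((len(y), 1))
    # Step 1: regress on all covariates, keep the nominally significant ones.
    _, p = ols_pvalues(y, np.hstack([intercept, covariates]))
    keep = p[1:] < alpha                          # p[0] belongs to the intercept
    # Step 2: regress on the selected covariates only and take the residuals.
    X_sel = np.hstack([intercept, covariates[:, keep]])
    beta, *_ = np.linalg.lstsq(X_sel, y, rcond=None)
    # Assumption: the intercept is removed too, so residuals have zero mean.
    return y - X_sel @ beta, keep


# Simulated usage: only the first of three covariates affects expression.
rng = np.random.default_rng(0)
covs = rng.normal(size=(200, 3))
expr = 3.0 * covs[:, 0] + rng.normal(size=200)
corrected, kept = correct_expression(expr, covs)
```

In practice this would be applied per gene, with categorical covariates such as platform, protocol, and sex encoded numerically first. The `corrected` vector is what would then be passed to aFCn as the expression input.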

You can find the detailed information in the Methods section (Haplotypic aFC estimation) here: https://www.biorxiv.org/content/10.1101/2022.01.28.478116v1.full

Please feel free to let us know if you need any additional information, or if there is any specific data you need in order to apply this to your genetic analysis.

Best, Nava

dtaylo95 commented 1 year ago

Thank you so much!! This helps a ton!