chr1swallace / coloc

Repo for the R package coloc

Question about sdY #102

Open maegsul opened 1 year ago

maegsul commented 1 year ago

Hi,

First of all, thanks a lot @chr1swallace again for developing coloc - it has been very useful for us to answer many questions we have. A big thanks! I have a rather theoretical/statistical question regarding the "sdY" parameter, which is the population standard deviation of the trait for a quantitative trait.

The vignette here indicates that, if the study standardised their (quantitative) trait to have a variance of 1, we can set sdY to 1. However, I am curious whether this recommendation is still valid if the study also used a set of covariates in the linear regression model predicting this already standardised quantitative trait.

To give an example: let's say we have a set of standardised gene expression values for geneX across n=100 individuals, and we map cis-eQTL variants near this gene, using covariates such as sex, age, and principal components in linear regression, such as below:

glm(standardised_geneX_expression ~ genotype + sex + age + PC1 + PC2 + PC3, data = example_data_table, family = gaussian(link = "identity"))

In this case, is it still correct to set sdY = 1 in the coloc.abf function for this eQTL dataset (while also providing beta and varbeta along with sdY)? My concern is that the outcome, once the covariates are regressed out, might no longer have a standard deviation of 1, and that the betas and varbetas we obtain (and provide to coloc.abf) no longer describe the standardised geneX expression directly, but rather this outcome after controlling for sex, age, and principal components.

I was just thinking about it, and I wanted to be on the safe side regarding taking the sdY parameter into account correctly. What do you think?

Many thanks, Fahri

chr1swallace commented 1 year ago

Interesting question!

sdY is used to scale the prior on the effect size at causal variants. We assume beta comes from a normal distribution with mean 0, and standard deviation 0.15 * sdY. This is an arbitrary value, but seemed to fit effect sizes we had seen in early GWAS data. It's not something I've revisited since, but, if we accept this as reasonable, then I think sdY should be the sd of the trait before regressing out the other covariates, because it's about the scale on which Y is measured.
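A minimal sketch of this distinction in base R, assuming simulated data (the effect sizes, covariate distributions, and variable names below are hypothetical, not taken from any real eQTL study). It uses lm in place of the glm call with a gaussian family from the question, which fits the same linear model. It compares the sd of the standardised trait with the sd of the residuals after adjusting for covariates, and computes the prior sd of 0.15 * sdY described above.

```r
set.seed(1)
n <- 100  # sample size from the example in the question

# Hypothetical simulated genotype and covariates
genotype <- rbinom(n, size = 2, prob = 0.3)
sex      <- rbinom(n, size = 1, prob = 0.5)
age      <- rnorm(n, mean = 50, sd = 10)

# Hypothetical raw expression that depends on genotype and covariates
raw_expression <- 0.3 * genotype + 0.8 * sex + 0.05 * age + rnorm(n)

# Standardise the trait to mean 0 and variance 1, as in the question
standardised_expression <- as.numeric(scale(raw_expression))

# Linear model with covariates (same model as the gaussian glm)
fit <- lm(standardised_expression ~ genotype + sex + age)

# sd of the trait on the scale it was measured: this is the sdY discussed above
sdY_trait <- sd(standardised_expression)

# sd of what is left after the covariates; smaller than sdY_trait
sd_resid <- sd(residuals(fit))

# Prior sd for beta at a causal variant: 0.15 * sdY
prior_sd <- 0.15 * sdY_trait

# Summary statistics that would be supplied as beta and varbeta
beta    <- unname(coef(fit)["genotype"])
varbeta <- vcov(fit)["genotype", "genotype"]

print(c(sdY_trait = sdY_trait, sd_resid = sd_resid, prior_sd = prior_sd))
```

Under these assumptions the standardised trait keeps an sd of 1 regardless of the covariates in the model, while the residual sd falls below 1. Following the reasoning above, sdY = 1 would be the value to pass alongside beta and varbeta, since it describes the scale on which Y is measured.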

