Open krz opened 5 years ago
Hey some suggestions below:
Step 1. Generate the summary data for the 27 SNPs' effects on coffee consumption. Standard GWAS practice involves running a regression of coffee_consumption ~ rs001 + pc1 + pc2 + ... + pc10 + other covariates
for the first SNP and recording the beta for the SNP, standard error, p-value, effect allele*, other allele, sample size. The generated summary data will look like
SNP effect_allele other_allele beta se pval n
rs001 A G 0.1 0.001 1e-5 10000
rs002 ...
Normally this is done using software like plink, boltlmm, GenABEL, SNPTEST, depending on the data and the circumstances of the samples etc. You can run it manually in R as described above also. The point is to get valid estimates of the SNP effects on the phenotype
Step 2. Read your generated summary data into R using the TwoSampleMR read_outcome_data
function and follow the steps from there https://mrcieu.github.io/TwoSampleMR/
Thanks. So it is required that I perform my own GWAS, or at least have the 10 principal components for all SNPs. Only having the 27 dialletic SNPs is not enough?
From the mr.raps
package description it seems that I only require the beta
and se
values for exposure and treatment, nothing else
This function calls mr.raps.shrinkage with overdispersion = TRUE, loss.function = "huber". The input data frame should contain the following variables:
beta.exposure beta.outcome se.exposure se.outcome
But I see that I have to include the principal components at least as you said.
Yeah it's good practice to control for confounding due to population structure when running any genetic association analysis - if you don't have genome wide data or the actual derived genetic principle components then there's not much you can do I guess, but there is a risk that your effect estimates will be inflated.
You don't technically need the effect allele to do the mr.raps analysis because it assumes that the beta for the exposure and the beta for the outcome are in terms of the same effect allele. The TwoSampleMR package doesn't use the effect allele information to do the statistical analysis, just to harmonise the exposure and outcome datasets prior to the analysis. Assuming knowledge of the effect allele is typically a bit dangerous as every study uses a different way of coding the alleles. e.g. see https://academic.oup.com/ije/article/45/6/1717/3072174 for examples of where failing to correctly harmonise led to the wrong answers being published.
@explodecomputer Hello sir ! i am having problems with the steps prior to data harmonization can you help me please? l have two file from the merging step now what shall i do next? i have not performed clumping as well.
merging step code: sleep_durations_p_select_IL2bout_merge1 <- merge(IL2_allSNP,sleep_durations_p_select,by=c("effect_allele","other_allele" ,"CHR","BP"), suffixes = c (".sleep" , ".IL2b")) nrow(sleep_durations_p_select_IL2bout_merge1)
flip_sleep_durations_p_select <- sleep_durations_p_select setnames(flip_sleep_durations_p_select, c("effect_allele","other_allele"), c ("other_allele","effect_allele" ))
flip_sleep_durations_p_select <--flip_sleep_durations_p_select
sleep_durations_p_select_IL2bout_merge2 <- merge(IL2_allSNP,flip_sleep_durations_p_select, by=c("effect_allele","other_allele" ,"CHR","BP"), suffixes = c (".sleep" , ".IL2b")) nrow(sleep_durations_p_select_IL2bout_merge2)
Hi,
Assume I want to do a hypothetical analysis for the effect of BMI on coffee consumption The first step, obtaining the exposure data, is easy. According to the documentation, I do
bmi_gwas <- subset(gwas_catalog, grepl("Speliotes", Author) & Phenotype == "Body mass index")
We find 27 SNPs associated with BMI according to this study. Assume they are called
rs001
-rs027
.The second step, getting the outcome, is more difficult. There is no GWAS for coffee consumption. However, I have a local data set with 10,000 individuals, where coffee consumption and the 27 SNPs are available:
Using this data set, what is the easiest way of obtaining the relevant outcome data frame that I can use for two-sample MR? I can see obtaining
effect.outcome
,se.outcome
, and evenpval.outcome
from univariate regressions of the SNPs againstcoffee_consumption
but I don't know about the other variables likea1
. Are they really necessary? Is there a method or package that inputs my data and automatically outputs a data frame like the one thatTwoSampleMR
expects?Thanks