Criteria for auto decision making

YiLiu6240 commented 5 years ago

From Chris

Criteria for auto decision making

Assume the allele information will be curated during the BCF file converting stage

Beta column: non-numeric beta for > 1% of the overall tested SNPs (e.g. for 2.5M SNPs, more than 25K of them are wrong) – flag and warning

SE column: non-numeric or negative SE for > 1% of the overall tested SNPs – flag and warning

P column: non-numeric P value for > 1% of the overall tested SNPs – flag and warning

Mean of beta > 0.5 or < -0.5 – flag and warning

Mean X^2 > 1.3 or < 0.7 – flag and warning

Number of significant SNPs (with P < 5E-8) > 1000 – flag and warning
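
The criteria above could be sketched in R roughly as follows. This is only an illustration, not the code in the report module: the function name `qc_flags`, the column names `beta`, `se` and `p`, and the choice to compute the mean chi-squared from Z = beta / SE are all assumptions. Entries that are missing are counted together with non-numeric ones.

```r
# Hypothetical helper: flag a GWAS summary table against the criteria above.
# Function name and column names (beta, se, p) are assumptions.
qc_flags <- function(gwas, bad_frac = 0.01) {
  # Coerce to numeric; anything non-numeric becomes NA
  to_num <- function(x) suppressWarnings(as.numeric(as.character(x)))
  beta <- to_num(gwas$beta)
  se <- to_num(gwas$se)
  p <- to_num(gwas$p)

  # Mean chi-squared statistic, assuming Z = beta / SE on usable rows
  ok <- !is.na(beta) & !is.na(se) & se > 0
  mean_chisq <- mean((beta[ok] / se[ok])^2)
  mean_beta <- mean(beta, na.rm = TRUE)

  c(
    beta_non_numeric = mean(is.na(beta)) > bad_frac,
    se_invalid = mean(is.na(se) | se < 0) > bad_frac,
    p_non_numeric = mean(is.na(p)) > bad_frac,
    mean_beta = abs(mean_beta) > 0.5,
    mean_chisq = mean_chisq > 1.3 | mean_chisq < 0.7,
    n_sig = sum(p < 5e-8, na.rm = TRUE) > 1000
  )
}

# Toy data: null effects, with 2% of the betas deliberately corrupted
set.seed(1)
n <- 10000
toy <- data.frame(
  beta = as.character(rnorm(n, 0, 0.01)),
  se = rep(0.01, n),
  p = seq(0.0001, 1, length.out = n),
  stringsAsFactors = FALSE
)
toy$beta[1:200] <- "not_a_number"
flags <- qc_flags(toy)
print(flags)
```

Each element of the returned vector is TRUE when the corresponding criterion would raise a flag and warning.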

mightyphil2000 commented 5 years ago

Hey @YiLiu6240 and @explodecomputer, can I add some additional metrics (and R code for calculating them)? I'd like to add the following metrics, which I think can be easily estimated in this report (X refers to genotype and Y to phenotype):

MAC = number of SNPs with minor allele count <= 6
N_est_sqrt = expected sample size based on median standard error, median variance of X and Y
N_rep_sqrt = reported (max) sample size
sd_Y_est1 = expected variance for Y inferred from sample size, variance for X and variance for beta
sd_Y_est2 = expected variance for Y inferred from Z statistics and variance for X
sd_Y_rep = the reported variance for Y in the study table
summary r2 statistics = sum of variance explained by SNPs (all SNPs or top 1000 SNPs?). This is inferred from variance for X, variance for Y and beta
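
One possible reading of some of these metrics, as a rough R sketch. None of the formulas below are given in this thread, so they are assumptions: genotype variance is taken as 2 * MAF * (1 - MAF) (Hardy-Weinberg, additive coding), the standard error is approximated by sd_Y / sqrt(N * var_X) (small effects), and the function name `extra_metrics` and the column names `beta`, `se`, `maf` and `n` are hypothetical.

```r
# Hypothetical sketch; formulas, function name and column names are assumptions.
extra_metrics <- function(gwas, sd_y_rep = 1) {
  var_x <- 2 * gwas$maf * (1 - gwas$maf)  # genotype variance under HWE
  mac <- 2 * gwas$n * gwas$maf            # minor allele count per SNP
  list(
    # Number of SNPs with minor allele count <= 6
    n_mac_le_6 = sum(mac <= 6),
    # sqrt(N) implied by SE ~ sd_Y / sqrt(N * var_X), using medians
    n_est_sqrt = sd_y_rep / (median(gwas$se) * sqrt(median(var_x))),
    # sqrt of the reported (max) sample size
    n_rep_sqrt = sqrt(max(gwas$n)),
    # sd of Y implied by N, var_X and SE (same approximation, rearranged)
    sd_y_est1 = median(gwas$se * sqrt(gwas$n * var_x)),
    # Sum over SNPs of variance explained: var_X * beta^2 / var_Y
    sum_r2 = sum(var_x * gwas$beta^2) / sd_y_rep^2
  )
}

# Toy data constructed so that the approximation holds exactly with sd_Y = 1
toy <- data.frame(maf = rep(0.3, 5), n = rep(10000, 5), beta = rep(0.01, 5))
toy$se <- 1 / sqrt(toy$n * 2 * toy$maf * (1 - toy$maf))
m <- extra_metrics(toy)
str(m)
```

Comparing `n_est_sqrt` with `n_rep_sqrt`, and `sd_y_est1` with the reported sd of Y, is one way such metrics could point to a mis-specified sample size or trait scale.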

YiLiu6240 commented 5 years ago

Hi @mightyphil2000 it will be very helpful to have the R code from you. I am still working on refining the infrastructure of the report and so would probably come back to qc metrics this weekend or next week.

For reference the current code for calculating qc metrics is https://github.com/MRCIEU/mrbase-report-module/blob/new_format/funcs/process_qc_metrics.R

The current report produced from @explodecomputer's new data.bcf is on epi-franklin /projects/MRC-IEU/research/projects/ieu2/p4/013/working/data/results/2/report_data.html

mightyphil2000 commented 5 years ago

Hi @YiLiu6240 thanks, will add the scripts. Is there an example dataset in the repo I can use to test the scripts on? I couldn't find one. If not, where can I find one?

mightyphil2000 commented 5 years ago

@explodecomputer and @YiLiu6240 I've edited the formula for lambda, which I think is wrong (let me know if you disagree). In process_qc_metrics.R the formula given is:

z_score <- qnorm(pval / 2)
lambda <- median(z_score^2) / qchisq(0.5, 1)

But I think this should be (I have implemented this change):

lambda = median(qchisq(Res$P.val, df = 1, lower.tail = FALSE)) / qchisq(0.5, 1, lower.tail = FALSE)

Does Z^2 have a chi-squared distribution? If so, the formula for the Z score would have to be modified so that it reflects a two-sided test: z_score <- qnorm(pval / 2, lower.tail = FALSE)
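
A quick numerical check may help here. The square of a standard normal variable follows a chi-squared distribution with 1 degree of freedom, and squaring removes the sign of Z, so the tail used in `qnorm` should not affect lambda. The sketch below uses simulated uniform P values (an assumption, standing in for real GWAS output) to compare the two formulas.

```r
# Simulated P values, used only for illustration
set.seed(42)
pval <- runif(1000)

# Chi-squared statistics via the Z score (sign is lost when squaring)
chisq_from_z <- qnorm(pval / 2)^2
# Chi-squared statistics directly from the upper tail of the P value
chisq_from_p <- qchisq(pval, df = 1, lower.tail = FALSE)

# Largest absolute difference between the two sets of statistics
max_abs_diff <- max(abs(chisq_from_z - chisq_from_p))

# Genomic inflation factor computed both ways
lambda_z <- median(chisq_from_z) / qchisq(0.5, df = 1)
lambda_p <- median(chisq_from_p) / qchisq(0.5, df = 1)

print(c(lambda_z = lambda_z, lambda_p = lambda_p, max_abs_diff = max_abs_diff))
```

If the two lambdas agree up to floating-point error, the choice between the formulas is one of readability rather than correctness.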

YiLiu6240 commented 5 years ago

Hi @mightyphil2000 regarding your question about an example dataset, for an example run you need 3 files and the correct codebase:

the code at dev branch: git clone https://github.com/MRCIEU/mrbase-report-module; cd mrbase-report-module; git checkout dev
place epi-franklin: /projects/MRC-IEU/research/projects/ieu2/p4/013/working/data/results/2/data.bcf under your working directory ./gwas-files/2/
place epi-franklin: /projects/MRC-IEU/research/projects/ieu2/p4/013/working/data/results/2/data.bcf.csi under your working directory ./gwas-files/2/
place bc4 /mnt/storage/private/mrcieu/research/mr-eve/vcf-reference-datasets/1000g/1kg_v3_nomult.bcf under ./ref_data/

Then, assuming you have the dependencies properly installed, you need to run

Rscript prepare_refdata.R ref_data/1kg_v3_nomult.bcf

to generate a sqlite version.

Then finally to generate the report:

Rscript render_gwas_report.R --input gwas-files/2/data.bcf

Dependencies:

You need a functioning bcftools in your path.
The R dependencies are managed by packrat, and the first time you open an R console under the root directory you need to run packrat::restore() to install them. This will take quite some time.
The above are the required steps for your local machine. For epi-franklin you will also need to set up conda (miniconda / anaconda) properly, and if you do need to run the code on epi-franklin let me know.

The README documentation is not up-to-date at the moment, but top level scripts are functioning with correct documentation.

mightyphil2000 commented 5 years ago

Thanks @YiLiu6240. I'm trying to update the repo but it won't allow me to push changes. I posted details in Slack. Are you monitoring the Slack QC channel?

MRCIEU / opengwas-reports

Criteria for auto decision making #3