MRCIEU / opengwas-reports

Report module for IEU GWAS pipeline

Criteria for auto decision making #3

Open YiLiu6240 opened 5 years ago

YiLiu6240 commented 5 years ago

From Chris

Criteria for auto decision making

  1. Assume the allele information will be curated during the BCF file converting stage
  2. Beta column: non-numeric beta for > 1% of the overall tested SNPs (e.g. for 2.5M SNPs, more than 25K of them are wrong) – flag and warning
  3. SE column: non-numeric or negative SE for > 1% of the overall tested SNPs – flag and warning
  4. P column: non-numeric P value for > 1% of the overall tested SNPs – flag and warning
  5. Mean of beta > 0.5 or < -0.5 – flag and warning
  6. Mean chi-squared statistic (X^2) > 1.3 or < 0.7 – flag and warning
  7. Number of significant SNPs (with P < 5e-8) > 1000 – flag and warning
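The criteria above can be sketched as a single checking function. This is a minimal Python illustration, not the report module's R code; the function names `to_float` and `qc_flags` are hypothetical, NaN/infinite values are assumed to count as non-numeric, and the per-SNP chi-squared for criterion 6 is assumed to be (beta / se)^2.

```python
import math
import statistics

# Thresholds taken from the criteria list above.
MAX_BAD_FRACTION = 0.01
MEAN_BETA_LIMIT = 0.5
MEAN_CHISQ_LOW, MEAN_CHISQ_HIGH = 0.7, 1.3
GWS_P = 5e-8
MAX_GWS_SNPS = 1000


def to_float(value):
    """Return value as a finite float, or None if it is missing or non-numeric.

    Assumption: NaN and infinite values are treated as non-numeric.
    """
    try:
        x = float(value)
    except (TypeError, ValueError):
        return None
    return x if math.isfinite(x) else None


def qc_flags(beta, se, pval):
    """Return warning strings for one GWAS; beta, se and pval are equal-length columns."""
    n = len(beta)
    b = [to_float(v) for v in beta]
    s = [to_float(v) for v in se]
    p = [to_float(v) for v in pval]
    flags = []

    # Criteria 2-4: fraction of unusable values per column.
    if sum(x is None for x in b) / n > MAX_BAD_FRACTION:
        flags.append("beta: >1% non-numeric")
    if sum(x is None or x < 0 for x in s) / n > MAX_BAD_FRACTION:
        flags.append("se: >1% non-numeric or negative")
    if sum(x is None for x in p) / n > MAX_BAD_FRACTION:
        flags.append("p: >1% non-numeric")

    # Criterion 5: mean beta outside [-0.5, 0.5].
    valid_b = [x for x in b if x is not None]
    if valid_b and abs(statistics.fmean(valid_b)) > MEAN_BETA_LIMIT:
        flags.append("mean beta outside [-0.5, 0.5]")

    # Criterion 6: mean chi-squared, assumed here to be (beta / se)^2 per SNP.
    chisq = [(bi / si) ** 2 for bi, si in zip(b, s)
             if bi is not None and si is not None and si > 0]
    if chisq:
        mean_chisq = statistics.fmean(chisq)
        if not MEAN_CHISQ_LOW <= mean_chisq <= MEAN_CHISQ_HIGH:
            flags.append("mean chi-squared outside [0.7, 1.3]")

    # Criterion 7: too many genome-wide significant SNPs.
    if sum(1 for x in p if x is not None and x < GWS_P) > MAX_GWS_SNPS:
        flags.append("more than 1000 SNPs with P < 5e-8")
    return flags
```

Deriving the chi-squared from the P column instead of beta/se would be an equally plausible reading of criterion 6; the two agree when P was computed from a Wald test.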
mightyphil2000 commented 5 years ago

Hey @YiLiu6240 and @explodecomputer, can I add some additional metrics (and R code for calculating them)? I'd like to add the following metrics, which I think can be easily estimated in this report (X refers to genotype and Y to phenotype):

  1. MAC = number of SNPs with minor allele count <= 6
  2. N_est_sqrt = expected sample size based on median standard error, median variance of X and Y
  3. N_rep_sqrt = reported (max) sample size
  4. sd_Y_est1 = expected variance for Y inferred from sample size, variance for X and variance for beta
  5. sd_Y_est2 = expected variance for Y inferred from Z statistics and variance for X
  6. sd_Y_rep = the reported variance for Y in the study table
  7. Summed r2 statistic = sum of variance explained by SNPs (all SNPs or top 1000 SNPs?). This is inferred from the variance for X, the variance for Y and beta
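Several of these metrics follow from standard approximations for an additively coded SNP with effect allele frequency f: var(X) = 2f(1 - f) under Hardy-Weinberg equilibrium, SE^2 ≈ var(Y) / (N · var(X)) for small effects, minor allele count = 2N · min(f, 1 - f), and per-SNP r2 ≈ var(X) · beta^2 / var(Y). The Python sketch below assumes those approximations; it is not the R code proposed here, all function names are hypothetical, the `_sqrt` suffix in the proposed names is not explained in the thread and is not reproduced, and sd_Y_est2 is left out because its formula is not given.

```python
import math
import statistics


def var_x(eaf):
    """Genotype variance under HWE for additive 0/1/2 coding: 2f(1 - f)."""
    return 2.0 * eaf * (1.0 - eaf)


def n_low_mac(eaf, n, max_mac=6):
    """Number of SNPs whose minor allele count 2 * N * min(f, 1 - f) is <= max_mac."""
    return sum(1 for f in eaf if 2.0 * n * min(f, 1.0 - f) <= max_mac)


def n_est(se, eaf, var_y):
    """Sample size implied by SE^2 ~= var(Y) / (N * var(X)), using median SE and median var(X)."""
    med_se = statistics.median(se)
    med_vx = statistics.median(var_x(f) for f in eaf)
    return var_y / (med_se ** 2 * med_vx)


def sd_y_est1(se, eaf, n):
    """SD of Y implied by var(Y) ~= N * var(X) * SE^2, taking the median over SNPs."""
    return math.sqrt(statistics.median(n * var_x(f) * s ** 2 for f, s in zip(eaf, se)))


def sum_r2(beta, eaf, var_y, pval=None, top=None):
    """Sum of per-SNP r2 = var(X) * beta^2 / var(Y).

    If `top` and `pval` are given, only the `top` SNPs with the smallest P are summed.
    """
    idx = range(len(beta))
    if top is not None and pval is not None:
        idx = sorted(idx, key=lambda i: pval[i])[:top]
    return sum(var_x(eaf[i]) * beta[i] ** 2 / var_y for i in idx)
```

Summing r2 over all SNPs ignores linkage disequilibrium, so correlated SNPs are counted repeatedly; restricting to the top SNPs (or to independent SNPs) limits that double counting.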
YiLiu6240 commented 5 years ago

Hi @mightyphil2000, it will be very helpful to have the R code from you. I am still refining the infrastructure of the report, so I will probably come back to the QC metrics this weekend or next week.

For reference, the current code for calculating QC metrics is https://github.com/MRCIEU/mrbase-report-module/blob/new_format/funcs/process_qc_metrics.R

The current report produced from @explodecomputer's new data.bcf is on epi-franklin /projects/MRC-IEU/research/projects/ieu2/p4/013/working/data/results/2/report_data.html

mightyphil2000 commented 5 years ago

Hi @YiLiu6240, thanks, I will add the scripts. Is there an example dataset in the repo I can use to test the scripts on? I couldn't find one. If not, where can I find an example dataset to test the scripts on?

mightyphil2000 commented 5 years ago

@explodecomputer and @YiLiu6240, I've edited the formula for lambda, which I think is wrong (let me know if you disagree). In process_qc_metrics.R the formula given is:

    z_score <- qnorm(pval / 2)
    lambda <- median(z_score^2) / qchisq(0.5, 1)

But I think this should be (I have implemented this change):

    lambda <- median(qchisq(Res$P.val, df = 1, lower.tail = FALSE)) / qchisq(0.5, df = 1, lower.tail = FALSE)

Does Z^2 have a chi-squared distribution? If so, the formula for the Z score would have to be modified so that it reflects a two-sided test:

    z_score <- qnorm(pval / 2, lower.tail = FALSE)
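On the question above: under the null hypothesis a Z statistic is standard normal, so Z^2 follows a chi-squared distribution with 1 degree of freedom. Because squaring discards the sign, qnorm(p / 2)^2 equals qchisq(p, 1, lower.tail = FALSE), so the two lambda formulas quoted above should give the same value. The Python sketch below (hypothetical helper names; the repository's version is R) checks this numerically, solving the 1-df chi-squared quantile by bisection on its survival function erfc(sqrt(x / 2)) so the comparison does not reuse the normal quantile.

```python
import math
import statistics
from statistics import NormalDist


def chi2_1df_isf(p, lo=0.0, hi=2000.0, iters=100):
    """Upper-tail chi-squared(1) quantile, i.e. R's qchisq(p, 1, lower.tail = FALSE).

    For 1 df the survival function is erfc(sqrt(x / 2)); solve it by bisection.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if math.erfc(math.sqrt(mid / 2)) > p:
            lo = mid  # upper tail still heavier than p, so the quantile lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def lambda_gc_original(pvals):
    """Original form: z = qnorm(p / 2); lambda = median(z^2) / qchisq(0.5, 1)."""
    nd = NormalDist()
    z_sq = [nd.inv_cdf(p / 2) ** 2 for p in pvals]
    return statistics.median(z_sq) / chi2_1df_isf(0.5)


def lambda_gc_revised(pvals):
    """Revised form: median(qchisq(p, 1, lower.tail = FALSE)) / qchisq(0.5, 1, lower.tail = FALSE)."""
    chisq = [chi2_1df_isf(p) for p in pvals]
    return statistics.median(chisq) / chi2_1df_isf(0.5)
```

The median of a 1-df chi-squared is the same whichever tail is used (about 0.455), which is why qchisq(0.5, 1) and qchisq(0.5, 1, lower.tail = FALSE) are interchangeable in the denominator. P values must lie in (0, 1].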

YiLiu6240 commented 5 years ago

Hi @mightyphil2000, regarding your question about an example dataset: for an example run you need 3 files and the correct codebase:

Then, assuming you have the dependencies properly installed, you need to run

Rscript prepare_refdata.R ref_data/1kg_v3_nomult.bcf

to generate an SQLite version of the reference data.

Then finally to generate the report:

Rscript render_gwas_report.R --input gwas_files/2/data.bcf

Dependencies:

The README documentation is not up to date at the moment, but the top-level scripts work and are correctly documented.

mightyphil2000 commented 5 years ago

Thanks @YiLiu6240. I'm trying to update the repo, but it won't allow me to push changes. I posted details in Slack. Are you monitoring the Slack QC channel?