Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

Report how many SNPs you started with. #32

Closed bschilder closed 3 years ago

bschilder commented 3 years ago

Adding a report at the very beginning of format_sumstats so we know how much the dataset gets whittled down by the end.

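    # Read in the raw summary statistics
    sumstats_return[["sumstats_dt"]] <- read_sumstats(path = path,
                                                      nThread = nThread)

    # Report the starting state and record the original dimensions
    report_summary(sumstats_dt = sumstats_return$sumstats_dt)
    orig_dims <- dim(sumstats_return$sumstats_dt)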

Also adding it at the very end, via an optional orig_dims arg to the report_summary function.

report_summary <- function(sumstats_dt,
                           orig_dims = NULL) {
    # If the original dimensions were supplied, note the starting row count
    orig_dims_report <- if (!is.null(orig_dims)) {
        paste0(" (started with ", formatC(orig_dims[1], big.mark = ","), ")")
    } else {
        NULL
    }

    # Print row, unique-variant, significant-variant and chromosome counts
    message("Formatted summary statistics report:",
            "\n   - ", formatC(nrow(sumstats_dt), big.mark = ","), " rows", orig_dims_report,
            "\n   - ", formatC(length(unique(sumstats_dt$SNP)), big.mark = ","), " unique variants",
            "\n   - ", formatC(nrow(subset(sumstats_dt, P < 5e-8)), big.mark = ","), " genome-wide significant variants (P<5e-8)",
            "\n   - ", formatC(length(unique(sumstats_dt$CHR)), big.mark = ","), " chromosomes"
            )
}
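
A minimal sketch of the before/after pattern described in this issue, using a toy data frame in place of real summary statistics. The toy data, the NA filter standing in for the QC steps, and the helper name report_summary_sketch are all assumptions for illustration. The helper is a cut-down stand-in that reports only the row count, not the package's implementation.

```r
# Hypothetical, reduced stand-in for report_summary(); illustration only.
report_summary_sketch <- function(sumstats_dt, orig_dims = NULL) {
    # Only mention the starting row count when it was supplied.
    orig_dims_report <- if (!is.null(orig_dims)) {
        paste0(" (started with ", formatC(orig_dims[1], big.mark = ","), ")")
    } else {
        NULL
    }
    row_line <- paste0("   - ", formatC(nrow(sumstats_dt), big.mark = ","),
                       " rows", orig_dims_report)
    message("Formatted summary statistics report:\n", row_line)
    # Return the row line invisibly so it can be inspected.
    invisible(row_line)
}

# Toy data (made up): four variants, one with a missing P value.
sumstats_dt <- data.frame(
    SNP = c("rs1", "rs2", "rs3", "rs4"),
    CHR = c(1, 1, 2, 2),
    P   = c(1e-9, 0.2, NA, 0.03)
)

# At the start: report, then record the original dimensions.
report_summary_sketch(sumstats_dt)
orig_dims <- dim(sumstats_dt)

# Stand-in for the QC steps (assumption): drop rows with a missing P value.
sumstats_dt <- sumstats_dt[!is.na(sumstats_dt$P), ]

# At the end: report again, this time including the starting row count.
final_line <- report_summary_sketch(sumstats_dt, orig_dims = orig_dims)
```

Recording orig_dims straight after reading the file means the final report can show both the remaining and the starting row counts without holding on to a copy of the original table.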