Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

Account for INFO with all NAs #5

Closed: bschilder closed this issue 3 years ago

bschilder commented 3 years ago

I tried processing Kunkle2019 from Open GWAS but format_sumstats returned an empty file.

Found out this is because, while there is an INFO column, its values are all NAs. To get output I had to change all these NAs to 1s and rerun format_sumstats. This worked, but it'd be ideal if format_sumstats could detect these situations, provide a warning, and then replace the NAs with 1s automatically.

More generally, there should be some reports and checks at the end of format_sumstats to ensure that there is a reasonable number of SNPs in the formatted file and that the data isn't just blank. Beyond the number of SNPs, reporting to the user the number of remaining SNPs with p-value < 5e-8 might be nice.
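A rough sketch of what such an end-of-run report could look like. This is not existing MungeSumstats code: report_summary, its arguments, and the assumption that the formatted p-value column is named "P" are all hypothetical, chosen only for illustration.

```r
# Hypothetical helper, not part of MungeSumstats.
# Assumes `sumstats` is a data.frame or data.table whose p-value column
# is numeric and named "P" (an assumption; adjust p_col otherwise).
report_summary <- function(sumstats, p_col = "P",
                           min_snps = 1, sig_threshold = 5e-8) {
    n_snps <- nrow(sumstats)
    # Warn when the formatted output is empty or suspiciously small
    if (n_snps < min_snps) {
        warning("Formatted file contains ", n_snps,
                " SNPs; expected at least ", min_snps, ".")
    }
    # Count genome-wide significant SNPs, ignoring missing p-values
    n_sig <- NA_integer_
    if (p_col %in% names(sumstats)) {
        n_sig <- sum(sumstats[[p_col]] < sig_threshold, na.rm = TRUE)
    }
    message("SNPs remaining: ", n_snps)
    message("SNPs with p-value < ", sig_threshold, ": ", n_sig)
    invisible(list(n_snps = n_snps, n_sig = n_sig))
}

# Toy data for illustration only (not real GWAS results)
toy <- data.frame(SNP = c("rs1", "rs2", "rs3"), P = c(1e-9, 0.2, NA))
res <- report_summary(toy)
```

Returning the counts invisibly as well as printing them would let a calling function decide whether to stop, warn, or continue when the output looks blank.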

bschilder commented 3 years ago

in read_vcf

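    # Need to remove "AF=" at start of INFO column and replace any "." with 0
    if ("INFO" %in% names(sumstats_file)) {
        sumstats_file[, INFO := gsub("^AF=", "", INFO)]
        # INFO is still character at this point, so assign "0" as a string
        sumstats_file[INFO == ".", INFO := "0"]
        # update to numeric
        sumstats_file[, INFO := as.numeric(INFO)]
        # Check for all-NA scores only when the column exists; otherwise
        # sumstats_file$INFO is NULL and a spurious INFO column would be added
        if (all(is.na(sumstats_file$INFO))) {
            message("WARNING: All INFO scores are NA. Replacing all with 1.")
            sumstats_file[, INFO := 1]
        }
    }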