Closed bschilder closed 3 years ago
in read_vcf
# Need to remove "AF=" at the start of the INFO column and replace any "." with 0
if ("INFO" %in% names(sumstats_file)) {
  sumstats_file[, INFO := gsub("^AF=", "", INFO)]
  # INFO is still a character column here, so assign "0" as a string
  sumstats_file[INFO == ".", INFO := "0"]
  # update to numeric
  sumstats_file[, INFO := as.numeric(INFO)]
}
# If no INFO score is usable, warn and fall back to 1 for every SNP
if (sum(!is.na(sumstats_file$INFO)) == 0) {
  message("WARNING: All INFO scores are NA. Replacing all with 1.")
  sumstats_file$INFO <- 1
}
I tried processing Kunkle2019 from Open GWAS, but format_sumstats returned an empty file. I found out this is because, while there is an INFO column, its values are all NA. In order to run format_sumstats successfully, I had to edit all these NAs to 1s and rerun format_sumstats. This worked, but it'd be ideal if format_sumstats could detect these situations, provide a warning, and then replace the NAs with 1s automatically.

More generally, there should be some reports and checks at the end of format_sumstats to ensure that there is a reasonable number of SNPs in the formatted file and that the data isn't just blank. Beyond the number of SNPs, reporting to the user the number of remaining SNPs with p-value < 5e-8 might be nice.
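
As a rough illustration of the kind of end-of-run check suggested above, here is a minimal sketch in base R. The helper name report_sumstats_summary, its arguments, the "P" column name, and the min_snps threshold are all hypothetical and not part of MungeSumstats; it only assumes the formatted summary statistics are available as a data.frame (or data.table) with a p-value column.

```r
# Hypothetical helper (not part of MungeSumstats): summarise a formatted
# sumstats table and warn when it is empty or suspiciously small.
# Assumptions: the p-value column is named "P"; min_snps is an arbitrary cutoff.
report_sumstats_summary <- function(sumstats_dt, p_col = "P",
                                    sig_threshold = 5e-8, min_snps = 1000) {
  n_snps <- nrow(sumstats_dt)
  # Count genome-wide significant SNPs, ignoring missing p-values
  n_sig <- if (p_col %in% names(sumstats_dt)) {
    sum(sumstats_dt[[p_col]] < sig_threshold, na.rm = TRUE)
  } else {
    NA_integer_
  }
  if (n_snps == 0) {
    warning("Formatted file contains no SNPs.")
  } else if (n_snps < min_snps) {
    warning("Formatted file contains only ", n_snps, " SNPs.")
  }
  message("SNPs remaining: ", n_snps)
  message("SNPs with p-value < ", sig_threshold, ": ", n_sig)
  invisible(list(n_snps = n_snps, n_sig = n_sig))
}

# Toy data standing in for a formatted sumstats table
toy <- data.frame(SNP = c("rs1", "rs2", "rs3"), P = c(1e-9, 0.2, 3e-8))
res <- report_sumstats_summary(toy, min_snps = 1)
```

Returning the counts invisibly would let a caller log them or stop early, while the messages give the user the report directly. What counts as a "reasonable" number of SNPs is left as a parameter here because it depends on the study.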