Open insilicool opened 1 year ago
My R code to parse the output:
library("data.table")
library("tidyverse")
library("reshape2")
message("Reading VCF...")
happy_vcf <- fread(happy_vcf_path, sep='\t', header = TRUE, skip = '#CHROM') %>%
.[,INFO:=NULL]
format_types <- unique(happy_vcf$FORMAT)
truth <- list()
query <- list()
for(i in format_types) {
message("Parsing ", i, "...")
index <- which(happy_vcf$FORMAT == i)
col_names <- str_split(i, ":")[[1]]
truth[[i]] <- colsplit(happy_vcf[index, TRUTH], ":", names=col_names) %>%
setDT %>% .[,index:=index]
query[[i]] <- colsplit(happy_vcf[index, QUERY], ":", names=col_names) %>%
setDT %>% .[,index:=index]
}
truth <- rbindlist(truth, fill = TRUE) %>% .[order(index), ]
query <- rbindlist(query, fill = TRUE) %>% .[order(index), ]
Hi,
I've been attempting to parse the vcfs generated by the --write-vcf option into R and encountering issues. I believe this has to do with the inconsistency in the number of fields in the FORMAT section.
e.g. with the CMRG truth set
module load mugqic_dev/python/2.7.13 && \ 0.3.15/hap.py \ --threads 3 --write-vcf --decompose --preprocess-truth \ --engine vcfeval \ HG002_GRCh37_CMRG_smallvar_v1.00.vcf.gz \ HG002_GRCh37_CMRG_smallvar_v1.00.vcf.gz \ -f HG002_GRCh37_CMRG_smallvar_v1.00.bed \ -r Homo_sapiens.hs37d5.fa \ --engine-vcfeval-template Homo_sapiens.hs37d5.SDF \ -o comparisons_0.3.15/HG002-cmrg-happy_0.3.15
This results in variants looking like this: bcftools query -f '%CHROM\t%POS\t%FORMAT\n' HG002-cmrg-happy_0.3.15.vcf.gz
Problematic calls: 21 47407748 GT:BD:BI:BVT:BLT:QQ 0/1:N:ti:SNP:het:. 0/1:N:ti:SNP:het:0 21 47407754 GT:BD:BI:BVT:BLT:QQ 0/1:N:i1_5:INDEL:het:. 0/1:N:i1_5:INDEL:het:0
Correctly formatted: 21 47408099 GT:BD:BK:QQ:BI:BVT:BLT 1/1:TP:gm:30:tv:SNP:homalt 1/1:TP:gm:30:tv:SNP:homalt 21 47408101 GT:BD:BK:QQ:BI:BVT:BLT 0/1:TP:gm:30:i16_plus:INDEL:het 0/1:TP:gm:30:i16_plus:INDEL:het
As you can see BK field is missing and QQ field is in a different location. Additionally, I don't see documentation explaining what a BD ="N" represents.
I first observed this in 0.3.12 but it persists in 0.3.15 as well.
Thanks for any help and insight you can provide.