A Nextflow pipeline for NGS variant calling on SARS-CoV-2. From FASTQ files to normalized and annotated VCF files from GATK, BCFtools, LoFreq and iVar.
MIT License
Produced VCFs are claimed to be malformed by IGV #57
Loading a VCF produced by the pipeline into IGV fails with the following error message:
The provided VCF file is malformed at approximately line number 69: The VCF specification does not allow for whitespace in the INFO field. Offending field value was "DP=29;AF=0.103448;SB=0;DP4=13,13,1,2;INDEL;HRUN=5;ANN=C|frameshift_variant|HIGH|ORF1ab|gene-GU280_gp01|transcript|TRANSCRIPT_gene-GU280_gp01|protein_coding|1/1|c.10122delT|p.S3376fs|10122/21290|10122/21290|3374/7095||WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS;LOF=(ORF1ab|gene-GU280_gp01|1|1.00);CONS_HMM_SARS_COV_2=0.57215;CONS_HMM_SARBECOVIRUS=0.57215;CONS_HMM_VERTEBRATE_COV=0;PFAM_NAME=Peptidase_C30_CoV;PFAM_DESCRIPTION=Peptidase C30,coronavirus;vafator_af=0.103448;vafator_ac=3;vafator_dp=29",
Apparently, the PFAM_DESCRIPTION field contains whitespace. A possible solution would affect both the pipeline and the processor. The pipeline would need to generate valid VCF, for instance by replacing whitespace with underscores. The processor would then need to convert the underscores back into whitespace when loading the data into the database. One possible problem with this approach is that INFO fields may contain other underscores that we don't want converted to whitespace.
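One way around the underscore ambiguity might be percent-encoding instead of underscores, since it can be reversed without touching underscores that were already there. The following is only a rough sketch of that idea, not what the pipeline currently does: the function names, the default list of keys to encode, and the choice of `%20` for a space are all assumptions made for illustration, and it assumes the annotation values themselves contain no `;`.

```python
from urllib.parse import unquote

# Hypothetical default: only the free-text PFAM annotations are encoded,
# so values that other tools may already have escaped are left alone.
TEXT_KEYS = ("PFAM_NAME", "PFAM_DESCRIPTION")


def encode_info_value(value: str) -> str:
    """Percent-encode whitespace in a single INFO value."""
    # Escape '%' first so that decoding stays unambiguous.
    return (
        value.replace("%", "%25")
        .replace(" ", "%20")
        .replace("\t", "%09")
    )


def decode_info_value(value: str) -> str:
    """Reverse encode_info_value, e.g. when loading into the database."""
    return unquote(value)


def sanitize_info(info: str, keys=TEXT_KEYS) -> str:
    """Encode the values of the selected keys in a VCF INFO column."""
    fields = []
    for field in info.split(";"):
        # Flags such as INDEL have no '=', so sep and value are empty.
        key, sep, value = field.partition("=")
        if key in keys:
            value = encode_info_value(value)
        fields.append(key + sep + value)
    return ";".join(fields)


if __name__ == "__main__":
    info = "DP=29;INDEL;PFAM_DESCRIPTION=Peptidase C30,coronavirus;vafator_dp=29"
    print(sanitize_info(info))
```

With this approach the processor would call something like `decode_info_value` on the selected fields rather than blindly replacing underscores. Whether IGV and other downstream tools display the encoded value as intended would still need to be checked.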