Open imneuro opened 5 years ago
@imneuro Not sure why, but there is an embedded space in that field: SOR=2. 235
I've also had a similar issue with a VCF generated by ApplyVQSR on gatk-4.1.3.0. Did you ever discover the reason for the whitespace?
This is going to be nearly impossible to debug without being able to reproduce it. If I could get an input VCF, a command line, and an example bad output VCF, that would go a long way: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671
Just wanted to note that, unlike VCF spec 4.2, spec 4.3 states that "Space characters are allowed in values".
While 4.3 support is on our roadmap, GATK doesn't currently support anything more recent than 4.2.
Hi GATK team,
I got the following error message with GATK 4.1.0.0 on our local cluster:
Using GATK jar /dsg_cent/packages/GATK/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar
Running:
java1.8 -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx5 g -jar /dsg_cent/packages/GATK/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar SelectVariants -R /dsgmnt/llfs2/masterdata/geno/hg38/resources_broad_hg38_v0_Homo_sapiens_assembl y38.fasta -L chr1 -V /dsgmnt/seq5_llfs/work/xhong/v4100/ApplyVQSR//ExcessHet_joint525_c1_22.SNP.VQSR.g.vcf.gz -O /dsgmnt/seq5_llfs/work/xhong/v4100/ApplyVQSR//ExcessHet_joi nt525_c1.SNP.VQSR.g.vcf.gz
09:15:49.372 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/dsg_cent/packages/GATK/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/nati ve/libgkl_compression.so
09:15:51.131 INFO SelectVariants - ------------------------------------------------------------
09:15:51.132 INFO SelectVariants - The Genome Analysis Toolkit (GATK) v4.1.0.0
09:15:51.132 INFO SelectVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
09:15:51.132 INFO SelectVariants - Executing as xhong@blade5-4-11.dsg.wustl.edu on Linux v2.6.32-573.12.1.el6.x86_64 amd64
09:15:51.133 INFO SelectVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_31-b13
09:15:51.133 INFO SelectVariants - Start Date/Time: June 27, 2019 9:15:49 AM CDT
09:15:51.133 INFO SelectVariants - ------------------------------------------------------------
09:15:51.133 INFO SelectVariants - ------------------------------------------------------------
09:15:51.134 INFO SelectVariants - HTSJDK Version: 2.18.2
09:15:51.134 INFO SelectVariants - Picard Version: 2.18.25
09:15:51.134 INFO SelectVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:15:51.135 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:15:51.135 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:15:51.135 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:15:51.135 INFO SelectVariants - Deflater: IntelDeflater
09:15:51.135 INFO SelectVariants - Inflater: IntelInflater
09:15:51.135 INFO SelectVariants - GCS max retries/reopens: 20
09:15:51.135 INFO SelectVariants - Requester pays: disabled
09:15:51.136 INFO SelectVariants - Initializing engine
09:15:52.547 INFO FeatureManager - Using codec VCFCodec to read file file:///dsgmnt/seq5_llfs/work/xhong/v4100/ApplyVQSR/ExcessHet_joint525_c1_22.SNP.VQSR.g.vcf.gz
09:15:53.171 INFO IntervalArgumentCollection - Processing 248956422 bp from intervals
09:15:53.221 INFO SelectVariants - Done initializing engine
09:15:53.390 INFO ProgressMeter - Starting traversal
09:15:53.390 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
09:15:53.479 INFO SelectVariants - Shutting down engine
[June 27, 2019 9:15:53 AM CDT] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.07 minutes.
Runtime.totalMemory()=2131755008
htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 3433: The VCF specification does not allow for whitespace in the INFO field . Offending field value was "AC=1;AF=9.671e-04;AN=1034;AS_BaseQRankSum=-1.550;AS_FS=8.334;AS_InbreedingCoeff=-0.3147;AS_MQ=31.69;AS_MQRankSum=-0.200;AS_QD=28.73;AS_ReadPosR ankSum=nul;AS_SOR=2.235;BaseQRankSum=-1.381e+00;DP=40368;ExcessHet=160.0000;FS=8.334;InbreedingCoeff=-0.3147;MLEAC=7;MLEAF=6.770e-03;MQ=37.13;MQRankSum=0.126;QD=2.46;SOR=2. 235 GT:AD:DP:GQ:PGT:PID:PL:PS 0/0:75,0:75:0:.:.:0,0,1525
However, from the error message I cannot see any whitespace in the INFO field.
The file /dsgmnt/seq5_llfs/work/xhong/v4100/ApplyVQSR/ExcessHet_joint525_c1_22.SNP.VQSR.g.vcf.gz is the output of the following command:
gatk4.1.0.0 --java-options '-Xmx100g -Xmx100g' ApplyVQSR \
    -R /dsgmnt/llfs2/masterdata/geno/hg38/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta \
    -V ${SNPPath}/joint525_chr1_ExcessHet_filter.SNP.g.vcf.gz \
    -V ${SNPPath}/joint525_chr2_ExcessHet_filter.SNP.g.vcf.gz \
    ....
    -V ${SNPPath}/joint525_chr22_ExcessHet_filter.SNP.g.vcf.gz \
    -O /dsgmnt/seq5_llfs/work/xhong/v4100/ApplyVQSR//ExcessHet_joint525_c1_22.SNP.VQSR.g.vcf.g z \
    --truth-sensitivity-filter-level 97 \
    --tranches-file /dsgmnt/seq5_llfs/work/xhong/v4100/VQSR//ExcessHet_joint525_c1_22.snp.tranches \
    --recal-file /dsgmnt/seq5_llfs/work/xho ng/v4100/VQSR//ExcessHet_joint525_c1_22.snp.recal \
    -mode SNP
There is no error or warning in the standard error and standard output of this step.
I tried applying the VQSR SNP model to ${SNPPath}/joint525_chr1_ExcessHet_filter.SNP.g.vcf.gz, and it works well. When I selected BISNPs from the output, I could not reproduce the error.
I would appreciate suggestions on how to narrow down the problem. Any input is appreciated.
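One possible way to narrow this down is to scan the ApplyVQSR output directly for records whose INFO column contains whitespace, independent of GATK's parser. The following is only a rough sketch, not a GATK utility: the function names `open_vcf`, `find_info_whitespace`, and `report` are made up for illustration, and it assumes a standard tab-delimited VCF in which INFO is the eighth column.

```python
import gzip
import re

# Any whitespace character (space, tab remnants, carriage return, ...).
WHITESPACE = re.compile(r"\s")


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_info_whitespace(lines):
    """Yield (line_number, chrom, pos, info) for records whose INFO has whitespace.

    `lines` is any iterable of VCF lines, e.g. an open file handle.
    Line numbers are 1-based and include header lines.
    """
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # skip meta-information and the column header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record; ignored in this sketch
        info = fields[7]  # INFO is the 8th tab-separated column
        if WHITESPACE.search(info):
            yield line_number, fields[0], fields[1], info


def report(path):
    """Print every record in the file at `path` whose INFO contains whitespace."""
    with open_vcf(path) as handle:
        for line_number, chrom, pos, info in find_info_whitespace(handle):
            print(f"line {line_number}: {chrom}:{pos}: {info!r}")
```

Calling `report(...)` on the ApplyVQSR output, and then on each per-chromosome input, should show whether the space is already present in an input file or only appears after ApplyVQSR. Printing with `!r` makes the exact whitespace character visible.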