artic-network / fieldbioinformatics

The ARTIC field bioinformatics pipeline
MIT License

artic-tools check_vcf Segmentation fault on empty VCF files #88

Open rdeborja opened 3 years ago

rdeborja commented 3 years ago

Issue:

artic-tools check_vcf produces a segmentation fault when a VCF file contains no variant records (e.g. from negative control samples).

Version:

$ artic --version
artic 1.3.0

Steps to reproduce:

Ran:

$ artic-tools check_vcf --summaryOut summary.out sample.merged.vcf.gz SARS-CoV-2.scheme.bed

sample.merged.vcf.gz has a VCF header but no mutations in the file:

##fileformat=VCFv4.2
##nanopolish_window=MN908947.3:1-29902
##INFO=<ID=TotalReads,Number=1,Type=Integer,Description="The number of event-space reads used to call the variant">
##INFO=<ID=SupportFraction,Number=1,Type=Float,Description="The fraction of event-space reads that support the variant">
##INFO=<ID=SupportFractionByStrand,Number=2,Type=Float,Description="Fraction of event-space reads that support the variant for each strand">
##INFO=<ID=BaseCalledReadsWithVariant,Number=1,Type=Integer,Description="The number of base-space reads that support the variant">
##INFO=<ID=BaseCalledFraction,Number=1,Type=Float,Description="The fraction of base-space reads that support the variant">
##INFO=<ID=AlleleCount,Number=1,Type=Integer,Description="The inferred number of copies of the allele">
##INFO=<ID=StrandSupport,Number=4,Type=Integer,Description="Number of reads supporting the REF and ALT allele, by strand">
##INFO=<ID=StrandFisherTest,Number=1,Type=Integer,Description="Strand bias fisher test">
##INFO=<ID=SOR,Number=1,Type=Float,Description="StrandOddsRatio test from GATK">
##INFO=<ID=RefContext,Number=1,Type=String,Description="The reference sequence context surrounding the variant call">
##INFO=<ID=Pool,Number=1,Type=String,Description="The pool name">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample
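
A possible workaround is to detect header-only VCFs like the one above and skip check_vcf for them. The following is a minimal Python sketch, not part of artic-tools: the function names `count_vcf_records` and `has_variants` are hypothetical, and it assumes the file is either plain text or gzip/bgzip compressed with a `.gz` extension.

```python
import gzip
from pathlib import Path


def count_vcf_records(vcf_path):
    """Count data lines (non-blank lines not starting with '#') in a VCF.

    Hypothetical helper for illustration; not part of artic-tools.
    """
    path = Path(vcf_path)
    # bgzip output is gzip-compatible, so gzip.open can read .vcf.gz files.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


def has_variants(vcf_path):
    """Return True if the VCF contains at least one variant record."""
    return count_vcf_records(vcf_path) > 0
```

A wrapper script could call `has_variants("sample.merged.vcf.gz")` and only run check_vcf when it returns True, so negative controls are skipped rather than crashing the run.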

Log output:

[11:51:06] [artic-tools::check_vcf] starting VCF checker
[11:51:06] [artic-tools::check_vcf] reading scheme
[11:51:06] [artic-tools::check_vcf] collecting scheme stats
[11:51:06] [artic-tools::check_vcf]     primer scheme file: SARS-CoV-2.scheme.bed
[11:51:06] [artic-tools::check_vcf]     reference sequence: MN908947.3
[11:51:06] [artic-tools::check_vcf]     number of pools:    2
[11:51:06] [artic-tools::check_vcf]     number of primers:  198 (includes 0 alts)
[11:51:06] [artic-tools::check_vcf]     minimum primer size:    20
[11:51:06] [artic-tools::check_vcf]     maximum primer size:    31
[11:51:06] [artic-tools::check_vcf]     number of amplicons:    99
[11:51:06] [artic-tools::check_vcf]     mean amplicon size: 356
[11:51:06] [artic-tools::check_vcf]     maximum amplicon size:  373
[11:51:06] [artic-tools::check_vcf]     scheme ref. span:   25-29854
[11:51:06] [artic-tools::check_vcf]     scheme overlaps:    18.39485%
[11:51:06] [artic-tools::check_vcf] setting parameters
[11:51:06] [artic-tools::check_vcf]     output report: /tmp/summary.out
[11:51:06] [artic-tools::check_vcf]     filtering variants: false
[11:51:06] [artic-tools::check_vcf]     minimum quality threshold: 10.0
[11:51:06] [artic-tools::check_vcf] reading VCF file
Segmentation fault

jts commented 3 years ago

I opened a PR in artic-tools to fix this (linked above) - would be good to get this merged in and new conda versions built

will-rowe commented 3 years ago

Thanks @jts and sorry for the delay! Looks good to me. I'll make a new artic-tools release and bump the conda version.

rdeborja commented 3 years ago

@will-rowe thanks for merging and updating the release version. Will the conda version be updated as well (currently at 0.3.0)?

will-rowe commented 3 years ago

I'll get on this now - I had hoped for an automatic bump but nm!

will-rowe commented 3 years ago

Sorry for the delay here - some issues getting it into conda. I'm quite busy atm so will have to look at this over the weekend/next week.