OLF-Bioinformatics / VariantDetective

Identify short variants and structural variants from raw sequencing data or genomic sequences
MIT License

Unable to combine_variants --snp_vcf [UnicodeDecodeError] #12

Open musquita opened 6 months ago

musquita commented 6 months ago

We have sequenced a strain from our lab and want to compare its assembled genome against the reference genome to check for possible variants. However, I am unable to combine the variants from the snp_indel analysis. The command below fails with a UnicodeDecodeError:

variantdetective combine_variants --snp_vcf snp_indel/freebayes/freebayes.filt.vcf snp_indel/haplotypecaller/haplotypecaller.filt.vcf snp_indel/clair3/clair3.filt.vcf --snp_consensus 2

2024-05-27 16:22:52 Starting combine variants tool
2024-05-27 16:22:52 Combining SNP VCF files...
Traceback (most recent call last):
  File "/home/cris/soft/miniconda3/envs/variantdetective/bin/variantdetective", line 33, in <module>
    sys.exit(load_entry_point('variantdetective', 'console_scripts', 'variantdetective')())
  File "/home/cris/soft/VariantDetective/variantdetective/main.py", line 39, in main
    validate_inputs(args, output=output)    
  File "/home/cris/soft/VariantDetective/variantdetective/validate_inputs.py", line 81, in validate_inputs
    combine_variants(args, vcf_lists, output=sys.stderr)
  File "/home/cris/soft/VariantDetective/variantdetective/combine_variants.py", line 59, in combine_variants
    generate_tab_csv_snp_summary(read_vcf(snp_indel_outdir + '/snp_final.vcf'), snp_indel_outdir)
  File "/home/cris/soft/VariantDetective/variantdetective/tools.py", line 81, in read_vcf
    lines = [l for l in f if not l.startswith('##')]
  File "/home/cris/soft/VariantDetective/variantdetective/tools.py", line 81, in <listcomp>
    lines = [l for l in f if not l.startswith('##')]
  File "/home/cris/soft/miniconda3/envs/variantdetective/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 5224: invalid continuation byte

The SV analysis completes successfully, so the problem seems specific to the SNP/indel step, and I am unsure how to resolve it.
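For anyone trying to narrow this down: the traceback shows `read_vcf` in `tools.py` failing while iterating over `snp_final.vcf` as UTF-8 text, so the file (or one of the three input VCFs it was built from) presumably contains a byte sequence that is not valid UTF-8. Below is a minimal diagnostic sketch, not part of VariantDetective. The helper names `find_non_utf8_lines` and `read_vcf_lines_tolerant` are hypothetical, and the second one only mimics the `##` line filter visible in the traceback, on the assumption that replacing undecodable bytes is acceptable for inspection purposes.

```python
def find_non_utf8_lines(path):
    """Return (line_number, error_message) for every line that is not valid UTF-8.

    Hypothetical helper for diagnosis only; it does not modify the file.
    """
    bad_lines = []
    # Binary mode: no decoding happens while reading, so iteration cannot fail.
    with open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                bad_lines.append((line_number, str(exc)))
    return bad_lines


def read_vcf_lines_tolerant(path):
    """Return all lines that do not start with '##', as the traceback's filter does.

    Assumption: substituting U+FFFD for undecodable bytes is good enough to
    inspect the file. This is a stand-in, not the project's read_vcf.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return [line for line in handle if not line.startswith("##")]
```

Running `find_non_utf8_lines` on `snp_final.vcf` and on each of the three `*.filt.vcf` inputs should show which file and which line introduce the bad byte. If the offending line turns out to be a header line (for example, a recorded command line or path containing an accented character), that would at least explain why the variant records themselves look fine. For reference, byte 0xe0 is "à" in Latin-1, so a Latin-1 encoded string in one of the files is one possible cause, though not the only one.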