HadrienG / InSilicoSeq

🚀 A sequencing simulator
https://insilicoseq.readthedocs.io
MIT License

Support compressed genomes #234

Status: Open. mbhall88 opened this issue 11 months ago.

mbhall88 commented 11 months ago

If I pass a gzip-compressed FASTA file (ref.fa.gz) to --genome, I get the following error:

$ iss generate -m miseq -g ref.fa.gz -n 50k -o iss_reads
INFO:iss.app:Starting iss generate
INFO:iss.app:Using kde ErrorModel
INFO:iss.util:Stitching input files together
Traceback (most recent call last):
  File "/home/mihall/sw/mambaforge/envs/classbench/bin/iss", line 10, in <module>
    sys.exit(main())
  File "/home/mihall/sw/mambaforge/envs/classbench/lib/python3.10/site-packages/iss/app.py", line 608, in main
    args.func(args)
  File "/home/mihall/sw/mambaforge/envs/classbench/lib/python3.10/site-packages/iss/app.py", line 128, in generate_reads
    genome_list = util.count_records(f)
  File "/home/mihall/sw/mambaforge/envs/classbench/lib/python3.10/site-packages/iss/util.py", line 82, in count_records
    for record in SeqIO.parse(fasta_file, "fasta"):
  File "/home/mihall/sw/mambaforge/envs/classbench/lib/python3.10/site-packages/Bio/SeqIO/Interfaces.py", line 72, in __next__
    return next(self.records)
  File "/home/mihall/sw/mambaforge/envs/classbench/lib/python3.10/site-packages/Bio/SeqIO/FastaIO.py", line 238, in iterate
    for title, sequence in SimpleFastaParser(handle):
  File "/home/mihall/sw/mambaforge/envs/classbench/lib/python3.10/site-packages/Bio/SeqIO/FastaIO.py", line 50, in SimpleFastaParser
    for line in handle:
  File "/home/mihall/sw/mambaforge/envs/classbench/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
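The last line of the traceback suggests the cause. Every gzip stream begins with the two magic bytes 0x1f 0x8b (RFC 1952). 0x1f is a valid ASCII byte, but 0x8b cannot start a UTF-8 sequence. Failing on "byte 0x8b in position 1" therefore means the compressed bytes are being decoded as text rather than decompressed first. The following minimal standard-library reproduction is an illustration only and does not use InSilicoSeq's code path:

```python
import gzip

# Compress a tiny FASTA record in memory, as `gzip ref.fa` would on disk.
data = gzip.compress(b">seq1\nACGT\n")

# gzip output always starts with the magic bytes 0x1f 0x8b.
print(data[:2].hex())  # 1f8b

# Decoding the compressed bytes as UTF-8 mimics opening ref.fa.gz in
# text mode and handing it to a FASTA parser.
message = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    message = str(err)

print(message)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```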
HadrienG commented 10 months ago

Hi!

Thank you for the suggestion. I will implement this in a future release.
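One possible way to support compressed genomes is to sniff the gzip magic bytes and open the file with gzip.open in text mode. The sketch below assumes this approach and is not InSilicoSeq's actual implementation. The helper names open_fasta and count_headers are hypothetical. count_headers stands in for the record counting that util.count_records does, so the example runs without Biopython. With Biopython installed, the handle returned by open_fasta could be passed directly to SeqIO.parse(handle, "fasta").

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def open_fasta(path):
    """Hypothetical helper: open a FASTA file as text, decompressing if gzipped.

    The format is detected from the magic bytes rather than the file
    extension, so a misnamed file is still handled correctly.
    """
    with open(path, "rb") as fh:
        is_gzipped = fh.read(2) == GZIP_MAGIC
    if is_gzipped:
        return gzip.open(path, "rt")  # text-mode handle over decompressed data
    return open(path, "rt")


def count_headers(path):
    """Hypothetical stand-in for record counting: count '>' header lines."""
    with open_fasta(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Usage: the same call works for plain and gzip-compressed input.
fasta = b">seq1\nACGT\n>seq2\nGGCC\n"
with tempfile.TemporaryDirectory() as tmpdir:
    plain_path = os.path.join(tmpdir, "ref.fa")
    gz_path = os.path.join(tmpdir, "ref.fa.gz")
    with open(plain_path, "wb") as fh:
        fh.write(fasta)
    with gzip.open(gz_path, "wb") as fh:
        fh.write(fasta)
    plain_count = count_headers(plain_path)
    gz_count = count_headers(gz_path)

print(plain_count, gz_count)  # 2 2
```

Checking the magic bytes rather than the .gz suffix costs one extra two-byte read per file. In exchange, it avoids surprises with compressed files that lack the extension, or plain files that carry it.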