bacpop / ggCaller

Bifrost graph gene caller.
MIT License

UnicodeDecodeError #3

Closed samlipworth closed 1 year ago

samlipworth commented 1 year ago

I have now got ggCaller to run, which is great, but when I use the --annotation fast option it crashes with the following error:

translating hits...
Updating output...
Number of refound genes: 274
collapse gene families with refound genes...
Processing depth: 1
Iteration: 1
100%|██████████| 6174/6174 [00:00<00:00, 7167.82it/s]
Processing depth: 2
Iteration: 1
100%|██████████| 6174/6174 [00:00<00:00, 8711.56it/s]
Processing depth: 3
Iteration: 1
100%|██████████| 6174/6174 [00:00<00:00, 7851.30it/s]
writing Roary output...
writing GFF files...
Traceback (most recent call last):
  File "/well/bag/users/lipworth/miniconda3/envs/ggc_env/bin/ggcaller", line 33, in <module>
    sys.exit(load_entry_point('ggCaller==1.3.3', 'console_scripts', 'ggcaller')())
  File "/well/bag/users/lipworth/miniconda3/envs/ggc_env/lib/python3.9/site-packages/ggCaller-1.3.3-py3.9-linux-x86_64.egg/ggCaller/__main__.py", line 511, in main
    run_panaroo(pool, array_shd_tup, high_scoring_ORFs, high_scoring_ORF_edges,
  File "/well/bag/users/lipworth/miniconda3/envs/ggc_env/lib/python3.9/site-packages/ggCaller-1.3.3-py3.9-linux-x86_64.egg/panaroo_runner/__main__.py", line 241, in run_panaroo
    generate_GFF(shd_arr[0], high_scoring_ORFs, input_colours, isolate_names, contig_annotation, output_dir,
  File "/well/bag/users/lipworth/miniconda3/envs/ggc_env/lib/python3.9/site-packages/ggCaller-1.3.3-py3.9-linux-x86_64.egg/panaroo_runner/generate_output.py", line 323, in generate_GFF
    for record in SeqIO.parse(handle, "fasta"):
  File "/well/bag/users/lipworth/miniconda3/envs/ggc_env/lib/python3.9/site-packages/Bio/SeqIO/Interfaces.py", line 72, in __next__
    return next(self.records)
  File "/well/bag/users/lipworth/miniconda3/envs/ggc_env/lib/python3.9/site-packages/Bio/SeqIO/FastaIO.py", line 238, in iterate
    for title, sequence in SimpleFastaParser(handle):
  File "/well/bag/users/lipworth/miniconda3/envs/ggc_env/lib/python3.9/site-packages/Bio/SeqIO/FastaIO.py", line 50, in SimpleFastaParser
    for line in handle:
  File "/well/bag/users/lipworth/miniconda3/envs/ggc_env/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

samhorsfield96 commented 1 year ago

Hi, would you be able to send me the command you used to run ggCaller and your input.txt file if that’s possible, please?

samlipworth commented 1 year ago

sure - cat input.txt gives:

/well/bag/users/lipworth/gram_neg/fasta/fff281aa-dea1-4245-a28b-27523bc6e478.fasta.gz
/well/bag/users/lipworth/gram_neg/fasta/fff2b139-4529-40d2-b02a-2eae3f377b1d.fasta.gz
/well/bag/users/lipworth/gram_neg/fasta/fff5f6d8-f549-4502-8ac4-816632934a91.fasta.gz
/well/bag/users/lipworth/gram_neg/fasta/fff721ac-bd2e-476e-8e20-11587e2280ab.fasta.gz
/well/bag/users/lipworth/gram_neg/fasta/fff91cb2-e91f-4222-bf6a-88b4de1ed241.fasta.gz
/well/bag/users/lipworth/gram_neg/fasta/fffcf4b2-9610-4317-809f-c6b51b6eb6df.fasta.gz

command used was: ggcaller --refs input.txt --annotation fast --save --threads 20

samhorsfield96 commented 1 year ago

It looks like the assemblies are gzipped. If you can, decompress them and try again. I’ll make a note to add support for gzipped input files.
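
For context, the byte 0x8b at position 1 in the error matches the second byte of the gzip magic number (0x1f 0x8b), which fits the diagnosis that a compressed file was read as plain text. Below is a minimal sketch of how a reader could handle both cases transparently. It is not ggCaller's code: open_maybe_gzip and read_fasta are hypothetical names, and only the Python standard library is assumed.

```python
import gzip

# First two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(path):
    """Open a file in text mode, decompressing it if it starts with the gzip magic bytes.

    Hypothetical helper for illustration; not part of ggCaller or Biopython.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def read_fasta(path):
    """Yield (title, sequence) tuples from a plain or gzipped FASTA file."""
    title, chunks = None, []
    with open_maybe_gzip(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if title is not None:
                    yield title, "".join(chunks)
                title, chunks = line[1:], []
            elif title is not None and line:
                chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)
```

Checking the magic bytes rather than the .gz extension also covers compressed files that were renamed. With Biopython, the handle returned by open_maybe_gzip could presumably be passed to SeqIO.parse(handle, "fasta") in place of a plain file handle.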

samhorsfield96 commented 1 year ago

@samlipworth Did this fix the issue? I’ll close if all is good.

samlipworth commented 1 year ago

Sorry I didn't reply, got caught up with clinical work. I can confirm that unzipping everything fixed things here (agree that functionality to work with gzipped files would be very helpful and probably pretty simple to implement?). Thanks again for your work on this interesting package!
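
For anyone hitting the same error before gzipped input is supported, the workaround above can be scripted. This is a rough sketch, assuming input.txt lists one path per line; decompress_inputs and the output file names are hypothetical and not part of ggCaller.

```python
import gzip
import shutil
from pathlib import Path


def decompress_inputs(list_path, out_dir, new_list_path):
    """Decompress each .gz file named in list_path into out_dir.

    Writes a new list file (one path per line) pointing at the decompressed
    copies; entries that do not end in .gz are passed through unchanged.
    Hypothetical helper for illustration only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    new_paths = []
    for line in Path(list_path).read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        src = Path(entry)
        if src.suffix == ".gz":
            dst = out_dir / src.stem  # "x.fasta.gz" -> "x.fasta"
            with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
                shutil.copyfileobj(fin, fout)
            new_paths.append(dst)
        else:
            new_paths.append(src)
    Path(new_list_path).write_text("".join(f"{p}\n" for p in new_paths))
    return new_paths
```

After calling, for example, decompress_inputs("input.txt", "fasta_unzipped", "input_unzipped.txt"), the original command can be rerun with --refs input_unzipped.txt and the other options unchanged.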