davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

UnicodeDecodeError #706

Open khoojj opened 2 years ago

khoojj commented 2 years ago

Hi! After running OrthoFinder on two .faa files as a test run, I received the following errors (see below).

I'm not sure what's wrong. I checked a previous thread about a similar error message, but I don't think my input files are zipped. Any help is appreciated!
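One quick way to confirm that the inputs are not compressed is to look at their first two bytes: every gzip stream begins with the magic bytes 0x1f 0x8b. The sketch below assumes the proteomes sit in a single directory with a `.faa` extension; `is_gzipped` is a hypothetical helper, not part of OrthoFinder.

```python
import sys
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def is_gzipped(path: Path) -> bool:
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def main(directory: str) -> None:
    # Check every .faa file in the directory (the extension is an assumption).
    for fasta in sorted(Path(directory).glob("*.faa")):
        status = "gzip-compressed" if is_gzipped(fasta) else "not gzip"
        print(f"{fasta.name}: {status}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Run it as `python check_gzip.py /path/to/faa_dir`; any file reported as gzip-compressed would need to be decompressed before being passed to OrthoFinder.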

Thanks!

OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms

2022-05-30 18:04:38 : Starting OrthoFinder 2.5.4
64 thread(s) for highly parallel tasks (BLAST searches etc.)
8 thread(s) for OrthoFinder algorithm

Checking required programs are installed

Test can run "mcl -h" - ok
Test can run "fastme -i /pub59/jingk/Rasem/all_faa/test/OrthoFinder/Results_May30_2/WorkingDirectory/SimpleTest.phy -o /pub59/jingk/Rasem/all_faa/test/OrthoFinder/Results_May30_2/WorkingDirectory/SimpleTest.tre" - ok

Dividing up work for BLAST for parallel processing

2022-05-30 18:04:38 : Creating diamond database 1 of 2
2022-05-30 18:04:38 : Creating diamond database 2 of 2

Running diamond all-versus-all

Using 64 thread(s)
2022-05-30 18:04:38 : This may take some time....
2022-05-30 18:04:53 : Done all-versus-all sequence search

Running OrthoFinder algorithm

2022-05-30 18:04:53 : Initial processing of each species
ERROR: Blast0_0.txt is corrupted
ERROR: Error processing files Blast0_*
Malformatted line in /pub59/jingk/Rasem/all_faa/test/OrthoFinder/Results_May30_2/WorkingDirectory/Blast0_0.txt
Offending line was:

Process Process-66:
Traceback (most recent call last):
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/main.py", line 529, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/main.py", line 516, in ProcessBlastHits
    Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
    for row in blastreader:
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 3: invalid continuation byte
ERROR: Blast1_0.txt is corrupted
ERROR: Error processing files Blast1_*
Malformatted line in /pub59/jingk/Rasem/all_faa/test/OrthoFinder/Results_May30_2/WorkingDirectory/Blast1_0.txt
Offending line was:

Process Process-67:
Traceback (most recent call last):
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/main.py", line 529, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/main.py", line 516, in ProcessBlastHits
    Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
    for row in blastreader:
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 2: invalid continuation byte
ERROR: An error occurred, ***please review the error messages*** they may contain useful information about the problem.
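The traceback shows where the failure happens: `GetBLAST6Scores` iterates over `blastreader`, and Python's UTF-8 decoder in `codecs.py` rejects the bytes before any row can be parsed, so OrthoFinder never sees a tab-separated hit. The snippet below reproduces the same message with an invented byte string and a text-mode reader; it does not use OrthoFinder code, and `read_tab_rows` is a hypothetical helper.

```python
import csv
import io

# Invented example bytes: "abc", then 0xe1 (a 3-byte UTF-8 lead byte), then ASCII "A".
# 0x41 is not a valid continuation byte, so decoding fails at position 3.
BROKEN = b"abc\xe1A\tsubject\n"


def read_tab_rows(data: bytes) -> list:
    """Decode bytes as UTF-8 text and split tab-separated rows, like a text-mode csv reader."""
    text_stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
    return list(csv.reader(text_stream, delimiter="\t"))


print(read_tab_rows(b"query1\tsubject1\n"))  # clean ASCII input parses fine

try:
    read_tab_rows(BROKEN)
except UnicodeDecodeError as err:
    # Same wording as the OrthoFinder traceback, for the invented input above.
    print(err)
```

In other words, the error says nothing about orthology inference itself; it only means the Blast*_*.txt file contains bytes that are not valid UTF-8 text.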

khoojj commented 2 years ago

The top of the BLAST output file contains many strange characters. Please see the attached file: Blast1_1.txt.gz
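To see exactly which lines of a BLAST/DIAMOND output file are damaged, a small scanner can report every line that is not valid UTF-8 or does not have the expected number of tab-separated columns. This is a sketch under two assumptions: the file is in tabular format 6 with the default 12 columns, and it may or may not be gzip-compressed (detected via the 0x1f 0x8b magic bytes). `scan_blast6` and `open_binary` are hypothetical names.

```python
import gzip
import sys

EXPECTED_COLUMNS = 12  # assumption: default tabular (outfmt 6) layout


def open_binary(path: str):
    """Open a file for byte reading, transparently handling gzip compression."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    return gzip.open(path, "rb") if magic == b"\x1f\x8b" else open(path, "rb")


def scan_blast6(path: str, max_reports: int = 10) -> list:
    """Return (line_number, problem) pairs for lines that are not clean tabular hits."""
    problems = []
    with open_binary(path) as handle:
        for line_no, raw in enumerate(handle, start=1):
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError as err:
                problems.append((line_no, f"not UTF-8: {err.reason} at byte {err.start}"))
            else:
                n_fields = len(text.rstrip("\r\n").split("\t"))
                if n_fields != EXPECTED_COLUMNS:
                    problems.append((line_no, f"{n_fields} columns instead of {EXPECTED_COLUMNS}"))
            if len(problems) >= max_reports:
                break
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for line_no, problem in scan_blast6(sys.argv[1]):
        print(f"line {line_no}: {problem}")
```

If the very first lines fail while later lines parse, that points at a problem with how the file was written rather than at the input proteomes.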

GuoanQi1996 commented 2 years ago

I get a similar error. Mine are "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 1: invalid continuation byte" and "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 1: invalid start byte".
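The two wordings point at different byte classes. In UTF-8, bytes 0x80 to 0xBF may only appear as continuation bytes, so a stray 0x86 produces "invalid start byte"; 0xc2, 0xc6 and 0xe1 are lead bytes of multi-byte sequences, so the decoder complains that the byte after them is not a valid continuation ("invalid continuation byte"). The helper below, a hypothetical `utf8_role` function that is not part of OrthoFinder, classifies a byte using the ranges defined for well-formed UTF-8.

```python
def utf8_role(byte: int) -> str:
    """Classify a single byte value by the role it can play in well-formed UTF-8."""
    if byte <= 0x7F:
        return "ASCII"
    if byte <= 0xBF:
        return "continuation byte (cannot start a character)"
    if byte <= 0xC1:
        return "never valid in UTF-8"
    if byte <= 0xDF:
        return "lead byte of a 2-byte sequence"
    if byte <= 0xEF:
        return "lead byte of a 3-byte sequence"
    if byte <= 0xF4:
        return "lead byte of a 4-byte sequence"
    return "never valid in UTF-8"


# Bytes quoted in the error messages in this thread.
for value in (0xE1, 0xC2, 0xC6, 0x86):
    print(f"0x{value:02x}: {utf8_role(value)}")
```

Seeing a mix of lead and continuation bytes right at the start of a file suggests binary or otherwise non-text data rather than a single mis-encoded character.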

GuoanQi1996 commented 2 years ago

Switching from Linux/conda to Windows/Docker solved the problem. Since the software works fine for most users, I suspect something on our server is causing this problem.

mundoctor commented 2 years ago

Hi khoojj, I suspect the problem is caused by DIAMOND; try rolling your DIAMOND version back to 0.9.14.
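Before downgrading, it helps to confirm which DIAMOND binary OrthoFinder will actually pick up from `PATH`. The sketch below assumes that `diamond version` prints a line of the form `diamond version X.Y.Z`; `parse_diamond_version` and `installed_diamond_version` are hypothetical helpers, and 0.9.14 is simply the release suggested in this thread.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_diamond_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract a version tuple such as (0, 9, 14) from text like 'diamond version 0.9.14'."""
    match = re.search(r"version\s+(\d+(?:\.\d+)*)", output)
    return tuple(int(part) for part in match.group(1).split(".")) if match else None


def installed_diamond_version() -> Optional[Tuple[int, ...]]:
    """Run `diamond version` if DIAMOND is on PATH; return None if missing or unparsable."""
    if shutil.which("diamond") is None:
        return None
    result = subprocess.run(["diamond", "version"], capture_output=True, text=True)
    return parse_diamond_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = installed_diamond_version()
    if version is None:
        print("diamond not found on PATH, or its version string was not recognised")
    else:
        label = ".".join(map(str, version))
        relation = "matches" if version == (0, 9, 14) else "differs from"
        print(f"diamond {label} {relation} the 0.9.14 release suggested in this thread")
```

Running this inside the same conda environment as OrthoFinder matters, because a different `diamond` may be first on `PATH` outside it.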

Heater233 commented 1 year ago

I have the same problem. How was it resolved?

Bon-jour commented 10 months ago

DIAMOND 0.9.14 can solve this problem; give it a try.