Ecogenomics / CheckM

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes
https://ecogenomics.github.io/CheckM/
GNU General Public License v3.0

'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte #383

Closed tmossington closed 1 year ago

tmossington commented 1 year ago

I just finished installing CheckM with all of the required packages and the reference data. When I attempt to run either

> checkm test ~/checkm_test_results

or

> checkm lineage_wf <bin folder> <output folder>

I receive the above error: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte. Error: Failed to process sequence file: /home/mossingtonta/checkm/test_data/._637000110.fna

I tried both commands with the reference data and with my own .fna files, and I receive this error in both cases. I also tried removing and reinstalling CheckM and the associated packages, but I am still encountering this issue.
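For anyone trying to read the message itself: it is the standard Python `UnicodeDecodeError`, which records the offset of the byte that could not be decoded. The following is a minimal sketch of how to locate such a byte. The helper name `locate_decode_failure` and the sample bytes are hypothetical and are not taken from CheckM.

```python
def locate_decode_failure(data: bytes):
    """Return (offset, byte_value, reason) for the first UTF-8 decode failure, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded
        return exc.start, data[exc.start], exc.reason
    return None


# Hypothetical input: 37 ASCII bytes followed by the byte 0xb0
sample = b"A" * 37 + b"\xb0" + b"ACGT"
print(locate_decode_failure(sample))
```

Applying the same helper to the raw bytes of the file named in the error (read with `open(path, "rb")`) should show which byte triggers the failure.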

tmossington commented 1 year ago

For anybody else who encounters this problem: the FASTA files need to be plain ASCII. Mine contained special (non-ASCII) characters that the program is not equipped to handle. Switching the FASTA files to ASCII format solved this error.
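To check a bin folder before running CheckM, a small script can list the files that contain non-ASCII bytes. This is only a sketch under assumptions: the function names `first_non_ascii` and `scan_fasta_dir` are hypothetical, the `.fna` suffix is taken from the file in the error message, and the script only reports files without modifying them. One thing it may surface: the file in the error above begins with `._`, which is the naming pattern macOS uses for AppleDouble metadata files, so such files can end up in a folder alongside the real FASTA files. Whether that was the cause here is not confirmed in this thread.

```python
from pathlib import Path


def first_non_ascii(path):
    """Return (offset, byte_value) of the first byte above 0x7F in the file, or None."""
    data = Path(path).read_bytes()
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            return offset, byte
    return None


def scan_fasta_dir(directory, suffix=".fna"):
    """Map each file name containing a non-ASCII byte to its (offset, byte_value)."""
    problems = {}
    for path in sorted(Path(directory).iterdir()):
        # Hidden files such as "._name.fna" also match the suffix and are checked
        if path.is_file() and path.name.endswith(suffix):
            hit = first_non_ascii(path)
            if hit is not None:
                problems[path.name] = hit
    return problems
```

Calling `scan_fasta_dir("<bin folder>")` returns an empty dictionary when every matching file is plain ASCII.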