chaidiscovery / chai-lab

Chai-1, SOTA model for biomolecular structure prediction
https://www.chaidiscovery.com
Other
1.24k stars 156 forks

How to fix UnicodeDecodeError? #74

Closed VoyageHSSS closed 1 month ago

VoyageHSSS commented 1 month ago

Could I ask the community for help resolving this issue, which occurs when I run python examples/predict_structure.py? Thank you!

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte
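For readers unfamiliar with this error class, the following is a minimal sketch of how such a message can arise. It assumes a made-up byte string (`payload`) and a hypothetical helper (`try_decode`); it does not reproduce what chai-lab actually reads, only the general mechanism: byte 0x80 is a UTF-8 continuation byte and cannot start a character, so decoding stops there.

```python
# Hypothetical payload (not from chai-lab): 64 ASCII bytes, then a stray 0x80.
payload = b"A" * 64 + b"\x80"


def try_decode(data: bytes) -> str:
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start holds the offset of the first undecodable byte.
        return str(err)


message = try_decode(payload)
print(message)
```

An error of this kind is raised when bytes are read as text, so it can help to check which file is being opened at the point of failure, not only the FASTA string.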

arogozhnikov commented 1 month ago

It looks like you have a non-ASCII character somewhere in your input.

VoyageHSSS commented 1 month ago

Thank you for your reply. I am using the default example FASTA, trimmed to only the protein and the small molecule, but the run still fails. I also tried specifying UTF-8 encoding in fasta_path.write_text(example_fasta), but I still get the same error.

VoyageHSSS commented 1 month ago

Additionally, I have changed the directory where the esm2 files are located and updated the path in esm2.py. However, I have only downloaded the single file esm2_t36_3B_UR50D.pt. Do I need to download more files? I am from China, and the default downloading method does not work for me.

arogozhnikov commented 1 month ago

fasta_path.write_text(example_fasta)

The first thing to check is example_fasta.encode(encoding="ascii"). If it fails, then the problem is in the input.
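The check suggested above could be sketched roughly as follows. The helper names (`is_ascii_encodable`, `find_non_ascii`) and the `sample_fasta` string are hypothetical and only illustrate the idea; substitute the real `example_fasta` from the script.

```python
def is_ascii_encodable(text: str) -> bool:
    """Return True if the text contains only ASCII characters."""
    try:
        text.encode(encoding="ascii")
        return True
    except UnicodeEncodeError:
        return False


def find_non_ascii(text: str) -> list[tuple[int, str]]:
    """List (index, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


# Hypothetical input with one non-ASCII letter hidden in the sequence.
sample_fasta = ">protein|name=example\nMKV\u00c5LT\n"

if not is_ascii_encodable(sample_fasta):
    for index, char in find_non_ascii(sample_fasta):
        print(f"non-ASCII character {char!r} at index {index}")
```

If the real input passes this check, the offending bytes are probably coming from a different file than the FASTA.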

Do I need to download more files?

Yes, you need more files, see #29

VoyageHSSS commented 1 month ago

Thank you for your guidance; I will try a bit more.