Non-unicode characters crash load_csv

ShawHahnLab / umbra

Python package and executable for Linux for managing Illumina sequencing runs

GNU Affero General Public License v3.0


Non-unicode characters crash load_csv #102

Closed ressy closed 4 years ago

ressy commented 4 years ago

illumina.util.load_csv assumes UTF-8, so if a file happens to contain a byte that isn't valid UTF-8, say an ISO/IEC 8859-1 0xCA (Ê), it crashes with:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xca in position 5534: invalid continuation byte

What's the "right" behavior here? Intentionally throw an exception for this? Allow these to be automatically stripped out with a warning?
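To make the two options concrete, here is a rough sketch of how they could sit behind one switch. This is not the actual umbra code: `load_csv_lenient` and its `strict` parameter are hypothetical names made up for illustration, and it only assumes the standard library's `bytes.decode` error handlers and the `csv` module.

```python
import csv
import io
import warnings


def load_csv_lenient(path, strict=False):
    """Hypothetical stand-in for load_csv: return the rows of a CSV file.

    strict=True  -> re-raise UnicodeDecodeError on any invalid UTF-8 byte.
    strict=False -> warn, drop the invalid bytes, and keep going.
    """
    with open(path, "rb") as f_in:
        raw = f_in.read()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        if strict:
            raise
        # Assumption: silently losing a stray byte is acceptable as long
        # as the user is told about it.
        warnings.warn(f"{path}: dropping bytes that are not valid UTF-8 ({err})")
        text = raw.decode("utf-8", errors="ignore")
    # newline="" leaves line endings untouched, as the csv docs recommend.
    return list(csv.reader(io.StringIO(text, newline="")))
```

With `strict=False`, a row stored as `1,\xca2` would come back as `["1", "2"]` plus a warning, while `strict=True` keeps the current crash-on-bad-input behavior. Swapping `errors="ignore"` for `errors="replace"` would leave a visible U+FFFD marker in the cell instead of removing the byte.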

ressy commented 4 years ago