ShawHahnLab / umbra

Python package and executable for Linux for managing Illumina sequencing runs
GNU Affero General Public License v3.0
Non-unicode characters crash load_csv #102

Closed: ressy closed 4 years ago

ressy commented 4 years ago

illumina.util.load_csv assumes UTF-8, so if the file happens to contain a byte that isn't valid UTF-8, say an ISO/IEC 8859-1 0xCA (Ê), it crashes with:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xca in position 5534: invalid continuation byte
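
For reference, the failure can be reproduced without umbra at all. This is a minimal sketch using made-up CSV bytes (the column names and values are hypothetical, not from a real sample sheet); it only shows that a lone 0xCA followed by an ASCII byte is rejected by the strict UTF-8 decoder, while the same bytes decode cleanly as ISO/IEC 8859-1.

```python
# Hypothetical CSV content: valid ASCII except for one ISO/IEC 8859-1 byte (0xCA).
data = b"Sample_ID,Description\n" + b"S1,caf\xca\n"

caught = None
try:
    data.decode("utf-8")  # strict decoding is the default
except UnicodeDecodeError as err:
    caught = err
    print(err)

# The same bytes are valid ISO/IEC 8859-1, where 0xCA maps to U+00CA.
latin1_text = data.decode("iso-8859-1")
```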

What's the "right" behavior here? Intentionally throw an exception for this? Allow these to be automatically stripped out with a warning?

ressy commented 4 years ago

Related: https://docs.python.org/3/library/codecs.html#error-handlers
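
One possible shape for the "warn instead of crash" option, using the standard error handlers from that page. This is only a sketch under assumptions: `load_csv_lenient` is a hypothetical stand-in and not umbra's actual `load_csv`, and it assumes returning a plain list of rows is acceptable. It tries strict UTF-8 first and, on failure, warns and decodes again with `errors="replace"`, which substitutes U+FFFD for each undecodable byte. Using `errors="ignore"` instead would silently drop those bytes.

```python
import csv
import io
import warnings


def load_csv_lenient(path, encoding="utf-8"):
    """Hypothetical helper (not umbra's load_csv): read a CSV file into a list of rows."""
    with open(path, "rb") as f_in:
        raw = f_in.read()
    try:
        # Strict decoding first, so clean files behave exactly as before.
        text = raw.decode(encoding)
    except UnicodeDecodeError as err:
        # Fall back to a lossy decode, but tell the user what happened.
        warnings.warn(f"{path}: {err}; replacing undecodable bytes with U+FFFD")
        text = raw.decode(encoding, errors="replace")
    return list(csv.reader(io.StringIO(text, newline="")))
```

Keeping the strict attempt first means the warning only fires for files that would otherwise have raised, so whether to warn or raise stays a single decision in one place.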