ShawHahnLab / umbra

Python package and executable for Linux for managing Illumina sequencing runs
GNU Affero General Public License v3.0

Handle non-unicode CSV files #105

Closed: ressy closed this 4 years ago

ressy commented 4 years ago

Updates the CSV parsing helper and ProjectData.from_alignment to handle non-unicode CSV files. Fixes #102.

Ideally every CSV file we run into would be valid Unicode text (ASCII, being a subset, is fine too), but since I'm seeing encodings like ISO/IEC 8859 in the wild, we need to decide how to handle them. This extends illumina.util.load_csv with multiple options for handling non-unicode characters, and updates project.ProjectData.from_alignment to complain when it finds them but then proceed by removing the offending characters.
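A minimal sketch of the idea, assuming the files are expected to be UTF-8 and parsed with Python's standard `csv` module. The `encoding_errors` parameter and the `load_csv_lenient` wrapper are hypothetical names for illustration, not the actual signature of illumina.util.load_csv. The "multiple options" here are simply Python's built-in codec error handlers, and the wrapper mirrors the described from_alignment behavior: complain, then proceed with the bad bytes dropped.

```python
"""Hypothetical sketch of tolerant CSV loading (not umbra's actual API)."""
import csv
import logging

LOGGER = logging.getLogger(__name__)


def load_csv(path, encoding_errors="strict"):
    """Read a CSV file, assumed to be UTF-8, into a list of rows.

    encoding_errors (hypothetical parameter) is passed straight to Python's
    decoder: "strict" raises UnicodeDecodeError on invalid bytes, "ignore"
    silently drops them, and "replace" substitutes U+FFFD for them.
    """
    with open(path, encoding="utf-8", errors=encoding_errors, newline="") as f_in:
        return list(csv.reader(f_in))


def load_csv_lenient(path):
    """Try strict decoding first; on failure, warn and drop the bad bytes."""
    try:
        return load_csv(path)
    except UnicodeDecodeError as err:
        LOGGER.warning("non-unicode characters in %s (%s); removing them", path, err)
        return load_csv(path, encoding_errors="ignore")
```

For a file written in ISO/IEC 8859-1 containing `café` (byte `0xE9`), strict decoding raises `UnicodeDecodeError`, while the lenient wrapper logs a warning and returns the row as `caf`. Using "replace" instead would keep a visible U+FFFD marker where the character was, which can be easier to spot downstream than a silent deletion.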