IDR / idr-utils

Utility scripts for managing IDR submissions
BSD 2-Clause "Simplified" License

Unicode support for study_parser.py #36

Open dominikl opened 3 years ago

dominikl commented 3 years ago

study_parser.py doesn't work with Unicode characters. We should look into that. Hopefully it won't be hard to fix, since the metadata plugin, for example, works fine with Unicode.

For example, if the study file contains an umlaut, you'll get: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 2515: invalid start byte
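A likely reading of this error, assuming the study file was saved as Latin-1 (or a similar single-byte encoding) rather than UTF-8: in Latin-1 the character "ü" is the single byte 0xfc, which is never a valid first byte of a UTF-8 sequence. The snippet below is a minimal sketch that reproduces the same kind of failure with a made-up string. It does not use the actual study file or any pyidr code.

```python
# Assumption for illustration: the text was written as Latin-1, where "ü" is byte 0xfc.
raw = "Zürich".encode("latin-1")

try:
    # Decoding Latin-1 bytes as UTF-8 fails on the 0xfc byte.
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    message = str(err)
    print(message)

# Decoding with the encoding the bytes were written in succeeds.
decoded = raw.decode("latin-1")
print(decoded)
```

If this is what is happening, the parser is not failing on Unicode as such, but on input that is not UTF-8 encoded.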

sbesson commented 3 years ago

@dominikl agreed, it would be good for the parser to just do the right thing for any incoming study file. My immediate thought would be to try forcing the encoding in

https://github.com/IDR/idr-utils/blob/334968d108b0d64e55f7b9c2e128e828d455bf9b/pyidr/study_parser.py#L105

Otherwise, we could look at an example file.
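As a rough sketch of what forcing the encoding could look like: the helper below reads the raw bytes and tries UTF-8 first, then falls back to Latin-1. The function name `read_study_text`, the fallback order, and the sample file content are all assumptions made for illustration. This is not the code at the linked line of study_parser.py, and a real fix might use a different strategy.

```python
import tempfile
from pathlib import Path


def read_study_text(path, encodings=("utf-8", "latin-1")):
    """Hypothetical helper (not part of pyidr): decode a study file,
    trying each candidate encoding in order."""
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reachable if the caller passes encodings that can all fail;
    # Latin-1 maps every byte to a character, so it always succeeds.
    raise ValueError(f"could not decode {path} with any of {encodings}")


# Usage with a made-up, Latin-1 encoded study file.
with tempfile.TemporaryDirectory() as tmp:
    study_file = Path(tmp) / "study.txt"
    study_file.write_bytes("Study Title\tZürich screen\n".encode("latin-1"))
    text = read_study_text(study_file)

print(text)
```

One caveat with this design: because Latin-1 accepts any byte sequence, a file in some other encoding (for example Windows-1252) would decode without an error but could produce wrong characters. A simpler alternative is passing `encoding="utf-8"` together with `errors="replace"` to the built-in `open()`, which never raises but substitutes undecodable bytes with a replacement character.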