hadware closed this issue 5 years ago.
Thanks for the feedback! Detecting a file's encoding is notoriously difficult, and no method reliably detects every encoding. If you have a way to improve the error message for a specific encoding, I'm happy to merge a PR. Every improvement helps.
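For a specific encoding, the problem is more tractable than the general case. Below is a minimal heuristic sketch. It assumes the file is a text TextGrid whose header (`File type = "ooTextFile"`) is plain ASCII. Under that assumption, a UTF-16 byte-order mark, or a NUL as the first or second byte, points to UTF-16, and strict UTF-8 decoding covers ASCII files. The function name `sniff_textgrid_encoding` is hypothetical and not part of pympi. It returns `None` when it cannot decide, because any byte string decodes as ISO-8859-1, so single-byte encodings cannot be told apart this way.

```python
import codecs
from typing import Optional


def sniff_textgrid_encoding(raw: bytes) -> Optional[str]:
    """Guess the codec of a text TextGrid file from its raw bytes (heuristic).

    Hypothetical helper: returns a codec name, or None when the bytes are not
    valid UTF-8 and carry no UTF-16 signature (e.g. ISO-8859-1 files).
    """
    # 1. Byte-order marks are the only unambiguous signal.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"  # the "utf-16" codec reads the BOM to pick byte order
    # 2. No BOM: assuming an ASCII header, UTF-16 pairs each letter with a NUL.
    if len(raw) >= 2:
        if raw[0] == 0 and raw[1] != 0:
            return "utf-16-be"
        if raw[0] != 0 and raw[1] == 0:
            return "utf-16-le"
    # 3. ASCII is a subset of UTF-8, so strict UTF-8 also accepts ASCII files.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte string is valid ISO-8859-1, so nothing more can be
        # inferred safely here; let the caller choose a fallback.
        return None
    return "utf-8"
```

A caller could then write `raw.decode(sniff_textgrid_encoding(raw) or "iso-8859-1")`, which makes ISO-8859-1 an explicit last resort rather than a silent guess.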
Hello,
I'm currently working on a pretty inconsistent dataset of TextGrid files (with @Rachine, whom you may have been in contact with). I ran into trouble with some files because they were encoded in `utf-16be` (and some even in `iso-8859-1`), while most files were encoded in `ascii`. I had no idea about this inconsistency when I started processing the dataset, and although it is obviously not really your fault, I had a hard time figuring out why some TextGrid files wouldn't open with pympi. The errors weren't even the same across encodings: when trying to open a `utf-16be` file I got an `AttributeError`, whereas the `iso-8859-1` files gave me a `UnicodeDecodeError`.

It would be nice to raise a proper error when parsing fails because of an encoding problem. I don't know if that's possible, but since I've dug pretty far into the TextGrid parsing function, I could open a PR with a fix if you have an idea of how to approach it.
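One way to get a "proper error" is to decode the raw bytes before parsing and raise a dedicated exception when that fails. The following is a plausible reading of the two different errors, inferred rather than checked against pympi's code. UTF-16BE text made of ASCII characters is still valid ASCII/UTF-8 once NUL bytes are allowed, so decoding succeeds and the failure surfaces later in the parser. By contrast, a byte such as 0xE9 (`é` in ISO-8859-1) is rejected by the decoder itself. The sketch below handles both cases. `TextGridEncodingError`, `decode_textgrid` and `read_textgrid_text` are hypothetical names, not pympi's API.

```python
class TextGridEncodingError(ValueError):
    """Hypothetical error for TextGrid files that cannot be decoded.

    Subclassing ValueError keeps existing `except ValueError` handlers working,
    since UnicodeDecodeError is also a ValueError.
    """


def _encoding_hint(raw: bytes) -> str:
    # Heuristic: a UTF-16 BOM or NUL bytes near the start suggest UTF-16,
    # assuming the TextGrid header is plain ASCII text.
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe") or b"\x00" in raw[:64]:
        return "it looks like UTF-16 (try 'utf-16', 'utf-16-be' or 'utf-16-le')"
    return "it may use a single-byte encoding such as 'iso-8859-1'"


def decode_textgrid(raw: bytes, encoding: str = "utf-8", source: str = "<bytes>") -> str:
    """Decode TextGrid bytes, raising TextGridEncodingError on encoding problems."""
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Case 1: the decoder itself rejects a byte (e.g. ISO-8859-1 read as ASCII).
        raise TextGridEncodingError(
            f"{source}: cannot decode byte {exc.start} as {encoding!r} "
            f"({exc.reason}); {_encoding_hint(raw)}"
        ) from exc
    # Case 2: UTF-16 read with an ASCII-compatible codec "succeeds" but leaves
    # NUL characters between the letters, which would confuse the parser.
    if "\x00" in text:
        raise TextGridEncodingError(
            f"{source}: decoded text contains NUL characters; {_encoding_hint(raw)}"
        )
    return text


def read_textgrid_text(path: str, encoding: str = "utf-8") -> str:
    """Read a TextGrid file as bytes and decode it with a clear error message."""
    with open(path, "rb") as f:
        return decode_textgrid(f.read(), encoding, source=path)
```

Raising before parsing means the user sees the file name and a likely encoding instead of an unrelated `AttributeError` from deep inside the parser.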