Closed Ivorforce closed 1 year ago
Is decode("unicode_escape") somehow different from decode("iso8859-1")? It looks to me like it does the same thing.
I think if you find a random EDF file, it's more likely to be encoded in ISO-8859-1 (or Windows-1252) than UTF-8, but it'd be nice for the wfdb package to be "locale-neutral".
Could we add 'encoding' and maybe 'errors' arguments to read_edf, to make this explicit? Maybe with ISO-8859-1 or Windows-1252 as default.
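To make the suggestion concrete, here is a rough sketch of what explicit 'encoding' and 'errors' arguments could look like. Note that decode_header_field is a hypothetical helper written for illustration, not the actual wfdb code, and the ISO-8859-1 default is just the one floated above. It assumes the usual EDF layout of fixed-width, space-padded header fields.

```python
def decode_header_field(raw: bytes, encoding: str = "iso8859-1", errors: str = "strict") -> str:
    """Decode one fixed-width, space-padded header field (hypothetical helper)."""
    # bytes.decode accepts both the codec name and the error-handling policy.
    return raw.decode(encoding, errors).strip()


# An 8-byte physical-dimension field holding 'µV' in a single-byte encoding.
field = b"\xb5V      "

print(repr(decode_header_field(field)))
print(repr(decode_header_field(field, encoding="ascii", errors="replace")))
```

With errors="replace", a strict reader of the spec could still ask for 'ascii' and get a placeholder character instead of an exception.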
Yep, seems like iso8859-1 works too. I admit I blindly copied that encoding name from some Stack Overflow issue, eager to move forward, but having read up on it being Python-specific, it really doesn't make sense to use it.
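For reference, a small check of where the two codecs agree and where they differ, using only the standard library. The byte strings are made up for illustration. For bytes without a backslash both codecs give the same result, but 'unicode_escape' additionally interprets backslash escape sequences, which is another reason to avoid it for header strings.

```python
# No backslash present: each byte maps to the code point of the same value.
plain = b"\xb5V"
same_a = plain.decode("unicode_escape")
same_b = plain.decode("iso8859-1")

# Backslash present: the four bytes m, V, backslash, n.
escaped = b"mV\\n"
as_latin1 = escaped.decode("iso8859-1")       # keeps the backslash and the 'n'
as_escape = escaped.decode("unicode_escape")  # turns backslash + 'n' into a newline

print(repr(same_a), repr(same_b))
print(repr(as_latin1), repr(as_escape))
```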
I like your suggestion to make encoding a parameter. I'm updating the PR accordingly.
It looks good to me. Thanks! I don't know why the test is failing :/
Oh wait, it looks like you need to add the 'encoding' parameter to rdedfann() as well.
Right, makes sense. Fixed the issue.
Looks great!
The EDF specification states that all strings are 'ascii'.
By default, decode() uses the 'utf-8' encoding. In some cases, this can cause decoding issues.
Arguably, the 'ascii' encoding should be used. However, in my case, this didn't completely solve the problem.
Using the admittedly peculiar 'unicode_escape' encoding solved the problem for me. Using the verbose printing mode, the culprit can be seen (µ):
Physical Dimensions: ['', 'mV', 'mV', 'mV', 'mV', 'mV', 'mV.s', 'µV', 'mV', 'mV.s', 'µV', '', 's', '', '', 'V']
The file in question was produced by LabVIEW. Unfortunately, I cannot share it. If need be, it might be possible to reproduce the error another way.
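One way the error might be reproduced without the file is sketched below. It assumes the µ was written as the single byte 0xB5, as ISO-8859-1 and Windows-1252 both encode it; that is a guess about the LabVIEW output, not something confirmed from the file itself.

```python
# 'µV' with µ stored as the single byte 0xB5 (assumption about the file).
raw = b"\xb5V"

results = {}
for encoding in ("utf-8", "ascii", "iso8859-1", "cp1252"):
    try:
        results[encoding] = raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Record the failure instead of aborting, so all codecs can be compared.
        results[encoding] = f"error: {exc.reason}"
    print(f"{encoding}: {results[encoding]}")
```

Under that assumption, 'utf-8' and 'ascii' both fail on the byte, while the two single-byte encodings decode it to 'µV'.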