Open InvncibiltyCloak opened 10 months ago
Thanks for your comments! I'm glad you found it easy to override the encoding.
I cannot find the encoding documented anywhere. The default was set to ISO-8859-1 a long time ago, probably because of an observation like yours, and the format may have evolved since then. The fact that your Windows machine appears to be recording in UTF-8 is a good reason to change the assumed default to UTF-8.
Thanks @InvncibiltyCloak for bringing this up. Changing the default encoding to UTF-8 seems reasonable. One consideration, though, would be to let users set the encoding explicitly, to maintain backwards compatibility with other encodings (e.g. ISO-8859-1) in older files and with older DEWE stacks.
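One way to keep older files readable after a default change is to try UTF-8 first and fall back to ISO-8859-1. The following is only a sketch of that idea: `decode_text` is a hypothetical helper, not part of the library, and the fallback order is an assumption.

```python
def decode_text(raw: bytes, encodings=("utf-8", "iso-8859-1")) -> str:
    """Return the first successful decode of raw, trying encodings in order.

    Hypothetical helper for illustration; not part of the reader library.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reached if every encoding in a custom list failed
    # (ISO-8859-1 itself accepts any byte sequence).
    return raw.decode(encodings[-1], errors="replace")


new_style = decode_text("°C".encode("utf-8"))   # valid UTF-8, decoded as such
old_style = decode_text(b"\xb0C")               # invalid UTF-8, falls back to ISO-8859-1
```

A caveat with this approach: a UTF-8 string with trailing junk bytes would also fail the first attempt and silently fall back to ISO-8859-1, so an explicit user-set encoding is still the more predictable option.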
I never had a good example to test the encoding with, so it is intentionally very easy for the user to specify:
import dwdatareader as dw
dw.encoding = 'utf-8'
Unfortunately, the Dewesoft library sometimes appends junk characters to the end of strings, which cause UTF-8 decoding errors in Python and fail the tests. If we change the default to UTF-8, then we need to either ask Dewesoft to fix their library or have Python ignore these decoding errors.
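For reference, this is roughly what "have Python ignore these decoding errors" could look like using the standard `errors` argument of `bytes.decode`. The junk bytes below are made up for illustration; the actual bytes the Dewesoft library appends are not known here.

```python
# Hypothetical sample: a valid UTF-8 unit string followed by two junk bytes.
raw = "Ω".encode("utf-8") + b"\xff\xfe"

# Strict decoding (the default) raises on the junk bytes.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="ignore" silently drops undecodable bytes.
ignored = raw.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD, which keeps the problem visible.
replaced = raw.decode("utf-8", errors="replace")
```

`errors="ignore"` gives the cleanest strings, but it would also hide genuinely mis-encoded text, so `errors="replace"` may be the safer choice while the root cause is unclear.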
Ah, I should have been more specific. I saw this global option, but wondered whether the 10 or so usages of it should all use the same encoding, e.g. opening the file in
vs decoding text values e.g. in
But that was only a guess on my part, without any evidence of different encodings actually occurring.
Unfortunately, the Dewesoft library sometimes appends junk characters to the end of strings, which cause UTF-8 decoding errors in Python and fail the tests. If we change the default to UTF-8, then we need to either ask Dewesoft to fix their library or have Python ignore these decoding errors.
That sounds annoying. I would guess that the junk characters are a result of the C lib interpreting parts of the memory as strings when it should not, i.e. string length mismatch at that level?
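If the guess above is right and the junk is leftover buffer content after a NUL terminator, trimming at the first NUL byte before decoding would avoid the errors. This is a sketch under that assumption only; `c_string_to_str` is a hypothetical helper and the sample bytes are invented.

```python
def c_string_to_str(buf: bytes, encoding: str = "utf-8") -> str:
    """Decode only the bytes before the first NUL, as a C string would end there.

    Hypothetical helper; assumes the junk follows a NUL terminator.
    """
    text, _, _ = buf.partition(b"\x00")
    return text.decode(encoding)


# Invented example: "mΩ", a NUL terminator, then stale buffer bytes.
unit = c_string_to_str("mΩ".encode("utf-8") + b"\x00\xff\xfe")
```

If the junk bytes turn out not to be preceded by a NUL, this does not help and a decoding error handler would still be needed.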
First off, thanks for the great Dewesoft reader library. I was recently using it for my datafiles which are DXD and are created on a Windows x64, en-US machine.
The units had some Unicode characters for the degree symbol and ohms. When I imported the file with this library, the units contained the classic Â symbol, which is the giveaway of UTF-8 binary data being decoded according to a Windows codepage (it looks like you have ISO-8859-1 chosen).
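The effect described above can be reproduced with plain Python, independent of the library. The unit strings here are illustrative stand-ins, not values taken from an actual DXD file.

```python
# UTF-8 encodes the degree sign as two bytes: 0xC2 0xB0.
raw_degree = "°C".encode("utf-8")

# Decoding those bytes as ISO-8859-1 turns 0xC2 into a stray "Â".
wrong_degree = raw_degree.decode("iso-8859-1")
right_degree = raw_degree.decode("utf-8")

# The Greek capital omega used for ohms is mangled in the same way.
wrong_ohm = "Ω".encode("utf-8").decode("iso-8859-1")

print(repr(wrong_degree), repr(right_degree), repr(wrong_ohm))
```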
A quick peek into the Python code showed that this is extremely easy to fix in this library: just call
dwdatareader.encoding = 'utf-8'
and it gives the correctly decoded strings. I just wanted to file an issue to point out that DewesoftX appears to be encoding strings in UTF-8, so perhaps this library should change its default encoding to match?
Unfortunately, I am only a sample size of one and have not tested other locales or versions of Dewesoft, so I am not sure whether this default encoding applies everywhere. Thanks for your time!