Added variables for encoding handling

orkhands commented 2 years ago

I experienced an issue when polling the program status using get_program_stack(), which would throw a UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position xy: invalid continuation byte

Reason in my case is that the machine program running has a german "Umlaut" in its file name (e.g., TNC:\X\Y\Z\xänderung.h', which does not seem to be encoded correctly. Since the error is thrown within the pyLSV2-lib, I can't just ignore the error and continue, since the rest of the queried information from get_program_stack() is then gone, too.

To solve this issue, one can either specify the desired encoding in the decode() method, or one sets the error handling of the same method to either ignore undecodable bytes or replacing them. It seems both approaches can be useful for the user, so I suggest to provide some parameters for the user to set both options.

While the library decodes many values from the machine, it seems, that only in get_program_stack() values are transmitted which originate from the user instead of the machine itself. Thus, I would expect that we do not see these issues in any other part of the code. Yet, I might be wrong here and having these options would be a good idea at other places as well.

Changes made in client.py:

Introduced variable in __init__ allowing a user to set the strategy for handling encoding errors. Default value corresponds to default value of encoding in decode() method (encoding = "utf-8")
Introduced variable in __init__ allowing a user to set the desired encoding to be used. Default value corresponds to default value of errors in decode() method (errors="strict")
Added both options to the decode-methods used in get_program_stack()

drunsinn commented 2 years ago

Have you tried replacing .decode("utf8") with .decode("latin1") instead of your fix? During the refactoring for version 1.0 I found a similar problem. After some research I determined that the controls probably use Latin-1 (ISO 8859-1) encoding. If latin1 works in your case it would make the fix easier as it wouldn't need the switch you introduced.

orkhands commented 2 years ago

@drunsinn For sure changing the encoding would solve my specific problem, I was just thinking of general usability of the library for other users as well. Other users experiencing the same issue might benefit from the possibility to set their own encoding and/or error handling. As well, it would be charming if I could still use the pip package instead of maintaining my own fork ;) Of course, one could also solve this issue differently by adding conf file or something of this sort (again increasing complexity, I understand...)

drunsinn commented 2 years ago

@orkhands the problem with the encoding is with the incoming data from the control. Your application should not need to select the encoding at all. Strings in Python3 are always utf8 if not specified otherwise.

As far as I understand the protocol now there are no two-byte characters, that's why I assumed lsv2 would be ascii. Instead it uses latin1 which makes sense since Heidenhain is a german company. Some tests also support this assumption since I could not get any other encodings to work reliably.

For 1.0 the encoding/decoding of strings has already been switched to latin1, for the current release I didn't want to back port these changes. Since you experience this problem now and waiting for 1.0 would take to long changing the encoding to latin1 is the easiest way.

orkhands commented 2 years ago

That's great news, thank you for your work and the 1.0 prospect :) I will proceed as proposed.

drunsinn commented 2 years ago

If you are interested you can take a look at the branch for 1.0: https://github.com/drunsinn/pyLSV2/tree/release_v1.0

Some adjustments to your code will be necessary but I feel they make the library a lot more consistent.

drunsinn / pyLSV2

Added variables for encoding handling #39