bparzella / secsgem

Simple Python SECS/GEM implementation
GNU Lesser General Public License v2.1

Unicode Support #31

Open Frants1987 opened 3 years ago

Frants1987 commented 3 years ago

Hi, first of all, thanks for your library. I use it quite often to develop test applications. My customer is from China and wants to send messages in Chinese to the equipment. I looked at your code, and it seems you don't support the Unicode (Vx) data types (http://www.hume.com/html85/mann/TSN.html). So maybe this is something to support in the future. Regards

Frants1987 commented 3 years ago

An alternative could be to encode the Unicode text as UTF-8 and send it using the A data type. I'll look into that and create a pull request if I find a solution.

Frants1987 commented 3 years ago

OK, why not use UTF-8 encoding right away? I know that all Latin-1 characters above 127 are represented differently in UTF-8, but it would be nice to support all Unicode characters. My suggestion: set the encoding in the SecsVarString class to UTF-8.
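To make the Latin-1 vs. UTF-8 point concrete, here is a small standalone sketch that uses only Python's built-in codecs (no secsgem code). It shows that a Latin-1 byte above 127 becomes a different multi-byte sequence in UTF-8, that the two codings misread each other, and that Chinese text cannot be encoded as Latin-1 at all.

```python
# Standalone illustration with Python's built-in codecs; no secsgem code involved.

text_latin = "é"    # U+00E9, representable in Latin-1
text_cjk = "中文"    # Chinese characters, not representable in Latin-1

# Latin-1 uses one byte per character; UTF-8 uses two or more bytes above 127.
latin1_bytes = text_latin.encode("latin-1")   # b'\xe9'
utf8_bytes = text_latin.encode("utf-8")       # b'\xc3\xa9'

# A peer that still decodes as Latin-1 misreads UTF-8 data ...
misread = utf8_bytes.decode("latin-1")        # 'Ã©'

# ... and a UTF-8 decoder rejects a lone Latin-1 byte above 127.
try:
    latin1_bytes.decode("utf-8")
    legacy_ok = True
except UnicodeDecodeError:
    legacy_ok = False

# Chinese text needs a multi-byte coding such as UTF-8; Latin-1 fails.
try:
    text_cjk.encode("latin-1")
    cjk_latin1_ok = True
except UnicodeEncodeError:
    cjk_latin1_ok = False

cjk_utf8 = text_cjk.encode("utf-8")           # 6 bytes for 2 characters

print(latin1_bytes, utf8_bytes, misread, legacy_ok, cjk_latin1_ok, len(cjk_utf8))
```

This is exactly the backwards-compatibility problem: any existing peer that sends Latin-1 text above 127 would break if the default silently switched to UTF-8.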

bparzella commented 3 years ago

Hey, thanks for your input, it is much appreciated.

I looked into the SECS specification and noticed that the V type (2-byte characters, format code 22 octal) is only used in the TEXT data item, which in turn is only used in the S10 stream for terminal services. That doesn't seem very flexible to me.
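For anyone comparing the A and V types on the wire, here is a minimal sketch of how a SECS-II item header is built, assuming the usual layout: the format byte holds the format code (listed in octal in SEMI E5) in its upper 6 bits and the number of length bytes (1 to 3) in its lower 2 bits, followed by the payload length in bytes, big-endian. `item_header`, `FORMAT_ASCII` and `FORMAT_TWO_BYTE` are hypothetical names, not secsgem API. The sketch only covers the header; the exact payload layout of the 2-byte character item should be checked against SEMI E5.

```python
# Hypothetical helper, not part of secsgem: builds a SECS-II item header.
# Format byte = (format code << 2) | number of length bytes (1 to 3),
# followed by the payload length in bytes, big-endian.

FORMAT_ASCII = 0o20       # "A" item
FORMAT_TWO_BYTE = 0o22    # 2-byte character item ("V" in this thread)


def item_header(format_code: int, payload_length: int) -> bytes:
    """Return the format byte plus length bytes for one SECS-II item."""
    if not 0 <= format_code <= 0o77:
        raise ValueError("format code must fit in 6 bits")
    if not 0 <= payload_length <= 0xFFFFFF:
        raise ValueError("payload length must fit in 3 bytes")
    if payload_length <= 0xFF:
        n_length_bytes = 1
    elif payload_length <= 0xFFFF:
        n_length_bytes = 2
    else:
        n_length_bytes = 3
    format_byte = (format_code << 2) | n_length_bytes
    return bytes([format_byte]) + payload_length.to_bytes(n_length_bytes, "big")


# The length counts bytes, not characters: 2 characters in UTF-16 are 4 bytes.
print(item_header(FORMAT_ASCII, 5).hex())
print(item_header(FORMAT_TWO_BYTE, len("中文".encode("utf-16-be"))).hex())
```

Note that the same byte-count rule applies if UTF-8 is pushed through an A item: "中文" is 2 characters but 6 bytes of payload.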

Switching to UTF-8 could break existing implementations that already use values above 127, so it is not a backwards-compatible change. What you should be able to do is change the coding of SecsVarString for your implementation, since it is a class variable (see SecsVarString). Setting SecsVarString.coding = "<insert coding here>" should globally change how A-type data is decoded to a Python string. But this is a very rough hack, and I can't guarantee that it works as expected, or at all.
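Why a class-variable change acts globally can be shown with a tiny stand-in. `StringItem` below is a hypothetical class, NOT secsgem's SecsVarString; it only mimics the idea of a class-level `coding` attribute that every instance reads at call time. In secsgem itself the equivalent step would be the `SecsVarString.coding = ...` assignment mentioned above.

```python
# Hypothetical stand-in class, NOT secsgem's SecsVarString: it only mimics
# the idea of a class-level "coding" attribute shared by all instances.


class StringItem:
    coding = "latin-1"  # class variable: one setting for every instance

    def __init__(self, value: str) -> None:
        self.value = value

    def encode(self) -> bytes:
        # Looks up "coding" on the class at call time (no instance override).
        return self.value.encode(self.coding)

    @classmethod
    def decode(cls, raw: bytes) -> "StringItem":
        return cls(raw.decode(cls.coding))


item = StringItem("中文")
try:
    item.encode()
except UnicodeEncodeError:
    print("latin-1 cannot encode this text")

# Changing the class variable affects all instances, existing and new.
StringItem.coding = "utf-8"
raw = item.encode()
print(raw, StringItem.decode(raw).value)
```

Because the setting is process-wide, every A item switches coding at once, including items exchanged with other equipment that may still expect Latin-1. That is a large part of why it is only a rough hack.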

Another idea would be to encode the text inside the transferred string, e.g. using base64. This way the data is represented in printable ASCII characters and should be easy to transfer over SECS. All the encoding and decoding happens before and after the SECS layer.
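A minimal sketch of the base64 approach, using Python's standard `base64` module: the text is first encoded as UTF-8 (an assumption; any coding both sides agree on works), then base64-encoded to a pure ASCII string that can go into an ordinary A item. `to_ascii_payload` and `from_ascii_payload` are hypothetical helper names.

```python
import base64


def to_ascii_payload(text: str) -> str:
    """Encode arbitrary Unicode text as a base64 ASCII string for an A item."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_ascii_payload(payload: str) -> str:
    """Reverse to_ascii_payload: base64 ASCII string back to Unicode text."""
    return base64.b64decode(payload.encode("ascii"), validate=True).decode("utf-8")


message = "设备报警：温度过高"   # "Equipment alarm: temperature too high"
payload = to_ascii_payload(message)
print(payload, payload.isascii())
print(from_ascii_payload(payload))
```

The trade-off is size and readability: base64 output is about 4/3 the size of the UTF-8 bytes, and operators looking at a raw SECS log only see the encoded string.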

Keep in mind that the last two methods require the equipment to support the same coding as the Python code.

Hope I could help you,

Regards Benjamin

Frants1987 commented 3 years ago

Thanks for your reply. Setting SecsVarString.coding = "<insert coding here>" is what I did for now. However, it seems like a hack to me as well. How about handling UnicodeEncodeError on each encode call? You could have a universal fallback encoding such as UTF-8. Also, in my case the MES specification allows UTF-8 encoding on both the equipment and the host side, so I will go that way.
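The fallback idea could look roughly like this on the sending side. `encode_with_fallback` is a hypothetical helper, not secsgem code; it keeps Latin-1 for text that fits and switches to UTF-8 only on UnicodeEncodeError. The last lines show the catch on the receiving side.

```python
# Hypothetical helper, not part of secsgem: try a primary coding first and
# fall back to UTF-8 only when the text cannot be represented in it.


def encode_with_fallback(text: str, primary: str = "latin-1",
                         fallback: str = "utf-8") -> bytes:
    try:
        return text.encode(primary)
    except UnicodeEncodeError:
        return text.encode(fallback)


print(encode_with_fallback("café"))       # fits Latin-1, stays one byte per char
print(encode_with_fallback("温度正常"))    # not Latin-1, falls back to UTF-8

# Caveat: Latin-1 can decode ANY byte sequence, so a receiver cannot detect
# the fallback by catching UnicodeDecodeError; UTF-8 bytes decoded as
# Latin-1 silently turn into mojibake.
print(encode_with_fallback("温度正常").decode("latin-1"))
```

So the fallback only works if the receiver is told, or can infer, which coding was used, which comes back to both sides having to agree on the coding.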

bparzella commented 3 years ago

At the moment I have neither a current SECS specification nor any equipment or host software to test with. Because of that, I am hesitant to make changes at such a low level.