Chinese garbled code

qaqRose commented 7 months ago

I tried to read the description from a deviceObject, but the Chinese text in the output is garbled. How should I handle its encoding?

qaqRose commented 7 months ago

This does not seem to be a programming problem; the same thing happens in another tool, YABE.

splatch commented 7 months ago

Hello @qaqRose, it doesn't have to be like that. I believe we can introduce/configure a text encoding in the library to avoid this issue.

splatch commented 7 months ago

It boils down to this code, I suppose: MangoAutomation/BACnet4J#55. Note that it can also be addressed above bacnet4j by post-processing the CharacterString. Can you please provide the byte representation of the string you tested with and the Chinese characters it should display? A Wireshark capture should contain the string and an extra tag with the text encoding.
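The post-processing idea could look roughly like the sketch below: decode the raw string bytes strictly as UTF-8 and, only if they are malformed, decode them again with a fallback charset. The class name `FallbackDecoder` and the choice of GBK as the fallback are assumptions for illustration; how the raw bytes are obtained from bacnet4j is left out on purpose.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of bacnet4j or bacnet4j-wrapper.
final class FallbackDecoder {
    private FallbackDecoder() {}

    /**
     * Decodes the payload strictly as UTF-8. If the bytes are not valid
     * UTF-8, decodes them with the given fallback charset instead.
     */
    static String decode(byte[] payload, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(payload))
                .toString();
        } catch (CharacterCodingException e) {
            // Declared as UTF-8 but not valid UTF-8: try the fallback.
            return new String(payload, fallback);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumed fallback charset
        byte[] legacy = {(byte) 0xBA, (byte) 0xC5};            // one character in GBK
        byte[] utf8 = {(byte) 0xE5, (byte) 0x8F, (byte) 0xB7}; // same character in UTF-8
        // Both byte sequences should decode to the same string.
        System.out.println(decode(legacy, gbk).equals(decode(utf8, gbk)));
    }
}
```

A strict decoder is used instead of `new String(bytes, UTF_8)` because the latter silently substitutes replacement characters, which would hide the mismatch instead of triggering the fallback.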

qaqRose commented 7 months ago

Hello @splatch, thank you for your reply. I have provided some Chinese test strings and Wireshark capture files, which mainly cover the BACnet client connection, device discovery, and the retrieval of BacNetObject information.

Tip: all the fields with garbled Chinese text are description fields.

example:

  analog-output,15   3号主机冷却阀控制模式
  analog-output,16   3号主机冷冻阀控制模式
  analog-output,17   3号主机控制模式

bacnet.zip

qaqRose commented 7 months ago

Hello @splatch, it seems to be caused by MBCS not being supported. When I switched to UTF-8, I didn't have this problem.

splatch commented 7 months ago

I ran a basic test with the byte representation of your text and it doesn't match. The device-side encoding is probably wrong or not in line with UTF-8, at least for the description of analog output 16 (packet 555 in your capture/screenshot).

Wireshark input: [75,16,0,33,ba,c5,d6,f7,bb,fa,c0,e4,b6,b3,b7,a7,bf,d8,d6,c6,c4,a3,ca,bd]

Test code:

CharacterString str = new CharacterString(Encodings.ANSI_X3_4, "3号主机冷冻阀控制模式");
ByteQueue bq = new ByteQueue();
bq.push(new byte[] {0x75, 0x16}); // header
str.writeImpl(bq);
System.out.println(new CharacterString(bq)); // prints 3号主机冷冻阀�

The output is not 1:1 with your input, but what worries me, and leads me to assume that the device side might be wrong, is the byte representation of the string, at least its beginning: [75,16,0,33,e5,8f,b7,e4,b8,bb,e6,9c,ba,e5,86,b7,e5,86,bb,e9,98,80,e6,8e,a7,e5,88,b6,e6,a8,a1,e5,bc,8f]. The header 0x75, 0x16 is the same, followed by 0x00, which indicates the encoding (UTF-8), and 0x33, which is "3" in UTF-8. What comes after that is off track compared to your capture.

Other devices/objects in your dump look fine. It doesn't seem to be a library issue, especially since the text encoding declared by the device is UTF-8.
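One way to check this kind of mismatch is to decode the captured payload with more than one charset and compare. The sketch below takes the bytes from the Wireshark input quoted above (without the 0x75 0x16 header and the 0x00 encoding octet) and decodes them as UTF-8 and as GBK. That the device actually uses GBK is an assumption here, not something confirmed in the capture; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; names are illustrative only.
final class CaptureDecodeSketch {
    private CaptureDecodeSketch() {}

    // Payload from the capture, after the header and the charset octet.
    static final int[] PAYLOAD = {
        0x33, 0xba, 0xc5, 0xd6, 0xf7, 0xbb, 0xfa, 0xc0, 0xe4, 0xb6, 0xb3,
        0xb7, 0xa7, 0xbf, 0xd8, 0xd6, 0xc6, 0xc4, 0xa3, 0xca, 0xbd
    };

    /** Decodes the captured payload with the given charset. */
    static String decode(Charset charset) {
        byte[] bytes = new byte[PAYLOAD.length];
        for (int i = 0; i < PAYLOAD.length; i++) {
            bytes[i] = (byte) PAYLOAD[i];
        }
        return new String(bytes, charset);
    }

    /** Renders a string as a list of code points, e.g. "U+0033 U+53F7". */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Replacement characters (U+FFFD) indicate bytes that are not valid UTF-8.
        System.out.println("UTF-8: " + codePoints(decode(StandardCharsets.UTF_8)));
        // Assumed charset: GBK.
        System.out.println("GBK:   " + codePoints(decode(Charset.forName("GBK"))));
    }
}
```

Code points are printed instead of the characters themselves so the result does not depend on the console encoding.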

qaqRose commented 7 months ago

Yes, the encoding on the device side is not UTF-8. I don't know why Wireshark displays it as UTF-8. This is the result after I set the device side to use UTF-8.

CharacterString str = new CharacterString(CharacterString.Encodings.ANSI_X3_4, "3号主机冷冻阀控制模式");
ByteQueue bq = new ByteQueue();
bq.push(new byte[] {0x75, 0x20}); // header
str.writeImpl(bq);
System.out.println(new CharacterString(bq)); // prints 3号主机冷冻阀控制模式

When I change the length value in the header from 0x16 to 0x20, the text prints correctly.
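The two length values can be reproduced with a small calculation, assuming the length octet counts one charset octet plus the encoded text bytes. The sketch below uses only the JDK; the class name is hypothetical, GBK is an assumed device encoding, and the test string is the same "3号主机冷冻阀控制模式", written with Unicode escapes so the source stays ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of bacnet4j.
final class LengthOctetSketch {
    private LengthOctetSketch() {}

    // The digit 3 followed by the ten Chinese characters from the example.
    static final String TEXT =
        "3\u53F7\u4E3B\u673A\u51B7\u51BB\u9600\u63A7\u5236\u6A21\u5F0F";

    /** Assumed rule: one charset octet plus the encoded text bytes. */
    static int lengthOctet(String text, Charset charset) {
        return 1 + text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // UTF-8 uses 3 bytes per Chinese character here: 1 + 1 + 30 = 32.
        System.out.printf("UTF-8: 0x%02x%n", lengthOctet(TEXT, StandardCharsets.UTF_8)); // 0x20
        // GBK uses 2 bytes per Chinese character: 1 + 1 + 20 = 22.
        System.out.printf("GBK:   0x%02x%n", lengthOctet(TEXT, Charset.forName("GBK"))); // 0x16
    }
}
```

Under that assumption, 0x16 matches a two-byte-per-character encoding and 0x20 matches UTF-8, which would explain why the UTF-8 test output was cut off when the header still said 0x16.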

qaqRose commented 7 months ago

utf-8

utf8_bacnet.zip

Code-House / bacnet4j-wrapper

Chinese garbled code #39