Closed by hessu 7 years ago
Thank you for pointing this out. I'm hoping that the python3_support branch alleviates this problem.
Are you planning to fix this? It's actually quite a bad problem, breaking a lot of packets; igates with this problem should preferably be turned off until the bug is fixed. Python can deal with binary data just fine, as long as the code does not explicitly try to decode and encode it as unicode text.
Yes, I believe I've already fixed this in the python3_support branch.
Luckily the only igate that I'm aware of that uses this library is offline.
The aprs module tries to do UTF-8 decoding and UTF-8 encoding, and processes packets as Python unicode strings.
https://github.com/ampledata/aprs/blob/master/aprs/classes.py#L97 https://github.com/ampledata/aprs/blob/master/aprs/classes.py#L132 https://github.com/ampledata/aprs/blob/master/aprs/classes.py#L154 https://github.com/ampledata/aprs/blob/master/aprs/classes.py#L201
However, a lot of APRS packets contain byte sequences which cannot be decoded as UTF-8. Using UTF-8 in some text fields of APRS packets is a relatively recent invention; older software transmits various other international single-byte character sets (ISO-8859-15 and such), which fail outright when decoded as UTF-8. At least one app transmits UTF-16, and a lot of trackers (including popular Kenwood radio models) emit packets with NUL bytes and other binary oddities.
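To illustrate the point above, here is a minimal sketch showing that such payloads are not valid UTF-8. The callsign, path and comment text are made up for the example, and `is_valid_utf8` is a hypothetical helper, not part of the aprs module.

```python
# Hypothetical packets: the callsign, path and comment text are invented.
header = b"N0CALL>APRS,WIDE1-1:>"

latin9_packet = header + "Grüße".encode("iso-8859-15")  # legacy single-byte charset
utf16_packet = header + "Grüße".encode("utf-16")        # starts with a byte-order mark
utf8_packet = header + "Grüße".encode("utf-8")          # the "modern" case


def is_valid_utf8(data: bytes) -> bool:
    """Return True only if the whole byte string decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for name, packet in [("latin9", latin9_packet),
                     ("utf16", utf16_packet),
                     ("utf8", utf8_packet)]:
    print(name, is_valid_utf8(packet))
```

Only the last packet decodes cleanly; the other two raise `UnicodeDecodeError` under strict decoding.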
If UTF-8 decoding and subsequent encoding is done, and the packet is then retransmitted, some packets will be modified by the software (or, at the very least, dropped if the encoding or decoding fails with an exception). If this is done on an iGate, modified duplicate packets are generated. This is very unfortunate.
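The two failure modes described above can be sketched as follows. The packet is a made-up example, and `roundtrip` is a hypothetical stand-in for any decode-then-encode step, not the aprs module's actual code.

```python
# Made-up packet whose comment is "Grüße" in ISO-8859-15 (0xFC = ü, 0xDF = ß).
original = b"N0CALL>APRS:>Gr\xfc\xdfe"


def roundtrip(data: bytes, errors: str = "strict") -> bytes:
    """Decode as UTF-8 and encode again, as a text-based pipeline would."""
    return data.decode("utf-8", errors=errors).encode("utf-8")


# Failure mode 1: strict decoding raises, so the packet is dropped.
try:
    roundtrip(original)
    dropped = False
except UnicodeDecodeError:
    dropped = True

# Failure mode 2: lenient decoding "succeeds" but rewrites the bytes,
# replacing each undecodable byte with U+FFFD (EF BF BD in UTF-8).
lossy = roundtrip(original, errors="replace")

print(dropped)            # True
print(lossy == original)  # False: a modified duplicate would be sent
```

Because the retransmitted bytes differ from the original, duplicate filtering that compares packet content no longer recognizes the two as the same packet.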
Longer story on the subject: https://github.com/hessu/aprsc/blob/master/doc/IGATE-HINTS.md#packets-getting-modified-due-to-character-encoding-issues
UTF-8 is only used in specific fields of APRS packet content: text message contents, position comment strings, status strings and such. The packet as a whole is often not valid UTF-8, especially the many corrupted and broken packets on the network and the packets emitted by older software. Igates need to treat packets as byte arrays instead of unicode strings when igating, to prevent packets from getting corrupted and duplicated even more.
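A minimal sketch of the byte-array approach, assuming packets arrive as TNC2-style lines (`SRC>DST,PATH:payload`). The function names, callsigns and the exact path element appended are illustrative assumptions, not the aprs module's API; the point is only that the payload is never decoded on the gating path.

```python
def split_packet(raw: bytes) -> tuple[bytes, bytes]:
    """Split a packet into (header, payload) at the first ':' without decoding."""
    header, sep, payload = raw.partition(b":")
    if not sep:
        raise ValueError("packet has no ':' separator")
    return header, payload


def gate(raw: bytes, igate_call: bytes) -> bytes:
    """Append an illustrative path element to the header; payload bytes stay untouched."""
    header, payload = split_packet(raw)
    return header + b",qAR," + igate_call + b":" + payload


def comment_for_display(payload: bytes) -> str:
    """Lenient decode for logging or UI only; never feed this back into gate()."""
    return payload.decode("utf-8", errors="replace")


# Made-up packet with ISO-8859-15 bytes and a trailing NUL in the payload.
raw = b"N0CALL>APRS,WIDE1-1:>Gr\xfc\xdfe\x00"
gated = gate(raw, b"N0GATE")
print(gated)
print(comment_for_display(split_packet(gated)[1]))
```

Decoding is confined to `comment_for_display`, so whatever character set or binary junk the payload carries is forwarded byte for byte.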