Treats packets as UTF-8 encoded strings instead of binary byte strings - corruption inevitable

hessu commented 7 years ago

The aprs module UTF-8 decodes and UTF-8 encodes packets, and processes them as Python unicode strings.

https://github.com/ampledata/aprs/blob/master/aprs/classes.py#L97
https://github.com/ampledata/aprs/blob/master/aprs/classes.py#L132
https://github.com/ampledata/aprs/blob/master/aprs/classes.py#L154
https://github.com/ampledata/aprs/blob/master/aprs/classes.py#L201

However, a lot of APRS packets contain byte sequences which cannot be decoded as UTF-8. Using UTF-8 in some text fields of APRS packets is a relatively recent invention. Older software transmits various other single-byte international character sets (ISO-8859-15 and such), whose non-ASCII bytes generally fail to decode as UTF-8. At least one app transmits UTF-16, and a lot of trackers (including popular Kenwood radio models) emit packets with NUL bytes and other binary oddities.
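A minimal sketch of the failure, in case it helps to reproduce it. The packets below are made up for illustration (they are not captured from the network), and `is_valid_utf8` is a hypothetical helper, not part of the aprs module:

```python
# Hypothetical packet whose status text was encoded as ISO-8859-15 by older
# software; the "\u00e4" (a with diaeresis) becomes the single byte 0xE4.
LATIN9_PACKET = "OH7LZB>APRS:>Terveisi\u00e4 Kuopiosta".encode("iso-8859-15")

# Hypothetical packet with NUL bytes and a stray high byte. NUL bytes are
# valid UTF-8 on their own; the 0xFF byte is what makes decoding fail here.
BINARY_PACKET = b"N0CALL>APRS:>status\x00\x00\xff"


def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if the whole byte string decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for packet in (LATIN9_PACKET, BINARY_PACKET):
    print(packet, is_valid_utf8(packet))
```

Any code path that calls `.decode("utf-8")` on a whole packet without handling `UnicodeDecodeError` would raise on inputs like these.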

If UTF-8 decoding and subsequent re-encoding is done, and the packet is then retransmitted, some packets will be modified by the software (or, at the very least, dropped if the decoding or encoding fails with an exception). If this is done on an iGate, modified duplicate packets are generated, which no longer match the original byte for byte. This is very unfortunate.
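To illustrate how a packet gets modified, here is a rough sketch of a lenient decode followed by a re-encode. It assumes the decode uses `errors="replace"`; I am not claiming this is exactly what the aprs module does, and the packet and the `lossy_round_trip` name are invented for the example:

```python
# Hypothetical packet with one ISO-8859-15 byte (0xE4) in the status text.
RAW = b"OH7LZB>APRS:>Terveisi\xe4 Kuopiosta"


def lossy_round_trip(raw: bytes) -> bytes:
    """Decode leniently and re-encode, as text-oriented code might do."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


MODIFIED = lossy_round_trip(RAW)

print(RAW)
print(MODIFIED)
print(MODIFIED == RAW)  # False: the 0xE4 byte was replaced by U+FFFD
```

The single byte 0xE4 comes back as the three-byte UTF-8 encoding of U+FFFD, so the retransmitted packet differs from the one that was received.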

Longer story on the subject: https://github.com/hessu/aprsc/blob/master/doc/IGATE-HINTS.md#packets-getting-modified-due-to-character-encoding-issues

UTF-8 is only used in specific fields of APRS packet content: text message contents, position comment strings, status strings and such. The packet as a whole is not guaranteed to be valid UTF-8, and many corrupted and broken packets on the network, as well as packets emitted by older software, are not. iGates need to treat packets as byte arrays instead of unicode strings when igating, to prevent packets from getting corrupted and duplicated even more.
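One way this could look in Python, as a sketch only: keep the packet as `bytes` throughout, and decode a copy of the text field solely for display. The split on the first `:` is a simplification of the APRS-IS line format, and `split_packet` and `comment_for_display` are hypothetical names, not functions from the aprs module:

```python
from typing import Tuple


def split_packet(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a raw packet into (header, body) at the first ':' without decoding."""
    header, sep, body = raw.partition(b":")
    if not sep:
        raise ValueError("no ':' separator found")
    return header, body


def comment_for_display(body: bytes) -> str:
    """Lenient decode for logs or a UI only; never re-encode this for sending."""
    return body.decode("utf-8", errors="replace")


# Hypothetical packet with an ISO-8859-15 byte and a NUL byte in the body.
RAW = b"OH7LZB>APRS,TCPIP*:>Terveisi\xe4\x00 Kuopiosta"

header, body = split_packet(RAW)
forwarded = header + b":" + body  # what gets sent on: the original bytes

print(comment_for_display(body))
print(forwarded == RAW)
```

The decoded string is a throwaway view; the bytes that are forwarded never pass through a text codec.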

Hessu, OH7LZB, of aprs.fi & aprsc

ampledata commented 7 years ago

Thank you for pointing this out. I'm hoping that the python3_support branch alleviates this problem.

hessu commented 7 years ago

Are you planning to fix this? It's actually quite a bad problem, breaking a lot of packets; igates with this problem should preferably be turned off until the bug is fixed. Python can deal with binary data just fine, as long as the code does not explicitly try to decode and encode it as unicode text.
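For example, a relay loop can stay in bytes from input to output. This is only a sketch: `io.BytesIO` stands in for the real connections, `relay_lines` is a hypothetical helper, and skipping lines that start with `#` assumes APRS-IS style server comment lines:

```python
import io


def relay_lines(source, sink) -> int:
    """Copy packets from one binary stream to another without decoding them."""
    count = 0
    for line in source:  # iterating a binary stream yields bytes, split on b"\n"
        packet = line.rstrip(b"\r\n")
        if not packet or packet.startswith(b"#"):
            continue  # skip empty lines and server comment lines
        sink.write(packet + b"\r\n")
        count += 1
    return count


# Hypothetical input: a server banner, a packet with an ISO-8859-15 byte and
# a NUL byte, and a plain ASCII packet.
incoming = io.BytesIO(
    b"# server banner\r\n"
    b"OH7LZB>APRS:>Terveisi\xe4\x00\r\n"
    b"N0CALL>APRS:>ok\r\n"
)
outgoing = io.BytesIO()

relayed = relay_lines(incoming, outgoing)
print(relayed, outgoing.getvalue())
```

Nothing in the loop calls `.decode()` or `.encode()`, so the odd bytes pass through unchanged.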

ampledata commented 7 years ago

Yes, I believe I've already fixed this in the python3_support branch.

ampledata commented 7 years ago

Luckily the only igate that I'm aware of that uses this library is offline.

ampledata / aprs

Treats packets as UTF-8 encoded strings instead of binary byte strings - corruption inevitable #18