etingof / pyasn1

Generic ASN.1 library for Python
http://snmplabs.com/pyasn1
BSD 2-Clause "Simplified" License

Unable to decode non-ISO 646 values sent in IA5Strings #191

Open mikcox opened 4 years ago

mikcox commented 4 years ago

Hello!

First off, thank you SO MUCH for this package. It's been immensely useful!

I happened to notice an oddity when trying to decode messages from a device that's sending me payloads that include non-ISO 646 characters in fields that are supposed to be IA5Strings. Namely, I'm trying to decode the following payload:

b'0U\x7fNE0C\x02\x01{\x16\x0476PK\x04\x06(\x11\xa5\xdb\xc4\xee\x01\x01\x00\x02\x01\xff\n\x01\x01\x04\x03\x00\x00\x00\xa0\x1d\x16\x1bLE-Mark\xe2\x80\x99s Bose Headphones\t\x03\xc0\x02\x110\x0b\x02\x04^k\xec\xc4\x02\x03\x0b1#'

with an ASN.1 spec that's something like:

from pyasn1.type import char, namedtype, namedval, tag, univ

class MacAddress(univ.OctetString):
    pass

class BluetoothDetect(univ.Sequence):
    pass

BluetoothDetect.tagSet = univ.Sequence.tagSet.tagExplicitly(tag.Tag(tag.tagClassApplication, tag.tagFormatConstructed, 78))
BluetoothDetect.componentType = namedtype.NamedTypes(
    namedtype.NamedType('scanID', univ.Integer()),
    namedtype.NamedType('sensorID', char.IA5String()),
    namedtype.NamedType('macAddress', MacAddress()),
    namedtype.NamedType('reservedLAP', univ.Boolean()),
    namedtype.OptionalNamedType('btClassicChannel', univ.Integer()),
    namedtype.NamedType('detectType', univ.Enumerated(namedValues=namedval.NamedValues(('btClassic', 0), ('ble', 1), ('btClassicPassive', 2)))),
    namedtype.NamedType('btClassicHeader', univ.OctetString()),
    namedtype.OptionalNamedType('deviceName', char.IA5String().subtype(explicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 0))),
    namedtype.OptionalNamedType('manufacturerName', char.IA5String().subtype(explicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 1))),
    namedtype.NamedType('rssi', univ.Real())
)

After investigating the payload, I have a hunch that the issue is related to that pesky non-ISO 646 "fancy single quote" / backtick-like character in the deviceName field (represented in the payload as \xe2\x80\x99). When I remove that character, it decodes using the above ASN.1 spec without a problem.

Is there any way we can get all decoders in this package to include some sort of character replacement, like the errors option in Python's built-in bytes.decode('utf-8', errors='replace')? It's annoying to lose an entire payload when we hit a character like this.
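For reference, here is a minimal sketch of what the errors option does to the device name from the payload above. It uses only the standard library, and nothing in it is pyasn1 API. It also suggests why a decode-then-re-encode round trip is probably not a drop-in fix: the result is shorter than the original, so the BER length octets in the payload would no longer match.

```python
raw = b"LE-Mark\xe2\x80\x99s Bose Headphones"  # deviceName bytes from the payload

# Strict ASCII decoding fails on the first byte above 0x7F.
try:
    raw.decode("ascii")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="replace" substitutes U+FFFD for each undecodable byte instead of raising.
lenient = raw.decode("ascii", errors="replace")

# Decoding as UTF-8 recovers the intended character, but re-encoding it to
# ASCII collapses its three bytes into a single "?", which changes the length.
round_trip = raw.decode("utf-8").encode("ascii", errors="replace")

print(strict_ok, repr(lenient), round_trip, len(raw), len(round_trip))
```

Any replacement applied to the raw payload before decoding therefore has to keep the byte count unchanged, or else rewrite the enclosing length fields.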

Alternatively, do you have any suggestions for how to do my own replacement on any non-IA5String characters in the payload before I send it to the decoder?

Thanks in advance!

mikcox commented 4 years ago

I did another loop back at this problem recently and wanted to post an update on what I've learned:

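The character in question was a "Right Single Quotation Mark" (U+2019). The device sends it as the three UTF-8 bytes \xe2\x80\x99; decimal 146 is its Windows-1252 code, not an ASCII value, since ASCII stops at 127. I've confirmed that the device that I'm pulling from is NOT actually to specification, since it's sending byte values above 127 in IA5Strings.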
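A quick standard-library check of which character those payload bytes represent, and where the decimal value 146 comes from (the variable names are only for illustration):

```python
import unicodedata

utf8_bytes = b"\xe2\x80\x99"         # the three bytes seen in the payload
char = utf8_bytes.decode("utf-8")    # they decode to a single Unicode character

name = unicodedata.name(char)        # official Unicode character name
code_point = ord(char)               # Unicode code point
cp1252_byte = char.encode("cp1252")  # the same character in Windows-1252

print(name, hex(code_point), cp1252_byte, cp1252_byte[0])
```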

That said, I don't have control over that device, and it'd be awesome if we could add some simple error catching like this in the IA5String parser.

In the meantime, I'm pre-sanitizing my payloads by replacing a few of the common non-ASCII characters (byte values above 127) that people might use:

# Given a string of bytes, replace a handful of common non-ASCII characters
# (matched by their UTF-8 byte sequences) with similar ASCII characters.
# Each replacement is padded with NUL bytes to the same length as the sequence
# it replaces, so the overall payload length does not change.
def replace_non_iso(data: bytes) -> bytes:
    patterns = [
        # curly quotes, accented e, dashes and hyphens
        [b'\xe2\x80\x99', b"'\x00\x00"],
        [b'\xe2\x80\x9c', b'"\x00\x00'],
        [b'\xe2\x80\x9d', b'"\x00\x00'],
        [b'\xe2\x80\x9e', b'"\x00\x00'],
        [b'\xe2\x80\x9f', b'"\x00\x00'],
        [b'\xc3\xa9', b'e\x00'],
        [b'\xe2\x80\x93', b'-\x00\x00'],
        [b'\xe2\x80\x92', b'-\x00\x00'],
        [b'\xe2\x80\x94', b'-\x00\x00'],
        [b'\xe2\x80\x98', b"'\x00\x00"],
        [b'\xe2\x80\x9b', b"'\x00\x00"],
        [b'\xe2\x80\x90', b'-\x00\x00'],
        [b'\xe2\x80\x91', b'-\x00\x00'],
        # prime marks
        [b'\xe2\x80\xb2', b"'\x00\x00"],
        [b'\xe2\x80\xb3', b"'\x00\x00"],
        [b'\xe2\x80\xb4', b"'\x00\x00"],
        [b'\xe2\x80\xb5', b"'\x00\x00"],
        [b'\xe2\x80\xb6', b"'\x00\x00"],
        [b'\xe2\x80\xb7', b"'\x00\x00"],
        # superscript operators and parentheses
        [b'\xe2\x81\xba', b"+\x00\x00"],
        [b'\xe2\x81\xbb', b"-\x00\x00"],
        [b'\xe2\x81\xbc', b"=\x00\x00"],
        [b'\xe2\x81\xbd', b"(\x00\x00"],
        [b'\xe2\x81\xbe', b")\x00\x00"]
    ]
    for pattern in patterns:
        data = data.replace(pattern[0], pattern[1])

    return data
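One caveat with a payload-wide replace is that the same byte sequences could in principle also occur inside binary fields such as the MAC address. A possible refinement, sketched below under stated assumptions, is to walk the BER structure and touch only the contents of IA5String values. The names sanitize_ia5strings and _walk are hypothetical helpers, not part of pyasn1. The sketch assumes definite-length BER and only recognizes primitive IA5Strings carrying the universal tag 0x16 (explicit tagging wraps that tag, so the deviceName field above is covered, but implicitly tagged strings would be missed). It replaces each offending byte with "?", so lengths stay unchanged.

```python
IA5STRING_TAG = 0x16  # universal class, primitive, tag number 22


def _walk(buf: bytearray, pos: int, end: int, repl: int) -> None:
    """Walk the TLVs in buf[pos:end], sanitizing IA5String contents in place."""
    while pos < end:
        first = buf[pos]
        pos += 1
        if first & 0x1F == 0x1F:
            # High-tag-number form: skip the continuation octets of the tag.
            while buf[pos] & 0x80:
                pos += 1
            pos += 1
        length = buf[pos]
        pos += 1
        if length == 0x80:
            raise ValueError("indefinite lengths are not handled in this sketch")
        if length & 0x80:
            # Long form: the low 7 bits give the number of length octets.
            count = length & 0x7F
            length = int.from_bytes(buf[pos:pos + count], "big")
            pos += count
        value_end = pos + length
        if value_end > end:
            raise ValueError("value runs past the end of its container")
        if first & 0x20:
            # Constructed encoding: the contents are nested TLVs.
            _walk(buf, pos, value_end, repl)
        elif first == IA5STRING_TAG:
            for i in range(pos, value_end):
                if buf[i] > 0x7F:
                    buf[i] = repl
        pos = value_end


def sanitize_ia5strings(data: bytes, replacement: bytes = b"?") -> bytes:
    """Return a copy of data with non-ASCII bytes in IA5Strings replaced."""
    buf = bytearray(data)
    _walk(buf, 0, len(buf), replacement[0])
    return bytes(buf)


payload = (
    b'0U\x7fNE0C\x02\x01{\x16\x0476PK\x04\x06(\x11\xa5\xdb\xc4\xee'
    b'\x01\x01\x00\x02\x01\xff\n\x01\x01\x04\x03\x00\x00\x00\xa0\x1d\x16\x1b'
    b'LE-Mark\xe2\x80\x99s Bose Headphones\t\x03\xc0\x02\x110\x0b'
    b'\x02\x04^k\xec\xc4\x02\x03\x0b1#'
)
cleaned = sanitize_ia5strings(payload)
print(cleaned)
```

On the payload from the first comment, this changes only the three bytes inside the deviceName string and leaves the MAC address, the REAL value, and the integers as they were.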