etingof / pyasn1

Generic ASN.1 library for Python
http://snmplabs.com/pyasn1
BSD 2-Clause "Simplified" License

python3 error: Short octet stream on tag decoding #185

Open williamcroberts opened 4 years ago

williamcroberts commented 4 years ago

I'm seeing an issue in cert decoding on python3 that I am not seeing on python2:

Traceback (most recent call last):
  File "./test.py", line 8, in <module>
    cert = decoder.decode(substrate, asn1Spec=rfc2459.Certificate())
  File "/home/wcrobert/.local/lib/python3.6/site-packages/pyasn1/codec/ber/decoder.py", line 1338, in __call__
    'Short octet stream on tag decoding'
pyasn1.error.SubstrateUnderrunError: Short octet stream on tag decoding

Which I can reproduce with this certificate:

-----BEGIN CERTIFICATE-----
MIIBETCBuAIJAJ0W0tvyDooPMAoGCCqGSM49BAMCMBExDzANBgNVBAMMBm15IGtl
eTAeFw0xOTExMjYxNzM0MTRaFw0yMDExMjUxNzM0MTRaMBExDzANBgNVBAMMBm15
IGtleTBZMBMGByqGSM49AgEGCCqGSM49AwEHA0IABOTFZ0YJOAb39qkUJYIQxqM8
TW3fMsokFnc4oR7221+ysTS6xBHkvLUB2Xh8OVZOsCIRsZMvSrpBh7TirjIqs2Iw
CgYIKoZIzj0EAwIDSAAwRQIgRPLeuw00u5+PJx+v531MThBhBtryeLAV7s6KoeTX
hpQCIQCyyy9swRJgzBB1Op9A5KJrwMWeFwW9w1L890ub7zkGMQ==
-----END CERTIFICATE-----

from pyasn1_modules import pem, rfc2459
from pyasn1.codec.der import decoder

substrate = pem.readPemFromFile(open("cert.pem", "rb"))
cert = decoder.decode(substrate, asn1Spec=rfc2459.Certificate())

OpenSSL seems to be fine with the cert:

openssl x509 -in cert.pem -text -noout
Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number:
            9d:16:d2:db:f2:0e:8a:0f
        Signature Algorithm: ecdsa-with-SHA256
        Issuer: CN = my key
<snip>

As well as various SSL cert checker websites, like https://www.sslchecker.com/certdecoder

I've tried a few different versions of Python 3 (3.5.2 and 3.6.8) and can reproduce it with both. I've also tried different versions of pyasn1 and pyasn1_modules, with no luck.

williamcroberts commented 4 years ago

The problem is in the code that returns the substrate: it returns an empty string.

def readPemBlocksFromFile(fileObj, *markers):
    startMarkers = dict(map(lambda x: (x[1], x[0]),
                            enumerate(map(lambda y: y[0], markers))))
    stopMarkers = dict(map(lambda x: (x[1], x[0]),
                           enumerate(map(lambda y: y[1], markers))))
    idx = -1
    substrate = ''
    certLines = []
    state = stSpam
    while True:
        certLine = fileObj.readline()

The line:

certLine = fileObj.readline()

never picks up the -----BEGIN CERTIFICATE----- scissor line because the file is opened with mode 'rb', so readline() returns bytes rather than str. This seems very brittle; we probably want to ensure that the file mode can't cause this kind of issue. Maybe call encode() or str() on the data returned from the read? I'm not really a Python guru, so I'm not sure what the best fix would be.
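The mismatch can be reproduced without pyasn1 at all. The following is a minimal sketch using only the standard library; `find_marker` is a hypothetical helper written for illustration, not part of pyasn1_modules, and the PEM body is a placeholder rather than a real certificate:

```python
import io

# Placeholder PEM text; the body is not a real certificate.
PEM = "-----BEGIN CERTIFICATE-----\nQUJD\n-----END CERTIFICATE-----\n"
START = "-----BEGIN CERTIFICATE-----"


def find_marker(file_obj, marker=START):
    """Return True if any line of file_obj equals the str marker.

    Hypothetical helper: lines read from a binary-mode file are bytes,
    so they are decoded to str before the comparison.
    """
    for line in file_obj:
        if isinstance(line, bytes):
            line = line.decode("ascii")
        if line.strip() == marker:
            return True
    return False


binary_file = io.BytesIO(PEM.encode("ascii"))  # behaves like open(..., "rb")
raw_line = binary_file.readline().strip()

# On Python 3, bytes never compare equal to str, so a str marker is missed.
naive_match = raw_line == START

binary_file.seek(0)
normalized_match = find_marker(binary_file)
text_match = find_marker(io.StringIO(PEM))  # behaves like open(..., "r")
```

Here `naive_match` is False while both `normalized_match` and `text_match` are True, which is consistent with the marker line being skipped silently when the file is read in binary mode.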

etingof commented 4 years ago

Interesting! Thank you for troubleshooting this issue! Error message is misleading.

I will push a patch and report back.

williamcroberts commented 4 years ago

@etingof the other issue is that, even once we get past this, the data returned later when accessing the ASN.1 subfields of the cert is str, where we would probably want a byte array.

I think a better fix would be to ensure that, after we decode the base64, the result is bytes and not str... this py2to3 str/bytes stuff has been fun (not).
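One way to sketch that idea, assuming only the standard library: normalize every body line to bytes before decoding, so the result is bytes whichever mode the file was opened in. `pem_body_to_bytes` is a hypothetical name, not an existing pyasn1_modules function, and the input lines are placeholders.

```python
import base64


def pem_body_to_bytes(lines):
    """Join base64 body lines (str or bytes) and decode them to bytes.

    Hypothetical helper: accepts lines from a text-mode or binary-mode
    file and always returns a bytes object.
    """
    chunks = []
    for line in lines:
        if isinstance(line, str):
            line = line.encode("ascii")
        chunks.append(line.strip())
    return base64.b64decode(b"".join(chunks))


# Mixed str and bytes input lines, as could come from either file mode.
der = pem_body_to_bytes(["QUJD\n", b"REVG\n"])
```

On Python 3, `base64.b64decode` returns bytes, so `der` can be handed to a decoder that expects an octet stream without any further conversion.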