aws / aws-iot-device-sdk-python-v2

Next generation AWS IoT Client SDK for Python using the AWS Common Runtime
Apache License 2.0

RuntimeError: 1033 (AWS_IO_TLS_CTX_ERROR): Failed to create tls context when trying to call mqtt_connection_builder.mtls_from_path #78

Closed Th3G4mbl3r closed 2 years ago

Th3G4mbl3r commented 4 years ago

Describe the bug

I am writing a Lambda function on the Python 3.7 runtime that updates a device shadow when a command is received. However, I get AWS_IO_TLS_CTX_ERROR when I make this call: mqtt_connection = mqtt_connection_builder.mtls_from_path(endpoint=endpoint, cert_filepath=device_cert, pri_key_filepath=device_key, client_bootstrap=client_bootstrap, ca_filepath=amazon_ca_cert, client_id=valve_name, on_connection_interrupted=on_connection_interrupted, on_connection_resumed=on_connection_resumed, clean_session=False, keep_alive_secs=6).

I've printed the values of the variables and I am sure they are all correct for the certificates, and the certificates are present in my deployment package.

The exact same code works fine when I run it locally on my laptop as a regular Python program.

SDK version number

1.20.0 for awsiotsdk and 0.15.5 for awscrt

Platform/OS/Device

AWS Lambda, Python 3.7.

To Reproduce (observed behavior)

The start of the function definition, up to the point where it errors out:

def setup_device_connection(endpoint, device_cert, device_key, amazon_ca_cert, valve_name):
    logger.debug("Inside setup_device_connection...")

    event_loop_group = io.EventLoopGroup(1)
    host_resolver = io.DefaultHostResolver(event_loop_group)
    client_bootstrap = io.ClientBootstrap(event_loop_group, host_resolver)

    logger.debug("Going to setup mqtt connection object...")

    mqtt_connection = mqtt_connection_builder.mtls_from_path(
        endpoint=endpoint,
        cert_filepath=device_cert,
        pri_key_filepath=device_key,
        client_bootstrap=client_bootstrap,
        ca_filepath=amazon_ca_cert,
        client_id=valve_name,
        on_connection_interrupted=on_connection_interrupted,
        on_connection_resumed=on_connection_resumed,
        clean_session=False,
        keep_alive_secs=6)

Expected behavior

The connection to the IoT endpoint should be established after the mqtt_connection object is instantiated using the call above.

Logs/output

Below is a sample of the debug messages I am printing:

[DEBUG] 2020-06-16T12:56:07.421Z 02ab969e-f557-4063-8e0c-e625b9ccf19d valve name = pressure_valve_1

[DEBUG] 2020-06-16T12:56:07.421Z 02ab969e-f557-4063-8e0c-e625b9ccf19d device cert = certs/pressure_valve_1_ca_cert.crt

[DEBUG] 2020-06-16T12:56:07.421Z 02ab969e-f557-4063-8e0c-e625b9ccf19d device key = certs/pressure_valve_1.key

[DEBUG] 2020-06-16T12:56:07.421Z 02ab969e-f557-4063-8e0c-e625b9ccf19d endpoint = akom112at41zv-ats.iot.us-west-2.amazonaws.com

[DEBUG] 2020-06-16T12:56:07.421Z 02ab969e-f557-4063-8e0c-e625b9ccf19d amazon root CA cert = certs/AmazonRootCA1.pem

[DEBUG] 2020-06-16T12:56:07.421Z 02ab969e-f557-4063-8e0c-e625b9ccf19d Inside setup_device_connection...

[DEBUG] 2020-06-16T12:56:07.422Z 02ab969e-f557-4063-8e0c-e625b9ccf19d Going to setup mqtt connection object...

[ERROR] RuntimeError: 1033 (AWS_IO_TLS_CTX_ERROR): Failed to create tls context.

The full stack trace for the error is as follows:

Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 359, in lambda_handler
    return dispatch(event)
  File "/var/task/lambda_function.py", line 343, in dispatch
    return activate_valve_instance(intent_request)
  File "/var/task/lambda_function.py", line 313, in activate_valve_instance
    setup_device_connection(endpoint, device_cert, device_key, amazon_ca_cert, valve_name)
  File "/var/task/lambda_function.py", line 190, in setup_device_connection
    keep_alive_secs=6)
  File "/var/task/awscrt/awsiot_mqtt_connection_builder.py", line 211, in mtls_from_path
    return _builder(tls_ctx_options, **kwargs)
  File "/var/task/awscrt/awsiot_mqtt_connection_builder.py", line 172, in _builder
    tls_ctx = awscrt.io.ClientTlsContext(tls_ctx_options)
  File "/var/task/awscrt/io.py", line 275, in __init__
    options.verify_peer

bretambrose commented 4 years ago

Can you try reading the two files (with error checking) and then using "mtls_from_bytes"? The first possibility that comes to mind is that raw relative-path file io may not necessarily work out of the box when running a Lambda. See https://stackoverflow.com/questions/41063214/reading-a-packaged-file-in-aws-lambda-package for more info (although nodejs-specific).

Assuming that it's not a file IO issue, can you gather trace level logs and attach them (scrub for sensitive info first):

io.init_logging(awscrt.io.LogLevel.Trace, 'stdout')

should map crt logging into your lambda log.
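A minimal sketch of the first suggestion: read each file with explicit error checking before handing the bytes to mtls_from_bytes. The helper name read_packaged_file is hypothetical, and resolving relative paths against the LAMBDA_TASK_ROOT environment variable (falling back to the current directory) is an assumption made for illustration, not something the SDK does.

```python
import os


def read_packaged_file(relative_path, base_dir=None):
    """Return the bytes of a file bundled with the deployment package.

    Hypothetical helper: relative paths are resolved against base_dir, which
    defaults to LAMBDA_TASK_ROOT (set by Lambda) or the current directory.
    """
    if base_dir is None:
        base_dir = os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())
    full_path = os.path.join(base_dir, relative_path)

    # Fail with a clear message if the file is missing from the package.
    if not os.path.isfile(full_path):
        raise FileNotFoundError("Packaged file not found: " + full_path)

    with open(full_path, "rb") as fh:
        contents = fh.read()

    # An empty file would also make the TLS setup fail later, so reject it here.
    if not contents:
        raise ValueError("Packaged file is empty: " + full_path)
    return contents
```

If all three reads succeed, a path problem can probably be ruled out, and the returned bytes can be passed on to mtls_from_bytes.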

Th3G4mbl3r commented 4 years ago

hi @bretambrose

I added the trace-level log statement to the handler function in my Lambda, but I do not see any additional trace output in CloudWatch Logs. Am I looking in the wrong place?


def lambda_handler(event, context):

    #Route the incoming request based on intent.
    #The JSON body of the request is provided in the event slot.

    logger.debug('event.bot.name={}'.format(event['bot']['name']))

    logger.debug("Enabling AWS IOT SDK Logging level at Trace")
    io.init_logging(io.LogLevel.Trace, 'stdout')

    return dispatch(event)

I will try your alternate suggestion of reading the files myself and then using the mtls_from_bytes option and come back with results from that as well.

Th3G4mbl3r commented 4 years ago

Also, looking through the stack trace and the SDK codebase, the error occurs after the cert files have been read successfully. The files are read at line 210 of awsiot_mqtt_connection_builder, in mtls_from_path. That function then invokes _builder with the TLS context options object produced from the files, and _builder uses it to construct a ClientTlsContext at line 172. The constructor of that class is what raises the error I am getting. So at a bare minimum, I would think a more meaningful error message should be provided.

But as I said, let me also try reading the files myself and passing them as byte arrays into mtls_from_bytes, as you recommended.
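Until the SDK reports more detail, one option is to wrap the builder call and attach the file paths to the exception. This is only a sketch: build_connection_with_context is a hypothetical helper, the hint text is an assumption, and builder stands for any callable that accepts the same keyword arguments as the mtls_from_path call shown in this issue.

```python
def build_connection_with_context(builder, cert_path, key_path, **kwargs):
    """Call an mTLS builder and add file context to RuntimeError failures.

    `builder` is any callable taking cert_filepath/pri_key_filepath keywords,
    such as mqtt_connection_builder.mtls_from_path as used in this issue.
    """
    try:
        return builder(cert_filepath=cert_path, pri_key_filepath=key_path, **kwargs)
    except RuntimeError as err:
        # Re-raise with the paths included; the original error stays chained.
        raise RuntimeError(
            "{} (cert={!r}, key={!r}); the files may be readable but "
            "rejected by the platform TLS library".format(err, cert_path, key_path)
        ) from err
```

The remaining keyword arguments (endpoint, client_bootstrap, and so on) pass through unchanged, so the call site stays close to the original.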

Th3G4mbl3r commented 4 years ago

I tried reading the certs myself and using mtls_from_bytes, but no dice. Same error at the same location, except that this time it is triggered from mtls_from_bytes instead of mtls_from_path.

[DEBUG] 2020-06-16T17:18:16.190Z    0febe34b-740d-449d-97b5-2171b42bf7d4    valve_name = pressure_valve_1

[DEBUG] 2020-06-16T17:18:16.190Z    0febe34b-740d-449d-97b5-2171b42bf7d4    Reading amazon root CA certificate...

[DEBUG] 2020-06-16T17:18:16.190Z    0febe34b-740d-449d-97b5-2171b42bf7d4    Reading device certificate...

[DEBUG] 2020-06-16T17:18:16.191Z    0febe34b-740d-449d-97b5-2171b42bf7d4    Reading device private key...

[ERROR] RuntimeError: 1033 (AWS_IO_TLS_CTX_ERROR): Failed to create tls context
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 375, in lambda_handler
    return dispatch(event)
  File "/var/task/lambda_function.py", line 356, in dispatch
    return activate_valve_instance(intent_request)
  File "/var/task/lambda_function.py", line 326, in activate_valve_instance
    setup_device_connection(endpoint, device_cert, device_key, amazon_ca_cert, valve_name)
  File "/var/task/lambda_function.py", line 206, in setup_device_connection
    keep_alive_secs=6)
  File "/var/task/awscrt/awsiot_mqtt_connection_builder.py", line 228, in mtls_from_bytes
    return _builder(tls_ctx_options, **kwargs)
  File "/var/task/awscrt/awsiot_mqtt_connection_builder.py", line 172, in _builder
    tls_ctx = awscrt.io.ClientTlsContext(tls_ctx_options)
  File "/var/task/awscrt/io.py", line 275, in __init__
    options.verify_peer

The code for reading the files and the call to mtls_from_bytes are below:


def read_certificate_files(filepath):
    with open(filepath, mode='rb') as fh:
        contents = fh.read()
    return contents

# Excerpt from inside setup_device_connection(...):
    logger.debug("Reading amazon root CA certificate...")
    amazon_root_cert = read_certificate_files(amazon_ca_cert)
    logger.debug("Reading device certificate...")
    device_cert_content = read_certificate_files(device_cert)
    logger.debug("Reading device private key...")
    device_key_content = read_certificate_files(device_key)

    mqtt_connection = mqtt_connection_builder.mtls_from_bytes(
        cert_bytes=device_cert_content,
        pri_key_bytes=device_key_content,
        endpoint=endpoint,
        client_bootstrap=client_bootstrap,
        ca_bytes=amazon_root_cert,
        client_id=valve_name,
        on_connection_interrupted=on_connection_interrupted,
        on_connection_resumed=on_connection_resumed,
        clean_session=False,
        keep_alive_secs=6)

Th3G4mbl3r commented 4 years ago

Hi Bret,

Any updates here? I've now tried both methods in my Lambda, mtls_from_path as well as mtls_from_bytes, and both result in exactly the same exception. Is there anything else I can do to move the triage process along?

thanks & regards, rohit

Th3G4mbl3r commented 4 years ago

Can we at least reassign the labels to bug and needs-triage, since I've tried all the recommended options with the same result? This should be treated as a bug.

JonathanHenson commented 4 years ago

Out of curiosity, what format is your cert/key in: PEM, PKCS#7, etc.?

JonathanHenson commented 4 years ago

Also I have to ask, since TLS is the only significant difference across platforms, which platform is your laptop?

Th3G4mbl3r commented 4 years ago

Out of curiosity, what format is your cert/key in: PEM pkcs#7 etc....?

Yes, PKCS #7 based PEMs generated using OpenSSL.

Th3G4mbl3r commented 4 years ago

Also I have to ask, since TLS is the only significant difference across platforms, which platform is your laptop?

macOS Catalina 10.15.5, Python 3.8, and LibreSSL 2.8.3 (output of openssl version)

JonathanHenson commented 4 years ago

We use s2n on Linux and the native Security framework on Apple. My suspicion is that s2n doesn’t like the certs you generated. Any chance you have access to a Linux machine you could give it a go on, so we can rule out Lambda eccentricities?
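When comparing a working and a failing machine, it can help to log which platform the code is running on. The sketch below maps the OS name to the TLS implementation named in this comment; the function name likely_tls_backend is hypothetical, and the mapping covers only what is stated here, not every platform the CRT supports.

```python
import platform


def likely_tls_backend(system=None):
    """Guess the TLS implementation from the OS name, per this thread only."""
    system = system or platform.system()
    backends = {
        "Linux": "s2n",
        "Darwin": "Apple Security framework",
    }
    # Anything else is deliberately left unknown rather than guessed.
    return backends.get(system, "not stated in this thread")


print(platform.system(), "->", likely_tls_backend())
```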

Th3G4mbl3r commented 4 years ago

I think you are onto something. I eventually need to run this code on a Raspberry Pi, and for now I hit the same issue there as well. I will test a little more and report back on whether the issue is with the certificate file format, and on what needs to change in the certificate or the certificate-processing code to address it.

Th3G4mbl3r commented 4 years ago

Based on my testing on the Raspberry Pi, I can confirm that there was an issue with the way the certificate was being processed. My certificate file looks like the following:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 7 (0x7)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=US, ST=Oregon, O=OcaTank Pte Ltd, OU=pipeline-mgmt, CN=rootCA/emailAddress=rohitus@amazon.com
        Validity
            Not Before: Jun  9 10:21:43 2020 GMT
            Not After : Jun  9 10:21:43 2021 GMT
        Subject: C=SG, ST=Singapore, O=OcaTank Pte Ltd, OU=pipeline-mgmt, CN=pressure_sensor_1/emailAddress=rohitus@amazon.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:c4:....
               Exponent: 1234 (0x1234)
        X509v3 extensions:
            X509v3 Basic Constraints: 
                CA:FALSE
            Netscape Comment: 
                OpenSSL Generated Certificate
            X509v3 Subject Key Identifier: 
                8C:C1:....
            X509v3 Authority Key Identifier: 
                keyid:58:B5:21:...
    Signature Algorithm: sha256WithRSAEncryption
         62:b7:15:6b:.....
-----BEGIN CERTIFICATE-----
MIIEDjCCAvagAwIBAgIBBzANBgkqhkiG9w0BAQsFADCBhDELMAkGA1UEBhMCVVMx
DzANBgNVBAgMBk9yZWdvbjEYMBYGA1UECgwPT2NhVGFuayBQdGUgTHRkMRYw
.............................................
-----END CERTIFICATE-----

This works perfectly on the Mac. On the Raspberry Pi, however, it gave the same error as on Lambda, and I had to modify the file to remove all of the leading text and keep only the following portions:


-----BEGIN CERTIFICATE-----
Device certificate value.............................................
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
Root CA certificate value.............................................
-----END CERTIFICATE-----

Is there a reason that the s2n tool cannot handle the full file? If that is how it is meant to be, then there should at least be clear documentation to ensure that the certificate file needs to only have the above content.
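For anyone who would rather not edit the files by hand, the same cleanup can be sketched in a few lines of Python. The function name strip_to_pem_blocks is hypothetical, and the sketch assumes the only problem is extra human-readable text around otherwise valid PEM blocks, as in the file above.

```python
import re

# Matches one PEM block of any type, e.g. CERTIFICATE or RSA PRIVATE KEY.
_PEM_BLOCK = re.compile(
    rb"-----BEGIN ([A-Z0-9 ]+)-----\r?\n.*?\r?\n-----END \1-----",
    re.DOTALL,
)


def strip_to_pem_blocks(data):
    """Return only the PEM blocks found in data, one after another."""
    blocks = [m.group(0) for m in _PEM_BLOCK.finditer(data)]
    if not blocks:
        raise ValueError("No PEM blocks found")
    # Blocks keep their original order, so a device cert followed by a
    # root CA cert stays in that order.
    return b"\n".join(blocks) + b"\n"
```

The cleaned bytes can then be passed as cert_bytes to mtls_from_bytes, or written back to a file for mtls_from_path.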

JonathanHenson commented 4 years ago

Backstory. PEM is a unix friendly format. As a result we had to manually decode PEM files on Apple and windows to perform the import to the Secure Enclaves on those devices (keychain and cert manager respectively). For s2n, it has its own PEM parser: most likely a less permissive one than the one we wrote. That’s the cause of the discrepancy. I don’t think making a parser more permissive will fly with security, so we’ll probably tighten down the apple and windows parsers and add additional documentation

Th3G4mbl3r commented 4 years ago

That sounds like a perfectly reasonable approach. In the meantime, this GitHub issue can hopefully guide people who run into the same problem. Should this issue be left open until you work out a timeline, at least for the documentation changes?

SoraDevin commented 4 years ago

Hi @Th3G4mbl3r , a little off topic but I'm having a lot of trouble deploying this library to my python 3.7 lambda. I was hoping you could tell me what you did? I've pip installed from an amazon linux ec2 instance and put those libraries in the top level of my deployment package but am still getting errors when trying to import (i.e. "module not found 'awscrt'" or "_awscrt") even though that package is included.
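One way to narrow down import errors like these is to ask Python, from inside the Lambda, whether it can locate each top-level module at all, and to log the interpreter version and CPU architecture alongside. The module names below come from the error messages quoted above; the helper name check_top_level_modules is hypothetical. Compiled extensions are platform-specific, so comparing this output between the build host and the Lambda may reveal a mismatch.

```python
import importlib.util
import platform
import sys


def check_top_level_modules(names):
    """Map each top-level module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Log the runtime details next to the lookup results.
print("python:", sys.version_info[:3], "machine:", platform.machine())
print(check_top_level_modules(["awscrt", "_awscrt", "awsiot"]))
```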

Th3G4mbl3r commented 4 years ago

How are you bundling the dependencies into Lambda? I generally use AWS SAM to create and manage Lambda functions from the command line and let it handle generating the upload package. But as long as you are uploading all the needed dependency modules correctly, it should work. If you can share a screenshot of the directory structure you are uploading to Lambda, it may help.

SoraDevin commented 4 years ago

I'm downloading the dependencies from my ec2 instance, adding my lambda function file/s, then zipping it all up and uploading it through the lambda web console as a deployment package. I've added an image of what that looks like without the function code:

[image: screenshot of the deployment package contents]

I've tried various installation methods but essentially all I need is awsiot and shapely and anything those need to run.

Th3G4mbl3r commented 4 years ago

This is what my zip file structure looks like:

[image: screenshot of the zip file structure]

It contains the awsiot library, the certs directory with my certificates, and the code in lambda_function.py.

And this is how I am importing it:

from awscrt import io, mqtt
from awsiot import mqtt_connection_builder
from awsiot import iotshadow

SoraDevin commented 3 years ago

Thanks for the response. After going down a big rabbit hole, I ran into all sorts of issues because I couldn't easily install or compile the library on the ARM64 architecture that my Lambda package runs on (unfortunately there is no option to use something else for what I'm trying to do). I ended up abandoning my efforts for now and using the older v1 SDK. Cheers for helping walk me through it though.

Th3G4mbl3r commented 3 years ago

You're welcome. Sorry to hear you had to move back to the v1 SDK, but as long as it meets your needs, go for it.

isaurabhpawar commented 3 years ago

ThankYou!! Reverting to V1.1.0 worked for me :)

SoraDevin commented 3 years ago

@isaurabhpawar no problem, glad my troubles helped someone.

AGiantSquid commented 3 years ago

Hey, I ran into this problem today. I think there must be a race condition, because I can reliably reproduce the issue and "fix" it in the demo code just by adding log statements. See here: https://github.com/aws/aws-iot-device-sdk-python-v2/issues/188#issuecomment-834947955

alanbchristie commented 3 years ago

Also encountering this problem (on Raspberry Pi + Docker). My previous code (working) was using the v1 SDK so I've been forced to revert to using that release for now.

But I'd be keen to understand, as soon as the problem's fixed, what you have to do to get a v1 app running using v2 without encountering this error.

Basic environment: -

And basic connect-login...

            _LOGGER.info('Creating mqtt_connection...')

            self.event_loop_group: io.EventLoopGroup = io.EventLoopGroup(1)
            self.host_resolver: io.DefaultHostResolver =\
                io.DefaultHostResolver(self.event_loop_group)
            self.client_bootstrap: io.ClientBootstrap =\
                io.ClientBootstrap(self.event_loop_group,
                                   self.host_resolver)

            # An AWS IoT MQTT Client
            mqtt_connection: awscrt.mqtt.Connection =\
                mqtt_connection_builder.mtls_from_path(
                    endpoint=AWS_IOT_MQTT_ENDPOINT,
                    cert_filepath=AWS_IOT_MQTT_CERT_DIR + '/ca.crt',
                    pri_key_filepath=AWS_IOT_MQTT_CERT_DIR + '/privkey.pem',
                    ca_filepath=AWS_IOT_MQTT_CERT_DIR + '/crt.crt',
                    client_bootstrap=self.client_bootstrap,
                    client_id=AWS_IOT_MQTT_CLIENT_ID,
                    clean_session=False,
                    keep_alive_secs=_MQTT_KEEP_ALIVE_S)

AndrewGotz commented 2 years ago

Hey so are we really being told for this specific issue to still move back to v1 api? Is there any ETA on a fix for this?

I am using the java SDK and hitting this same issue using v2 on a linux box -

Caused by: software.amazon.awssdk.crt.CrtRuntimeException: TlsContext.tls_ctx_new: Failed to create new aws_tls_ctx (aws_last_error: AWS_IO_TLS_CTX_ERROR(1033), Failed to create tls context)

cc @TwistedTwigleg

jmklix commented 2 years ago

The original problem was solved, but AWS_IO_TLS_CTX_ERROR can be caused by many different things. @AndrewGotz or anyone else who comes across this error, please open a new issue in the respective SDK and include all information about your current error.

Th3G4mbl3r commented 2 years ago

I am closing this issue because the original cause of my error was certificate file parsing, which, as explained in the thread, was resolved.
