aws / aws-iot-device-sdk-python-v2

Next generation AWS IoT Client SDK for Python using the AWS Common Runtime
Apache License 2.0

pubsub sample unexpected hangup #129

Closed THOM-AwS closed 3 years ago

THOM-AwS commented 4 years ago

I'm trying to work through the sample code, but I keep getting stuck when I run the script: it comes back with errors.

python pubsub.py --endpoint <REDACTED>-ats.iot.ap-southeast-2.amazonaws.com --cert ../../certs/7894bfe332-certificate.pem.crt --key ../../certs/7894bfe332-private.pem.key --root-ca ../../certs/AmazonRootCA1.pem --topic <TOPIC> --verbosity Info --client-id <ID>

[INFO ] [2020-10-28T12:53:33Z] [b6fc8010] [event-loop] - id=0x1b94c60: Initializing edge-triggered epoll
[INFO ] [2020-10-28T12:53:33Z] [b6fc8010] [event-loop] - id=0x1b94c60: Using eventfd for cross-thread notifications.
[INFO ] [2020-10-28T12:53:33Z] [b6fc8010] [event-loop] - id=0x1b94c60: Starting event-loop thread.
[INFO ] [2020-10-28T12:53:33Z] [b6fc8010] [dns] - id=0x1ac0f70: Initializing default host resolver with 16 max host entries.
[INFO ] [2020-10-28T12:53:33Z] [b6fc8010] [channel-bootstrap] - id=0x1ac11e0: Initializing client bootstrap with event-loop group 0x1bafa28
[INFO ] [2020-10-28T12:53:33Z] [b5754460] [event-loop] - id=0x1b94c60: main loop started
[INFO ] [2020-10-28T12:53:33Z] [b5754460] [event-loop] - id=0x1b94c60: default timeout 100000, and max events to process per tick 100
Connecting to <REDACTED>-ats.iot.ap-southeast-2.amazonaws.com with client ID '<THING>'...
[INFO ] [2020-10-28T12:53:35Z] [b6fc8010] [mqtt-client] - id=0x1c53a00: using ping timeout of 3000000000 ns
[WARN ] [2020-10-28T12:53:35Z] [b5754460] [socket] - id=0xb4c04860 fd=6: setsockopt() for NO_SIGNAL failed with errno 92. If you are having SIGPIPE signals thrown, you may want to install a signal trap in your application layer.
[ERROR] [2020-10-28T12:53:35Z] [b5754460] [socket] - id=0xb4c04860 fd=6: connect failed with error code 101.
[INFO ] [2020-10-28T12:53:35Z] [b5754460] [dns] - id=0x1ac0f70: recording failure for record 2403:b300:ff00::dee:1a72 for <REDACTED>-ats.iot.ap-southeast-2.amazonaws.com, moving to bad list
[ERROR] [2020-10-28T12:53:35Z] [b5754460] [channel-bootstrap] - id=0x1ac11e0: failed to create socket with error 1049
[WARN ] [2020-10-28T12:53:35Z] [b5754460] [socket] - id=0xb4c04860 fd=6: setsockopt() for NO_SIGNAL failed with errno 92. If you are having SIGPIPE signals thrown, you may want to install a signal trap in your application layer.
[INFO ] [2020-10-28T12:53:35Z] [b5754460] [socket] - id=0xb4c04860 fd=6: connection success
[INFO ] [2020-10-28T12:53:35Z] [b5754460] [mqtt-client] - id=0x1c53a00: sending disconnect message as part of graceful shutdown.
Traceback (most recent call last):
  File "pubsub.py", line 130, in <module>
    connect_future.result()
  File "/home/pi/.local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 462, in result
    return self.__get_result()
  File "/home/pi/.local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 414, in __get_result
    raise exception_type, self._exception, self._traceback
awscrt.exceptions.AwsCrtError: AwsCrtError(name='AWS_ERROR_MQTT_UNEXPECTED_HANGUP', message='The connection was closed unexpectedly.', code=5134)
[ERROR] [2020-10-28T12:53:36Z] [b6fc8010] [mqtt-client] - id=0x1c53a00: Connection is not open, and may not be closed

Even getting to this point was a total grind: I had issues getting the AWSCRT library to work, along with some other dependencies that would not install as I expected. I cannot get past this point now, and I have looked at a lot of other pages on this topic.

rccarper commented 4 years ago

Does the policy attached to the certificate allow for the specified client-id to connect, and also to subscribe/publish to the topic you're specifying? The sample policy on this page should work with the default topic/client-id: https://github.com/aws/aws-iot-device-sdk-python-v2/tree/master/samples
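A minimal sketch of what such a policy has to cover, assuming a hypothetical helper named build_pubsub_policy and placeholder values for region, account ID, client ID and topic. The action names and ARN shapes follow the AWS IoT Core policy format; none of the values come from this thread's account.

```python
import json


def build_pubsub_policy(region, account_id, client_id, topic):
    """Build an AWS IoT policy document for one client ID and one topic.

    Hypothetical helper for illustration; all four arguments are placeholders
    that must match your own region, account, client ID and topic.
    """
    prefix = "arn:aws:iot:{}:{}".format(region, account_id)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # The ID passed via --client-id must match this resource.
                "Effect": "Allow",
                "Action": ["iot:Connect"],
                "Resource": ["{}:client/{}".format(prefix, client_id)],
            },
            {
                # Publish and Receive are checked against topic ARNs.
                "Effect": "Allow",
                "Action": ["iot:Publish", "iot:Receive"],
                "Resource": ["{}:topic/{}".format(prefix, topic)],
            },
            {
                # Subscribe is checked against topicfilter ARNs.
                "Effect": "Allow",
                "Action": ["iot:Subscribe"],
                "Resource": ["{}:topicfilter/{}".format(prefix, topic)],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID; replace every value with your own.
    policy = build_pubsub_policy("ap-southeast-2", "123456789012", "Alice", "Alice")
    print(json.dumps(policy, indent=2))
```

If the client ID or topic given on the command line differs from what the policy names, the broker may simply drop the connection, which the client reports as an unexpected hangup.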

denniszwiers commented 4 years ago

@HaterMonestary A grind it is. I am still stuck at the installation of the AWSCRT library (it fails on Python 3.7.3 on Raspberry Pi OS). Can you tell me how you fixed this?

THOM-AwS commented 4 years ago

@rccarper I have policies with allow: * and resource: * to rule this out as an issue. Does it require the specific account number and region in the ARN, etc.? I am using the classic shadow at this point; is that a potential problem? Does this sample require the use of a custom shadow?

@denniszwiers I honestly could not tell you what made it work. I installed a bunch of suggested libraries from other threads, none of them worked, and I spent hours on this. Then I came back the next day and somehow it worked. I will likely rebuild my Pi completely soon because of all the extra libraries and apparent dependencies I installed.

I have no issues with the V1 AWSIOT Python SDK, and I had good success getting the soil moisture sensor project working, which was cool. Whatever complexity has been added to the V2 Python SDK just does not seem to be working for me, and this was a fresh Pi that I had set up just before starting. I'm trying to set up something with the full pubsub experience with the delta, but I'm not much of a coder, so I can't pull this apart myself. I'm working it out as I go, using code blocks that I understand and know to work. This does not work, seemingly because of libraries that are out of sight, and I can't even begin to work out why.

THOM-AwS commented 4 years ago

I will say that the --verbosity debug option was pretty helpful for getting more detailed information about what could be wrong, and I think installing the awscli helped remove some errors as well. It just seems a little excessive for that to be a dependency.

jmklix commented 4 years ago

You can always start with a fully permissive policy if you haven't already. Just make sure to restrict it later to only the resources you need.

"Resource": [
             "*"
           ]

Also make sure you are using Python 3.5+ and try downloading the latest version of the SDK (a patch was recently applied to fix compilation on ARM devices). You shouldn't need to install awscli or anything else, because all required dependencies should already be downloaded during installation. Please let us know if this doesn't work.
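The first traceback in this thread comes from a python2.7 site-packages path, so the interpreter version is worth checking before anything else. A small sketch, assuming the 3.5 minimum stated above (newer SDK releases may require more); interpreter_ok is a hypothetical helper name.

```python
import sys

MIN_VERSION = (3, 5)  # minimum stated in this thread; an assumption for newer releases


def interpreter_ok(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or running) interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if interpreter_ok():
        print("Python {}.{} is new enough".format(major, minor))
    else:
        print("Python {}.{} is too old; run the sample with python3".format(major, minor))
```

On Raspberry Pi OS of that era, plain python could still resolve to Python 2.7, which is why the later runs in this thread use python3 explicitly.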

THOM-AwS commented 4 years ago

@jmklix My policy is set to * for both the actions it allows and the resources they apply to, so my thing has effective admin rights. I'm using Python 3.7.3. I've pulled the latest version down from git and still get the same output. I've set the verbosity to Warn here, as the rest is the same as above.

pi@raspberrypi:~/aws-iot-device-sdk-python-v2/samples $ python3 pubsub.py --endpoint <REDACTED>-ats.iot.ap-southeast-2.amazonaws.com --cert ../../certs/7894bfe332-certificate.pem.crt --key ../../certs/7894bfe332-private.pem.key --root-ca ../../certs/AmazonRootCA1.pem --topic Alice --verbosity Warn --client-id Alice
Connecting to <REDACTED>-ats.iot.ap-southeast-2.amazonaws.com with client ID 'Alice'...
[WARN] [2020-10-30T12:52:11Z] [b58a2460] [socket] - id=0xb4f00698 fd=6: setsockopt() for NO_SIGNAL failed with errno 92. If you are having SIGPIPE signals thrown, you may want to install a signal trap in your application layer.
[ERROR] [2020-10-30T12:52:11Z] [b58a2460] [socket] - id=0xb4f00698 fd=6: connect failed with error code 101.
[ERROR] [2020-10-30T12:52:11Z] [b58a2460] [channel-bootstrap] - id=0x771a38: failed to create socket with error 1049
[WARN] [2020-10-30T12:52:11Z] [b58a2460] [socket] - id=0xb4f00698 fd=6: setsockopt() for NO_SIGNAL failed with errno 92. If you are having SIGPIPE signals thrown, you may want to install a signal trap in your application layer.
Traceback (most recent call last):
  File "pubsub.py", line 130, in <module>
    connect_future.result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
awscrt.exceptions.AwsCrtError: AwsCrtError(name='AWS_ERROR_MQTT_UNEXPECTED_HANGUP', message='The connection was closed unexpectedly.', code=5134)
[ERROR] [2020-10-30T12:52:12Z] [b6ef78e0] [mqtt-client] - id=0x7deda0: Connection is not open, and may not be closed
pi@raspberrypi:~/aws-iot-device-sdk-python-v2/samples $

bretambrose commented 4 years ago

Ideally we need trace level logs (zipped and attached is fine). The connection failure you've included is the ipv6 connection attempt which is a red herring and will always fail.

bse-sja commented 4 years ago

log.txt Trace log attached.

I have the same issue. The old Python SDK works fine with the exact same endpoint and certs; the new SDK is broken. For example, this works:

python3 aws-iot-device-sdk-python/samples/basicPubSub/basicPubSub.py -e b34vpm8o8k1yrw-ats.iot.us-east-1.amazonaws.com -r root-CA.crt -c T4.cert.pem -k T4.private.key

This does not:

python3 pubsub.py --endpoint b34vpm8o8k1yrw-ats.iot.us-east-1.amazonaws.com --cert T4.cert.pem --key T4.private.key --root-ca root-CA.crt

awscrt.exceptions.AwsCrtError: AwsCrtError(name='AWS_ERROR_MQTT_UNEXPECTED_HANGUP', message='The connection was closed unexpectedly.', code=5134)

THOM-AwS commented 4 years ago

@bretambrose Sorry, how do you make it not use IPv6? I don't recall ever reading about that. Attached: stacktrace.txt

bretambrose commented 4 years ago

There is no way to disable it. Both connections are attempted and whichever succeeds (first) is kept; in many cases only the ipv4 connection succeeds anyways. It's an open backlog item to modify the logging so that the only time an ERROR level connection failure shows up is if both attempts fail.
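One way to apply that rule to a captured log is to count socket-level failures and successes and only treat the failures as meaningful when nothing succeeded. This is a rough sketch that assumes the log line wording shown earlier in this thread; summarize_connect_attempts is a hypothetical helper, not part of the SDK.

```python
import re

# Matches the socket lines shown in the logs in this thread, e.g.
# "[ERROR] [...] [socket] - id=0xb4c04860 fd=6: connect failed with error code 101."
SOCKET_LINE = re.compile(
    r"\[socket\] - id=\S+ fd=\d+: (connect failed|connection success)"
)


def summarize_connect_attempts(log_text):
    """Count failed and successful socket connects in a CRT log excerpt."""
    failed = 0
    succeeded = 0
    for line in log_text.splitlines():
        match = SOCKET_LINE.search(line)
        if not match:
            continue
        if match.group(1) == "connect failed":
            failed += 1
        else:
            succeeded += 1
    return {
        "failed": failed,
        "succeeded": succeeded,
        # A failed attempt only matters if no attempt succeeded at all.
        "transport_ok": succeeded > 0,
    }
```

In the logs above the "connection success" line is written at INFO level, so a capture made with --verbosity Warn will not contain it and the summary will look worse than reality.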

THOM-AwS commented 4 years ago

@bretambrose OK, but why does the whole thing fail then, if IPv4 is succeeding? Not only does it log an error like you say, it also fails outright. Is there a workaround for this? Should I just use the SDK V1 code?

bretambrose commented 4 years ago

With respect to v2 vs v1, the v1 sample uses a fixed client id unless overridden while the v2 sample uses a random client id unless overridden. If the permission policy is conditional on the client id, you will see differences in behavior.
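To illustrate the difference, here is a sketch of an argument parser whose client ID defaults to a fresh random value on every run. The "test-" prefix and the parser layout are assumptions made for illustration, not a copy of the sample's code.

```python
import argparse
from uuid import uuid4


def build_parser():
    """Argument parser that mimics a randomly generated default client ID."""
    parser = argparse.ArgumentParser(description="client ID demo (illustrative only)")
    # Assumption: the default is a new random ID per run, so a policy that only
    # allows one specific client ID rejects it unless --client-id is passed.
    parser.add_argument("--client-id", default="test-" + str(uuid4()))
    return parser


if __name__ == "__main__":
    # Random on every run:
    print(build_parser().parse_args([]).client_id)
    # Fixed when overridden:
    print(build_parser().parse_args(["--client-id", "Alice"]).client_id)
```

With a policy that restricts iot:Connect to a single client ID, only the overridden form would be expected to connect.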

Both sets of logs show the server hanging up the connection as a response to the mqtt connect packet, which is often caused by a permission problem. The disconnect cause is not transmitted to the client, so there's not much else we can do there. You can follow the instructions here (https://docs.aws.amazon.com/iot/latest/developerguide/configure-logging.html) to enable IoT logging and from there, you may be able to browse the connect events in the appropriate log group and determine why the server is rejecting the initial mqtt connect packet.
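Once logging is enabled, the exported entries can be filtered for rejected connects. The sketch below assumes each entry is a single JSON object with eventType, status and clientId fields, and that connect attempts are recorded with eventType "Connect"; rejected_connects is a hypothetical helper.

```python
import json


def rejected_connects(log_lines, client_id=None):
    """Return Connect events whose status is not "Success".

    Assumes one JSON object per line with eventType, status and clientId
    fields. Lines that are not JSON objects are skipped.
    """
    rejected = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            continue  # not a JSON log entry
        if not isinstance(event, dict):
            continue
        if event.get("eventType") != "Connect":
            continue
        if client_id is not None and event.get("clientId") != client_id:
            continue
        if event.get("status") != "Success":
            rejected.append(event)
    return rejected
```

Filtering by the client ID actually passed to the v2 sample matters here, because entries produced by other clients or other samples say nothing about why this particular connection was dropped.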

THOM-AwS commented 4 years ago

@bretambrose Having set up logging and looked into the link you posted, I ran some requests against the service and checked the logs. There were no unusual messages there. I found plenty of INFO-level messages that reported Success as their status, for both IN and OUT events. Here is a sample, which I have cleansed of important numbers:

{"timestamp":"2020-11-01 03:01:29.525","logLevel":"INFO","traceId":"ba0bdb7f-06ad-f41a-c3fe-91186903aa0f","accountId":"<REDACT>","status":"Success","eventType":"Publish-In","protocol":"MQTT","topicName":"$aws/things/Alice/shadow/update","clientId":"basicShadowUpdater","principalId":"<DeletedHalfOfThis>fd215ff6134507d5f8","sourceIp":"<MYPRIVATESTATICIP>","sourcePort":44113}

There were no messages at any level other than INFO, and no other interesting information to be had there. Again, I am posting my policy document below; you can see that my thing is an admin-level user with no caveats.

[Screenshot: Screen Shot 2020-11-01 at 2 51 05 pm]

THOM-AwS commented 3 years ago

I've had a shot at rebuilding my Pi from scratch, and this seems to have fixed my issue. I can now get this sample to work, and it worked on the first try. Amazing. I have no idea why it did not work the first time. I did not change anything in the console.

bretambrose commented 3 years ago

It's great that you've gotten things working, but it's also really disconcerting that there's no clarity on the root cause.

THOM-AwS commented 3 years ago

@bretambrose I still have the SD card intact, if you want me to copy it or part of it?

wes-novack commented 3 years ago

Posting here in case this helps any future searchers: I was also getting the error awscrt.exceptions.AwsCrtError: AWS_ERROR_MQTT_UNEXPECTED_HANGUP: The connection was closed unexpectedly. For me, it turned out that I had not activated the AWS IoT thing certificate. After creating it, there's another step to activate it.
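The certificate status can also be checked and fixed from a script. This is a sketch only: ensure_certificate_active is a hypothetical helper, and iot_client is assumed to behave like boto3.client("iot"), whose describe_certificate and update_certificate calls are used here.

```python
def ensure_certificate_active(iot_client, certificate_id):
    """Activate an AWS IoT certificate if it is currently INACTIVE.

    iot_client is expected to behave like boto3.client("iot").
    Returns the status found before any change was made.
    """
    description = iot_client.describe_certificate(certificateId=certificate_id)
    status = description["certificateDescription"]["status"]
    # Only flip INACTIVE certificates; revoked or pending ones are left alone.
    if status == "INACTIVE":
        iot_client.update_certificate(
            certificateId=certificate_id, newStatus="ACTIVE"
        )
    return status


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   ensure_certificate_active(boto3.client("iot"), "<certificate-id>")
```

The client is passed in rather than created inside the function so the check can be exercised with a stand-in object before it touches a real account.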

lllama commented 3 years ago

For future Google explorers: when copying the example policies from the README, make sure you change region and account to your specific values 🙄
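A quick way to catch leftover placeholders is to scan the policy's Resource entries for anything that is neither "*" nor an ARN with a real-looking region and a 12-digit account ID. This sketch assumes the standard aws partition; unfilled_resources and the region pattern are illustrative, and ARNs that use wildcards for region or account will also be flagged.

```python
import re

# arn:aws:iot:<region>:<12-digit account>:<resource>
IOT_ARN = re.compile(r"^arn:aws:iot:([a-z]{2}(?:-[a-z]+)+-\d):(\d{12}):.+$")


def unfilled_resources(policy):
    """Return Resource entries that are neither "*" nor a fully filled-in IoT ARN."""
    bad = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        resources = statement.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        for resource in resources:
            if resource != "*" and not IOT_ARN.match(resource):
                bad.append(resource)
    return bad
```

Any entry it returns is worth a second look before the policy is attached to a certificate.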

Section009 commented 4 months ago

TO ANY FUTURE SEARCHERS INSTEAD OF USING THE TUTORIAL ON THE PAGE FOR CONNECTING THE RASPBERRY PI, USE THE "CONNECT A DEVICE" BUTTON IN AWS ON THE LEFT OF YOUR DASHBOARD