adafruit / Adafruit_CircuitPython_AWS_IOT

Amazon AWS IoT MQTT Client for CircuitPython
MIT License
13 stars, 11 forks

AWS_IOT_ERROR ('Error Connection to AWS IoT: ', MQTTException('Repeated connect failures',)) #23

Closed: mlnrt closed this issue 5 months ago

mlnrt commented 1 year ago

Hello, 5 months ago I built a demo by reusing (and modifying) the code from the PyPortal IoT Plant Monitor with AWS IoT and CircuitPython guide. Everything was working fine. I was trying to recreate this demo, and now the Adafruit PyPortal is not able to connect to AWS IoT. Even when I recreate the original PyPortal IoT Plant Monitor with AWS IoT and CircuitPython demo instead of my modified version, it fails with the same error:

File "adafruit_aws_iot.py", line 145, in connect
AWS_IOT_ERROR: ('Error connecting to AWS IoT: ', MMQTTException('Repeated connect failures',))

Could this be caused by the issue "AWS IoT SDK Python V2 Support"?

Thank you in advance

jersu11 commented 6 months ago

For what it's worth, I'm having the exact same problem, exactly 1 year after this issue was created. I'm following the same demo code and getting the same 'repeated connect failures' error, which is how I found this issue. Did you have any luck fixing this?

justmobilize commented 6 months ago

I would suggest passing in a logger and seeing whether you get any more helpful errors.
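A minimal sketch of what "passing in a logger" can look like. It assumes MiniMQTT's `enable_logger(log_pkg, log_level, logger_name)` method, which takes a logging package (on CircuitPython, `adafruit_logging`), fetches a named logger from it and sets its level. `StubMQTT` and `enable_debug_logging` are hypothetical names so the sketch runs on a desktop with the standard `logging` module; on a device you would pass the real MiniMQTT client instead.

```python
import logging  # CPython stdlib; on CircuitPython use `import adafruit_logging as logging`


class StubMQTT:
    """Hypothetical stand-in for adafruit_minimqtt's MQTT class, so this runs off-device.

    enable_logger() mirrors the shape we assume MiniMQTT's has: take a logging
    package, fetch a named logger from it, set its level and return it.
    """

    def enable_logger(self, log_pkg, log_level=20, logger_name="log"):
        self.logger = log_pkg.getLogger(logger_name)
        self.logger.setLevel(log_level)
        return self.logger


def enable_debug_logging(client, log_pkg=logging):
    """Turn on DEBUG output for any client exposing enable_logger()."""
    return client.enable_logger(log_pkg, log_level=log_pkg.DEBUG)


# On a device, pass the real MiniMQTT client instead of the stub.
logger = enable_debug_logging(StubMQTT())
logger.debug("Attempting to connect to MQTT broker")
```

With DEBUG enabled, MiniMQTT prints lines like the "Attempting to connect to MQTT broker (attempt #0)" output shown later in this thread.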

justmobilize commented 5 months ago

@mlnrt and @jersu11 would you be willing to try this version of MiniMQTT? It's possible that you are getting auth errors that weren't being passed down correctly

mlnrt commented 5 months ago

@justmobilize I should be able to test in a few days

justmobilize commented 5 months ago

Awesome!

jersu11 commented 5 months ago

Hi, I had a chance to try the new library tonight and I'm seeing what looks like the same error. This is with the PyPortal (M4 + ESP32). I mention that only because I've tried to see whether connecting to WiFi via the connection manager behaves differently when the ESP32 is a coprocessor; I don't think that's the issue. Here's the log output I get after boiling the code down to about the bare minimum:

code.py output:
Connecting to WiFi...
Connected!
Attempting to connect to asxxxxxxoy.iot.us-west-2.amazonaws.com
676.379: DEBUG - Attempting to connect to MQTT broker (attempt #0)
676.381: DEBUG - Attempting to establish MQTT connection...
676.620: DEBUG - Sending CONNECT to broker...
676.622: DEBUG - Fixed Header: bytearray(b'\x10(')
676.625: DEBUG - Variable Header: bytearray(b'\x00\x04MQTT\x04\x02\x00<')
676.864: DEBUG - Receiving CONNACK packet from broker
686.886: INFO - MMQT error: No data received from broker for 10 seconds.
686.888: DEBUG - Reconnect timeout computed to 2.00
686.890: DEBUG - adding jitter 0.50 to 2.00 seconds
686.893: DEBUG - Attempting to connect to MQTT broker (attempt #1)
686.894: DEBUG - Attempting to establish MQTT connection...
686.897: DEBUG - Sleeping for 2.5 seconds due to connect back-off
689.402: WARNING - Socket error when connecting: Socket already connected to mqtt://asxxxxxxxxoy.iot.us-west-2.amazonaws.com:8443
689.404: DEBUG - Resetting reconnect backoff
689.407: DEBUG - Attempting to connect to MQTT broker (attempt #0)
689.408: DEBUG - Attempting to establish MQTT connection...
689.607: WARNING - Socket error when connecting: Socket already connected to mqtt://asxxxxxxxxxoy.iot.us-west-2.amazonaws.com:8443
689.609: DEBUG - Resetting reconnect backoff
689.612: DEBUG - Attempting to connect to MQTT broker (attempt #0)
689.613: DEBUG - Attempting to establish MQTT connection...
689.617: WARNING - Socket error when connecting: Socket already connected to mqtt://asxxxxxxxxxxoy.iot.us-west-2.amazonaws.com:8443
689.822: DEBUG - Resetting reconnect backoff
689.825: DEBUG - Attempting to connect to MQTT broker (attempt #0)
689.827: DEBUG - Attempting to establish MQTT connection...
689.831: WARNING - Socket error when connecting: Socket already connected to mqtt://asxxxxxxxxxxoy.iot.us-west-2.amazonaws.com:8443
Traceback (most recent call last):
  File "code.py", line 160, in <module>
  File "adafruit_aws_iot.py", line 145, in connect
AWS_IOT_ERROR: ('Error connecting to AWS IoT: ', MMQTTException('Repeated connect failures',))

justmobilize commented 5 months ago

@jersu11 looks like after a timeout, it doesn't fully close out the socket. I will figure out how to reproduce this...
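To illustrate the failure mode described here, a toy model (not MiniMQTT's actual code): if a timed-out attempt leaves its socket open, every retry hits "Socket already connected", so the retries burn out immediately; closing the half-open socket before retrying lets a later attempt succeed. `FakeSocketPool`, `AlreadyConnectedError` and `connect_with_retries` are hypothetical names invented for this sketch.

```python
class AlreadyConnectedError(Exception):
    """Raised when a socket to the same host is still open."""


class FakeSocketPool:
    """Hypothetical stand-in that models the failure in the log above.

    It refuses to open a second socket to a host whose previous socket was never
    closed, and its broker ignores the first `silent_attempts` CONNECTs.
    """

    def __init__(self, silent_attempts=1):
        self.open_hosts = set()
        self.silent_attempts = silent_attempts
        self.connects = 0

    def open(self, host):
        if host in self.open_hosts:
            raise AlreadyConnectedError("Socket already connected to " + host)
        self.open_hosts.add(host)

    def close(self, host):
        self.open_hosts.discard(host)

    def send_connect(self, host):
        # Simulate "No data received from broker" for the first few attempts.
        self.connects += 1
        if self.connects <= self.silent_attempts:
            raise TimeoutError("No data received from broker")


def connect_with_retries(pool, host, max_attempts=5, close_on_failure=True):
    """Return the zero-based attempt number that succeeded."""
    for attempt in range(max_attempts):
        opened = False
        try:
            pool.open(host)
            opened = True
            pool.send_connect(host)
            return attempt
        except (TimeoutError, AlreadyConnectedError):
            # The key step: release the half-open socket before the next try.
            if close_on_failure and opened:
                pool.close(host)
    raise RuntimeError("Repeated connect failures")
```

With `close_on_failure=False` the model reproduces the log: one timeout, then nothing but "already connected" errors until "Repeated connect failures".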

jersu11 commented 5 months ago

Thanks @justmobilize, let me know if I can help. I'm happy to share my code or potentially the IoT thing certs for debugging. I did spend some time digging through documentation and hacking away at the libraries. Because it seems the CONNACK message never comes back, I double-checked that the cert/key added via esp.set_certificate(DEVICE_CERT) and esp.set_private_key(DEVICE_KEY) in code.py were correct. In the AWS console, I checked that I had attached a permissive AWS policy to the cert, and finally checked again that it was all attached to the thing and that everything was in an active state. I had also been playing with the port: I just noticed that the log output above is for 8443, which is the HTTPS publish-only port. But switching back to the proper MQTT pub/sub port, 8883, still gives exactly the same failures.

justmobilize commented 5 months ago

@jersu11 a few other small things to try:

  1. Have you updated the firmware on the ESP yet?
  2. When calling MQTT.MQTT.connect, what happens if you pass a larger value (say 60) for keep_alive?
  3. Have you tried this on a more powerful chip, like an ESP32-S3? Or even on your laptop?
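For suggestion 2, it may help to see where keep-alive travels on the wire. In MQTT 3.1.1 the CONNECT variable header is the length-prefixed protocol name "MQTT", the protocol level (4), a flags byte, and a 2-byte big-endian keep-alive in seconds. This sketch (the function and constant names are hypothetical) rebuilds that header; with clean session on and keep-alive 60 it reproduces the logged `b'\x00\x04MQTT\x04\x02\x00<'`, since `<` is byte 0x3C, i.e. 60 seconds.

```python
import struct

PROTOCOL_NAME = b"MQTT"
PROTOCOL_LEVEL_3_1_1 = 4   # protocol level byte for MQTT 3.1.1
CLEAN_SESSION_FLAG = 0x02  # bit 1 of the CONNECT flags byte


def connect_variable_header(keep_alive, clean_session=True):
    """Build an MQTT 3.1.1 CONNECT variable header.

    Only the clean-session flag is modelled; username, password and will
    flags are left unset, as in the logged header.
    """
    flags = CLEAN_SESSION_FLAG if clean_session else 0x00
    return (
        struct.pack(">H", len(PROTOCOL_NAME)) + PROTOCOL_NAME
        + bytes([PROTOCOL_LEVEL_3_1_1, flags])
        + struct.pack(">H", keep_alive)  # keep-alive in seconds, big-endian
    )


print(connect_variable_header(60))
```

So the header logged above already carries a 60-second keep-alive; a different value would show up only in the last two bytes.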
jersu11 commented 5 months ago

@justmobilize, good suggestions

For 1: I believe I have. When I print bytes(esp.firmware_version).decode("utf-8") it returns 1.7.7. As far as versions go, I'm also running Adafruit CircuitPython 9.0.4 on 2024-04-16; Adafruit PyPortal with samd51j20.

For 3: yes, I have been able to connect from my laptop with an MQTT client using the AWS cert/key pair.

I've learned a bit more. I started my project with the example code, such as '/examples/aws_iot_simpletest.py' in this library and the Plant Monitor code:

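import adafruit_minimqtt.adafruit_minimqtt as MQTT
from adafruit_aws_iot import MQTT_CLIENT

# pool, ssl_context and secrets are created earlier in the example
# Set up a new MiniMQTT Client
client = MQTT.MQTT(
    broker=secrets["broker"],
    client_id=secrets["client_id"],
    socket_pool=pool,
    ssl_context=ssl_context,
)

# Initialize AWS IoT MQTT API Client
aws_iot = MQTT_CLIENT(client)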

When I ran this code initially, I would get the error WARNING - Socket error when connecting: Timed out waiting for SPI char. I found that adding port=8883 to the MQTT.MQTT client init fixed that error, but then produced the timeout errors above.

I had been a little confused about why I needed to be explicit about the port, since within the MQTT library, MQTT_TLS_PORT should already be used when 'is_ssl' is True. It occurred to me today that I should set is_ssl=True during the client init, and good news: that one change got it to work! At least for a moment...
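A hypothetical model of the port/TLS interaction described above, not MiniMQTT's actual source: the default port follows `is_ssl` (1883 for plain MQTT, 8883 for MQTT over TLS), but an explicit `port=8883` only changes the port, not the transport. If that matches the library's behavior, `port=8883` without `is_ssl=True` would send plaintext MQTT to a TLS port, which would be consistent with the CONNACK never arriving.

```python
MQTT_TCP_PORT = 1883  # standard port for plain (unencrypted) MQTT
MQTT_TLS_PORT = 8883  # standard port for MQTT over TLS


def resolve_connection(is_ssl=False, port=None):
    """Hypothetical model: return (port, uses_tls) for the given settings.

    An explicit port overrides the default, but it does NOT enable TLS;
    that still depends only on is_ssl.
    """
    default_port = MQTT_TLS_PORT if is_ssl else MQTT_TCP_PORT
    return (port if port is not None else default_port, bool(is_ssl))


print(resolve_connection(port=8883))   # TLS port, but no TLS handshake
print(resolve_connection(is_ssl=True))  # TLS port and TLS
```

In the thread, the working change was simply adding `is_ssl=True` to the `MQTT.MQTT(...)` call.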

It feels like we're a lot closer to a solution. However, the code still lands on the same MMQTTException of 'Repeated connect failures'. I've run this with the latest release of the adafruit_minimqtt library and, a second time, with your recent changes/commit to adafruit_minimqtt.py. Same result for both.

code.py output:
nina-fw version: 1.7.7
Connecting to WiFi...
Connected!
Attempting to connect to asxxxxxxxxxhoy-ats.iot.us-west-2.amazonaws.com
5945.189: DEBUG - Attempting to connect to MQTT broker (attempt #0)
5945.191: DEBUG - Attempting to establish MQTT connection...
5952.613: DEBUG - Sending CONNECT to broker...
5952.615: DEBUG - Fixed Header: bytearray(b'\x10(')
5952.619: DEBUG - Variable Header: bytearray(b'\x00\x04MQTT\x04\x02\x00<')
5952.857: DEBUG - Receiving CONNACK packet from broker
5953.078: DEBUG - Got message type: 0x20 pkt: 0x20
Connected to MQTT Broker!
Flags: 0
 RC: 0
Subscribing to topic circuitpython/aws
5953.084: DEBUG - Sending SUBSCRIBE to broker...
5953.088: DEBUG - Fixed Header: bytearray(b'\x82\x16')
5953.297: DEBUG - Variable Header: b'\x00\x01'
5953.504: DEBUG - SUBSCRIBING to topic circuitpython/aws with QoS 1
5953.508: DEBUG - payload: b'\x00\x11circuitpython/aws\x01'
5953.721: DEBUG - Got message type: 0x90 pkt: 0x90
Subscribed to circuitpython/aws with QOS level 1
5953.932: DEBUG - Sending PUBLISH
Topic: circuitpython/aws
Msg: b'{"message": "Hello from AWS IoT CircuitPython"}'                            
QoS: 1
Retain? False
5954.164: WARNING - Socket error when connecting: pystack exhausted
5954.166: DEBUG - Resetting reconnect backoff
5954.168: DEBUG - Attempting to connect to MQTT broker (attempt #0)
5954.170: DEBUG - Attempting to establish MQTT connection...
5954.174: WARNING - Socket error when connecting: Socket already connected to mqtt://asxxxxxxxxxhoy-ats.iot.us-west-2.amazonaws.com:8883
5954.176: DEBUG - Resetting reconnect backoff
5954.178: DEBUG - Attempting to connect to MQTT broker (attempt #0)
5954.377: DEBUG - Attempting to establish MQTT connection...
5954.381: WARNING - Socket error when connecting: Socket already connected to mqtt://asxxxxxxxxxhoy-ats.iot.us-west-2.amazonaws.com:8883
5954.383: DEBUG - Resetting reconnect backoff
5954.385: DEBUG - Attempting to connect to MQTT broker (attempt #0)
5954.387: DEBUG - Attempting to establish MQTT connection...
5954.391: WARNING - Socket error when connecting: Socket already connected to mqtt://asxxxxxxxxxhoy-ats.iot.us-west-2.amazonaws.com:8883
5954.590: DEBUG - Resetting reconnect backoff
5954.592: DEBUG - Attempting to connect to MQTT broker (attempt #0)
5954.594: DEBUG - Attempting to establish MQTT connection...
5954.598: WARNING - Socket error when connecting: Socket already connected to mqtt://asxxxxxxxxxhoy-ats.iot.us-west-2.amazonaws.com:8883
Traceback (most recent call last):
  File "code.py", line 147, in <module>
  File "adafruit_aws_iot.py", line 145, in connect
AWS_IOT_ERROR: ('Error connecting to AWS IoT: ', MMQTTException('Repeated connect failures',))
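The raw bytes in this log can be decoded against MQTT 3.1.1 to check that the protocol exchange itself was healthy. The helpers below are hypothetical names written for this sketch: the high nibble of the first fixed-header byte is the packet type, the low nibble its flags, followed by a variable-length "remaining length". Decoded, the log shows CONNECT (remaining length 40), CONNACK (0x20), SUBSCRIBE (flags 0x2, remaining length 22 = 2-byte packet ID + 20-byte payload for topic circuitpython/aws at QoS 1) and SUBACK (0x90). The failure only appears after "Sending PUBLISH", as "pystack exhausted".

```python
PACKET_TYPES = {1: "CONNECT", 2: "CONNACK", 3: "PUBLISH", 8: "SUBSCRIBE", 9: "SUBACK"}


def decode_remaining_length(data, offset=1):
    """Decode MQTT's variable-length 'remaining length' starting at data[offset].

    Returns (length, index of the first byte after the length field).
    """
    value, multiplier = 0, 1
    for i in range(offset, min(offset + 4, len(data))):
        byte = data[i]
        value += (byte & 0x7F) * multiplier
        if not byte & 0x80:  # high bit clear: this was the last length byte
            return value, i + 1
        multiplier *= 128
    raise ValueError("malformed remaining length")


def parse_fixed_header(header):
    """Return (packet name, flag nibble, remaining length) for a fixed header."""
    packet_type, flags = header[0] >> 4, header[0] & 0x0F
    remaining, _ = decode_remaining_length(header)
    return PACKET_TYPES.get(packet_type, hex(packet_type)), flags, remaining


def parse_subscribe_payload(payload):
    """Split a single-topic SUBSCRIBE payload into (topic, requested QoS)."""
    topic_len = int.from_bytes(payload[:2], "big")
    topic = payload[2 : 2 + topic_len].decode("utf-8")
    return topic, payload[2 + topic_len]


print(parse_fixed_header(b"\x82\x16"))
print(parse_subscribe_payload(b"\x00\x11circuitpython/aws\x01"))
```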

justmobilize commented 5 months ago

Oh awesome. And with that error, we can fix that.

For this: 5954.164: WARNING - Socket error when connecting: pystack exhausted, check out this setting. Set:

CIRCUITPY_PYSTACK_SIZE=3072

in your settings.toml, and you should be good!
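A small sketch for checking what settings.toml requests. On CircuitPython, `os.getenv` reads keys from settings.toml (integer values may come back as `int`, while CPython returns strings, so the helper converts either). It reports only what the file asks for and cannot confirm how large the pystack actually is. `configured_pystack_size` is a hypothetical helper, and the 1536 fallback is the default quoted later in this thread for most boards.

```python
import os

DEFAULT_PYSTACK_SIZE = 1536  # default quoted in this thread for most boards


def configured_pystack_size(getenv=os.getenv):
    """Return the CIRCUITPY_PYSTACK_SIZE requested in settings.toml, or the default."""
    value = getenv("CIRCUITPY_PYSTACK_SIZE")
    if value is None:
        return DEFAULT_PYSTACK_SIZE
    return int(value)


print("Requested pystack size:", configured_pystack_size())
```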

I'll also go look at the docs. Given how MiniMQTT and its SSL handling have evolved, some of the documentation needs help...

jersu11 commented 5 months ago

Woohoo! That pystack setting did the trick. Thanks for your commitment to this issue the last few days. I really appreciate that.

I noticed a couple of other things, now that it's executing code in this library (Adafruit_CircuitPython_AWS_IOT). There's probably been a little drift between this library and the MQTT library: this library calls a client.loop_forever() method, which doesn't exist in MiniMQTT, and the call to client.loop() inherits MiniMQTT's default timeout value of zero, which causes an error.
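A sketch of the kind of fix described: replace the missing `loop_forever()` with a `while` loop that calls `loop()` with an explicit, non-zero `timeout`. `StubMQTTClient` is a hypothetical stand-in; its rule that a timeout smaller than the socket timeout raises is an assumption modelled on the error reported here. `run_loop` is also a hypothetical name, and its `iterations` argument exists only so the sketch can be tested.

```python
class StubMQTTClient:
    """Hypothetical stand-in for a MiniMQTT client so the sketch runs off-device.

    Assumption: loop() with a timeout smaller than the socket timeout raises,
    modelling the error that the default timeout of 0 triggers.
    """

    def __init__(self, socket_timeout=1):
        self.socket_timeout = socket_timeout
        self.polls = 0

    def loop(self, timeout=0):
        if timeout < self.socket_timeout:
            raise ValueError("loop timeout must be bigger than socket timeout")
        self.polls += 1


def run_loop(client, timeout=1, iterations=None):
    """Stand-in for loop_forever(): call loop() with an explicit timeout.

    iterations=None loops forever; a number bounds the loop for testing.
    """
    count = 0
    while iterations is None or count < iterations:
        client.loop(timeout=timeout)
        count += 1
    return count
```

On a device, the equivalent is a `while True:` loop around the MiniMQTT client's `loop(timeout=...)`, with the timeout at least as large as the client's socket timeout.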

I've made the changes/fixes in my local copy, both of which were trivial. I'm happy to make a PR, which could also include the addition of the 'is_ssl=True' to the example code.

justmobilize commented 5 months ago

@jersu11 please do, although once you help, you'll become like me and want to help more... ;)

Both this library and the Azure IoT one can need pystack changes. That might also be a good addition to the documentation.

justmobilize commented 5 months ago

@jersu11 if you have time, would you be willing to open up 2 issues in MiniMQTT?

  1. pystack error keeps retrying
  2. Not setting is_ssl fails with a timeout and a non-descriptive error

This way I can focus on each and get PRs into the main library.

dhalbert commented 5 months ago

@tannewt Should we consider raising the default PYSTACK?

justmobilize commented 5 months ago

@dhalbert do you know why you would get this exception on some devices but not all?

dhalbert commented 5 months ago

Which devices have you tested that are fine and which aren't? Different boards have different PYSTACK limits, based on RAM.

justmobilize commented 5 months ago

I will let you know. I'm setting stuff up this weekend to test the PR that's being put together.

I'll test both this and AzureIoT to see which have errors and which don't.

I have pretty much all the common boards.

dhalbert commented 5 months ago

The default CIRCUITPY_PYSTACK_SIZE is 1536 for all boards except one specialized board, and MICROPY_ENABLE_PYSTACK is enabled on all CircuitPython boards, it appears. So I'm surprised it's only failing on some boards. For Espressif boards, see if there's any difference between no PSRAM and some PSRAM. Or maybe this is ESP32SPI vs native wifi.

justmobilize commented 5 months ago

Do you know which common ones have no PSRAM?

dhalbert commented 5 months ago

There are Feather ESP32-S3 boards with 8MB flash and no PSRAM, and others with 4MB flash and 2MB PSRAM.

justmobilize commented 5 months ago

Dang it. The one type of board I don't think I have...

justmobilize commented 5 months ago

@jersu11 when you have time, can you please share your code? I can't actually get the pystack error on my PyPortal (or any device).

dhalbert commented 5 months ago

Those of you who are having trouble: what version of NINA-FW are you running on the PyPortal ESP32? Try updating it to the latest. New root certificates have been added. This may be a cert issue, but with poor error recovery mixed in.
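A small sketch for checking the NINA-FW version from code, building on the `bytes(esp.firmware_version).decode("utf-8")` call used in this thread. We assume the raw value may carry a trailing NUL byte, so the helper strips one if present. `parse_firmware_version` and `is_at_least` are hypothetical names, and the minimum version is a placeholder you choose; this thread does not name a specific "latest" release.

```python
def parse_firmware_version(raw):
    """Turn an ESP32SPI firmware_version value into a comparable tuple.

    Assumes raw is bytes/bytearray such as b"1.7.7", possibly NUL-terminated.
    """
    text = bytes(raw).decode("utf-8").rstrip("\x00").strip()
    return tuple(int(part) for part in text.split("."))


def is_at_least(raw, minimum):
    """True if the firmware is at least `minimum`, a tuple like (1, 7, 7)."""
    return parse_firmware_version(raw) >= tuple(minimum)


print(parse_firmware_version(bytearray(b"1.7.7\x00")))
```

Comparing tuples rather than strings keeps versions like 1.10.0 ordered after 1.9.9.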

jersu11 commented 5 months ago

@justmobilize, here's the code that hits the pystack error for me. It's a slightly modified version of examples/aws_iot_simpletest.py. I've tested it a few times: it works when CIRCUITPY_PYSTACK_SIZE=3072 is set in my settings.toml and fails with the pystack exhausted message when that variable is not set. I'm not sure whether the PyPortal hardware has been upgraded over time; my unit is about 5 years old.

Adafruit CircuitPython 9.0.4 on 2024-04-16; Adafruit PyPortal with samd51j20

code.py.txt

mlnrt commented 5 months ago

My apologies for not being able to follow up and test; I ended up not having the time at all. Thank you for the follow-up. I'll try the fix.

mlnrt commented 5 months ago

It was quite some work to update everything in my old demo, and I am not fully done yet, but updating everything on my Adafruit PyPortal lets me subscribe successfully to my MQTT topic, even without the CIRCUITPY_PYSTACK_SIZE=3072 parameter in the settings.toml file. I still have to update my PyPortal application code to make it fully work, but that is another problem, I think. Thank you for the help.

justmobilize commented 5 months ago

Awesome!

Shek20 commented 4 months ago

Excuse me, I am having a similar issue and I'm struggling to follow what the solution is here. Can you please explain what I need to do to solve it? @justmobilize @mlnrt

mlnrt commented 4 months ago

@Shek20 I just updated to CircuitPython 9 and updated the ESP32 and NINA firmware to the latest.

Shek20 commented 4 months ago

Thank you! Got it. I was having another issue, related to the SSL configuration.