aws / aws-iot-device-sdk-python

SDK for connecting to AWS IoT from a device using Python.
Apache License 2.0

Times out when attempting to connect #337

Closed AbdullahGaw closed 4 months ago

AbdullahGaw commented 9 months ago

Describe the bug

The problem is that every time I run the script, it never establishes a connection.

Expected Behavior

I expect it to connect successfully, or to see a connection acknowledgment or rejection, within a minute at most.

Current Behavior

Looking at the logs, I found that it takes 18 minutes before the connection is rejected or acknowledged. The pattern is consistent: I ran the script more than 10 times and got the same 18-minute delay every time. Please see the logs in the attached screenshot.

I thought the IoT device had an internet problem, so I tried a different SIM card, with the same results. I then ran the script on 4 IoT devices (same type and settings), still with the same results. However, when I ran it on my laptop, it worked. I also tried connecting with the paho-mqtt library, and it worked on all 4 IoT devices.
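To pin down where the 18 minutes go, it helps to have timestamps on every SDK log line. The following is a minimal sketch using only the standard `logging` module; the logger name `AWSIoTPythonSDK.core` is the one used in the SDK's basicPubSub sample, and the helper name `enable_sdk_debug_logging` is hypothetical.

```python
import logging


def enable_sdk_debug_logging(logger_name="AWSIoTPythonSDK.core"):
    """Attach a timestamped DEBUG stream handler to the named logger and return it."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler()
    # %(asctime)s puts a wall-clock timestamp on each record, so gaps are visible
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Call once, before creating the MQTT client
sdk_logger = enable_sdk_debug_logging()
```

With this in place, the gap between consecutive log lines should show which step (socket setup, TLS handshake, or waiting for CONNACK) accounts for the delay.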

Reproduction Steps

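import fcntl
import os
import socket
import struct
import time

from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient
from AWSIoTPythonSDK.exception import AWSIoTExceptions

def getHwAddr(ifname):
    # Return the MAC address of the given interface as an integer (Linux only)
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # 0x8927 is SIOCGIFHWADDR: ask the kernel for the interface's hardware address
    info = fcntl.ioctl(s.fileno(), 0x8927,  struct.pack('256s', bytes(ifname, 'utf-8')[:15]))
    # return ':'.join(['%02x' % ord(char) for char in info[18:24]])
    return int.from_bytes(info[18:24], "big")

mac_addr = getHwAddr('eth')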

# connection
client_identification = f"******"  # must be specified in the policy iot:Connect
end_point = "********"  # aws mqtt broker host name
port = 8883  # aws mqtt broker port number

# topics
publish_topic = f"coordinators/{hex(mac_addr)}"  # must be specified in the policy iot:Publish

# ds files path
loc = "/opt"
path = loc + "/datastreams/"

# must be up-to-date
path_to_CA = loc + "/certificates/root-CA.crt"
path_to_private_key = loc + "/certificates/private.pem.key"
path_to_cert_pem = loc + "/certificates/certificate.pem.crt"

def configure(client_id):

    client_id.configureAutoReconnectBackoffTime(1, 32, 20)
    client_id.configureOfflinePublishQueueing(-1)  # Infinite offline Publish queueing
    client_id.configureDrainingFrequency(2)  # Draining: 2 Hz
    client_id.configureConnectDisconnectTimeout(10)  # 10 sec
    client_id.configureMQTTOperationTimeout(5)

    try:
        client_id.configureEndpoint(end_point, port)
        print(GREEN + "Configured Endpoint" + RESET)

    except AWSIoTExceptions.connectTimeoutException:

        print(
            RED + "Failed to configure " + RESET + RED + BOLD + UNDERLINE + "endpoint" + RESET + RED + ", please check " + RESET + RED + BOLD + UNDERLINE + "port number and or the endpoint address" + RESET + RED + " you provided" + RESET)
        return False

    try:
        client_id.configureCredentials(path_to_CA, path_to_private_key, path_to_cert_pem)
        print(GREEN + "Configured Credentials" + RESET)

    except AWSIoTExceptions.connectTimeoutException:
        print(
            RED + "Failed to configure " + RESET + RED + BOLD + UNDERLINE + "Credentials," + RESET + RED + " please make sure you have the appropriate " + RESET + RED + BOLD + UNDERLINE + "certificates and or private key" + RESET)
        return False

    return True

try:
    client = AWSIoTMQTTClient(client_id)  # create client object
    if client:
        if configure(client):
            try:
                print(YELLOW + "Connecting to Client..." + RESET)
                client.connect()
                time.sleep(2)

                print(
                    GREEN + "Connected to client " + UNDERLINE + BOLD + f"{client_id}" + RESET + GREEN + " successfully" + RESET)
                while True:
                    print(YELLOW + "Collecting files..." + RESET)
                    data_file_list = sorted(os.listdir(path))
                    print(
                        YELLOW + "Found " + UNDERLINE + BOLD + f"{len(data_file_list)}" + RESET + YELLOW + "files..." + RESET)
                    if len(data_file_list) > 0:

                        print(
                            YELLOW + "Publishing to " + RESET + BOLD + UNDERLINE + YELLOW + f"{publish_topic}" + RESET + YELLOW + "..." + RESET)
                        fetch_folder(client, data_file_list)
                        print(YELLOW + "Pausing for 5 seconds..." + RESET)
                        time.sleep(5)

                    else:
                        print(RED + "No files found in " + RESET + BOLD + UNDERLINE + RED + f"{path}" + RESET)
                        print(YELLOW + "Pausing for 2 minutes..." + RESET)
                        time.sleep(30)

            except AWSIoTExceptions.connectError as e:
                print(
                    RED + "Error: Could not " + RESET + RED + BOLD + UNDERLINE + "connect to client" + RESET + RED + ". Please check " + RESET + RED + BOLD + UNDERLINE + "client ID" + RESET)
                print(e)
    else:
        print(
            RED + "Error: MQTT client " + RESET + RED + BOLD + UNDERLINE + "not created" + RESET + RED + ". Please check " + RESET + RED + BOLD + UNDERLINE + "client ID" + RESET)

except AWSIoTExceptions.ClientError as e:
    print(RED + "Client ID passed is not configured" + RESET)
    print(e)

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.5.2

Environment details (OS name and version, etc.)

Digi Accelerated Linux Operating System

jmklix commented 9 months ago

You should be able to configure the timeout with myAWSIoTMQTTClient.configureConnectDisconnectTimeout(10), which you are already using, so I'm not sure why you are seeing an 18-minute timeout.

  1. Can you try pinging your ATS endpoint from your IoT devices to make sure they can reach the IoT servers? You can do that with:

    ping <xxxxxxxxxxxxxx>-ats.iot.us-east-1.amazonaws.com

  2. If that works, can you try the basicPubSub sample and see whether it works?
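A ping only exercises ICMP, while the MQTT connection uses TCP port 8883 with TLS, so a successful ping does not rule out a slow DNS lookup, TCP connect, or TLS handshake. The following is a rough sketch, using only the Python standard library, that times each stage separately. The function name `probe_endpoint` and its parameters are hypothetical, and the sketch does not present the device certificate, so the broker may reject the handshake; the point is how long each stage takes, not whether it succeeds.

```python
import socket
import ssl
import time


def probe_endpoint(host, port=8883, timeout=10.0, ca_file=None, do_tls=True):
    """Time DNS, TCP connect, and TLS handshake.

    Returns a dict mapping stage name to elapsed seconds (float),
    or to a "failed: ..." string if that stage raised an error.
    """
    results = {}

    # Stage 1: DNS resolution
    start = time.monotonic()
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except OSError as exc:
        results["dns"] = f"failed: {exc}"
        return results
    results["dns"] = time.monotonic() - start

    # Stage 2: TCP connect, bounded by `timeout`
    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        results["tcp"] = f"failed: {exc}"
        return results
    results["tcp"] = time.monotonic() - start

    if not do_tls:
        sock.close()
        return results

    # Stage 3: TLS handshake (server verification only, no client certificate)
    context = ssl.create_default_context(cafile=ca_file)
    start = time.monotonic()
    try:
        with context.wrap_socket(sock, server_hostname=host) as tls_sock:
            results["tls"] = time.monotonic() - start
            results["tls_version"] = tls_sock.version()
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        results["tls"] = f"failed: {exc}"
        sock.close()
    return results
```

Running `probe_endpoint("<your-endpoint>-ats.iot.us-east-1.amazonaws.com", ca_file="/opt/certificates/root-CA.crt")` on both the laptop and the IoT device and comparing the per-stage times should show whether the delay is in the network path or inside the SDK.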

AbdullahGaw commented 9 months ago

[screenshot: pingshot]

This is the result when I pinged it ^

[screenshot: basicshot]

This is when I ran basicPubSub.py; still the 18-minute delay.

jmklix commented 9 months ago

When trying to reproduce this, I always see the basicPubSub sample fail quickly. If I start the sample with no internet connection, it fails immediately during setup with no delay. If I disconnect the internet after a few successful publishes, it waits the configured 10 seconds and then times out correctly.

Is there anything special about the IoT devices (Digi Accelerated Linux Operating System) or the network settings that you're using?
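To compare the configured 10-second timeout against what actually happens on the device, the connect call can be wrapped in a small timer. This is only a sketch: `timed_call` is a hypothetical helper, and the usage in the comments assumes an already configured AWSIoTMQTTClient instance named `client`.

```python
import time


def timed_call(func, *args, **kwargs):
    """Call func and return (result, error, elapsed_seconds) without raising."""
    start = time.monotonic()
    try:
        result, error = func(*args, **kwargs), None
    except Exception as exc:  # capture the exception so the elapsed time is still reported
        result, error = None, exc
    return result, error, time.monotonic() - start


# Hypothetical usage with a configured client:
#   connected, error, elapsed = timed_call(client.connect)
#   print(f"connect() returned {connected!r} after {elapsed:.1f}s, error={error!r}")
```

If the elapsed time is far above the value passed to configureConnectDisconnectTimeout, the wait is probably happening before the SDK's own timeout starts counting, for example in the blocking socket or TLS setup.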
