aws / aws-iot-device-sdk-embedded-C

SDK for connecting to AWS IoT from a device using embedded C.
MIT License

Facing issue in connecting to the AWS server/broker. aws_iot_mqtt_connect() function returns NETWORK_ALREADY_CONNECTED_ERROR #1876

Closed SwapnaBaviskarEmr closed 9 months ago

SwapnaBaviskarEmr commented 1 year ago

We are using the aws-iot-device-sdk-embedded-C stack and are facing an issue connecting to the AWS server/broker. The details are as follows:
1) The IoT device, a thermostat, connects to the AWS server using MQTT over TLS (port 8883).
2) On first power-up, the thermostat connects to the AWS server/broker with the bootstrap certificate. It then receives the production certificate from the AWS server.
3) After that, the device connects to the server using the production certificate.

The following is the scenario in which we observe that the thermostat is unable to connect to the AWS server:

1) Connect the thermostat to a Wi-Fi router (Router 1) that is outside the Emerson network, i.e. outside the firewall. Communication with the AWS server is established and everything works fine.

2) Disconnect the thermostat from Router 1 and connect it to Router 2, which is inside the Emerson network and has a firewall that prevents communication with the URL mqtt.sensiapi.io. So Router 2 prevents the thermostat from connecting to mqtt.sensiapi.io using MQTT over TLS on port 8883.

3) After connecting to Router 2 and making 5 connection attempts to the AWS server, the thermostat deletes the production certificate. The thermostat no longer has the production certificate, so it tries to connect with the bootstrap certificate. But since the network has a firewall, it cannot connect to the server.

4) Next, we disconnect the thermostat from Router 2 (the network with the firewall) and reconnect it to Router 1 (which is not behind the Emerson firewall). We now expect the thermostat to connect to the AWS server (mqtt.sensiapi.io, MQTT over TLS on port 8883).

5) When we change to the network without the firewall and try to connect to the server, we observe that the client state in the AWS SDK is always CLIENT_STATE_CONNECTING, which prevents the thermostat from trying to reconnect to the server.
Please find attached the debug log file, DebugLog_Network_Already_Connected_Error_29Aug2023.doc. Below is a snippet from the debug log (search for NETWORK_ALREADY_CONNECTED_ERROR):

backoff_state(BACKOFF_CONNECT_ATTEMPT)
backoff_state:618-> ticks(875315) >= next_ticks(0) + connect_tick_count(0)
backoff_state:623-> Increment connect_attempt = 1
aws_iot_mqtt_connect:466-> clientState = 2
print_client_state:511-> CLIENT_STATE_CONNECTING
_aws_iot_mqtt_is_client_state_valid_for_connect:354-> isValid = false
aws_iot_mqtt_connect:474-> NETWORK_ALREADY_CONNECTED_ERROR

In the above log, when aws_iot_mqtt_connect() is called, it checks the client state. If the client state is any of the states listed below, it does not attempt to connect to the MQTT broker:
a) CLIENT_STATE_CONNECTING
b) CLIENT_STATE_CONNECTED_IDLE
c) CLIENT_STATE_CONNECTED_YIELD_IN_PROGRESS
d) CLIENT_STATE_CONNECTED_PUBLISH_IN_PROGRESS
e) CLIENT_STATE_CONNECTED_SUBSCRIBE_IN_PROGRESS
f) CLIENT_STATE_CONNECTED_UNSUBSCRIBE_IN_PROGRESS
g) CLIENT_STATE_CONNECTED_RESUBSCRIBE_IN_PROGRESS
h) CLIENT_STATE_CONNECTED_WAIT_FOR_CB_RETURN
i) CLIENT_STATE_DISCONNECTING

As per our understanding of the log, the client state is always CLIENT_STATE_CONNECTING, so aws_iot_mqtt_connect() returns NETWORK_ALREADY_CONNECTED_ERROR and does not proceed to connect to the broker/server.
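To make the check described above concrete, here is a minimal, self-contained model of it. This is a sketch, not the SDK source: every `Demo`/`demo_` name is a hypothetical stand-in, only the value 2 for the connecting state is taken from the log, and the model covers only the nine blocking states listed in this post.

```c
#include <stdbool.h>

/* Stand-in for the SDK's client state enum (NOT the real header).
 * The value 2 for CONNECTING matches "clientState = 2" in the log;
 * all other numeric values are illustrative. */
typedef enum {
    DEMO_STATE_INITIALIZED = 1,
    DEMO_STATE_CONNECTING = 2,
    DEMO_STATE_CONNECTED_IDLE,
    DEMO_STATE_CONNECTED_YIELD_IN_PROGRESS,
    DEMO_STATE_CONNECTED_PUBLISH_IN_PROGRESS,
    DEMO_STATE_CONNECTED_SUBSCRIBE_IN_PROGRESS,
    DEMO_STATE_CONNECTED_UNSUBSCRIBE_IN_PROGRESS,
    DEMO_STATE_CONNECTED_RESUBSCRIBE_IN_PROGRESS,
    DEMO_STATE_CONNECTED_WAIT_FOR_CB_RETURN,
    DEMO_STATE_DISCONNECTING,
    DEMO_STATE_DISCONNECTED_ERROR /* illustrative non-blocking state */
} DemoClientState;

typedef enum {
    DEMO_SUCCESS = 0,
    DEMO_ALREADY_CONNECTED_ERROR = -1 /* hypothetical value */
} DemoError;

typedef struct {
    DemoClientState state;
    int connectAttempts; /* counts attempts that got past the guard */
} DemoClient;

/* Returns false for the nine states (a) to (i) listed above. */
bool demo_state_valid_for_connect(DemoClientState s)
{
    switch (s) {
    case DEMO_STATE_CONNECTING:
    case DEMO_STATE_CONNECTED_IDLE:
    case DEMO_STATE_CONNECTED_YIELD_IN_PROGRESS:
    case DEMO_STATE_CONNECTED_PUBLISH_IN_PROGRESS:
    case DEMO_STATE_CONNECTED_SUBSCRIBE_IN_PROGRESS:
    case DEMO_STATE_CONNECTED_UNSUBSCRIBE_IN_PROGRESS:
    case DEMO_STATE_CONNECTED_RESUBSCRIBE_IN_PROGRESS:
    case DEMO_STATE_CONNECTED_WAIT_FOR_CB_RETURN:
    case DEMO_STATE_DISCONNECTING:
        return false;
    default:
        return true;
    }
}

/* Models only the guard at the top of the connect call:
 * a blocked state means no connection attempt is made at all. */
DemoError demo_connect(DemoClient *client)
{
    if (!demo_state_valid_for_connect(client->state)) {
        return DEMO_ALREADY_CONNECTED_ERROR;
    }
    client->connectAttempts++;
    return DEMO_SUCCESS;
}
```

In this model, a client whose state is left at `DEMO_STATE_CONNECTING` is rejected on every call, no matter which network it is on, which is the behaviour we see in the log.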

Below is the code snippet from the AWS SDK function aws_iot_mqtt_connect:

[image: code snippet from aws_iot_mqtt_connect]

We request you to look into this and let us know what could be causing it and how to resolve the issue.

Thank you.

Logfile - DebugLog_Network_Already_Connected_Error_29Aug2023.docx

Thanks and Regards, Swapna

AniruddhaKanhere commented 1 year ago

Hello @SwapnaBaviskarEmr,

Thank you for reaching out to us. Also thanks for giving us proper context and details. That will help us help you.

Yes, your diagnosis is correct. If the state is CLIENT_STATE_CONNECTING, then aws_iot_mqtt_connect does not send another CONNECT packet.

Since there are multiple connects and disconnects, I am curious which APIs you call when your device detects a network disconnect. I ask because the MQTT library might not know that the network was disconnected after it tried to send a CONNECT packet to the broker, first while connected to the router behind the firewall and then while connected to the router outside it.

I am not entirely sure as of now about the root cause as this is an old piece of code which pre-dates me.

I suggest that you try setting the state yourself manually before calling aws_iot_mqtt_connect. That can be done like so:

aws_iot_mqtt_set_client_state( pClient,
                               CLIENT_STATE_CONNECTING,
                               CLIENT_STATE_INITIALIZED );

/* Call aws_iot_mqtt_connect now... */

This is not a very clean way of doing things but to get around the problem for now, can you try this?

The correct way might be to call aws_iot_mqtt_attempt_reconnect when trying to reconnect with the broker. Again, I am not entirely sure, as this is fairly old code which I am trying to understand as we go.

Let me know if the above solution(s) resolve the issue you are seeing.

Thanks, Aniruddha

P.S. if possible, I strongly suggest updating to the latest version of the codebase. That way, you shall get all the updates and bug fixes that come with the latest code.

SwapnaBaviskarEmr commented 1 year ago

Hello @AniruddhaKanhere

Thank you so much for the suggestions. As per your suggestions, we have tried the following:

1) Setting the state manually to CLIENT_STATE_INITIALIZED using aws_iot_mqtt_set_client_state( pClient, CLIENT_STATE_CONNECTING, CLIENT_STATE_INITIALIZED );

If aws_iot_mqtt_connect() returns the error NETWORK_ALREADY_CONNECTED_ERROR, the client state is forcefully set to CLIENT_STATE_INITIALIZED. The following change was made in the code under the compile option INIT_CLIENT_STATE:

[image: code change under the INIT_CLIENT_STATE compile option]

After making the above change to set the client state to CLIENT_STATE_INITIALIZED and then calling aws_iot_mqtt_connect() on the next connect attempt, the thermostat connects to the AWS server/broker. This solution has worked, although we have not tested it thoroughly yet. As you mentioned, this is not a very clean way of doing things, as we are forcefully changing the client state, so we have also tried the second suggestion.

2) Calling aws_iot_mqtt_attempt_reconnect() when trying to reconnect with the broker. If aws_iot_mqtt_connect() returns NETWORK_ALREADY_CONNECTED_ERROR, then aws_iot_mqtt_attempt_reconnect() is called in the code as follows:

[image: code calling aws_iot_mqtt_attempt_reconnect()]

However, with this change the device is not able to connect to the AWS server/broker.

The aws_iot_mqtt_attempt_reconnect() function returns error code 4 (NETWORK_ATTEMPTING_RECONNECT).

Please see the log below:

Call RECONNECT aws_iot_mqtt_attempt_reconnect function
aws_iot_mqtt_attempt_reconnect:651-> aws_iot_mqtt_attempt_reconnect called
aws_iot_mqtt_connect:466-> clientState = 2
print_client_state:512-> CLIENT_STATE_CONNECTING
_aws_iot_mqtt_is_client_state_valid_for_connect:354-> isValid = false
aws_iot_mqtt_connect:474-> NETWORK_ALREADY_CONNECTED_ERROR
mqtt_adapter_connect:781-> FAILED : aws_iot_mqtt_attempt_reconnect error rc1 = 4

Please find attached the log file DebugLogfile_mqtt_reconnect_4Sept2023.doc. Please search for "Call RECONNECT aws_iot_mqtt_attempt_reconnect function" in the log file.

Logfile - DebugLog_mqtt_reconnect_4Sept2023.docx

Can you please explain why aws_iot_mqtt_attempt_reconnect() would return error code 4? We call the reconnect function if aws_iot_mqtt_connect() is not successful and returns NETWORK_ALREADY_CONNECTED_ERROR. We have the following questions:
1) Is it OK to call aws_iot_mqtt_attempt_reconnect() immediately if aws_iot_mqtt_connect() is not successful?
2) When should aws_iot_mqtt_attempt_reconnect() be called?
3) What is the expected return code from aws_iot_mqtt_attempt_reconnect()?

Please guide us on how we can use aws_iot_mqtt_attempt_reconnect() successfully.

Thanks for your guidance.

Thanks and Regards, Swapna.

AniruddhaKanhere commented 1 year ago

Hello Swapna,

After studying the code a bit more, I think you should go with my first solution as the second one will not work at all.

The second solution I suggested seems good on a preliminary look, but the APIs have some issues. If you call aws_iot_mqtt_attempt_reconnect, it internally calls aws_iot_mqtt_connect, which fails the same state check, and thus you get the NETWORK_ATTEMPTING_RECONNECT error. I also looked at calling aws_iot_mqtt_disconnect before calling aws_iot_mqtt_attempt_reconnect, but that has a similar issue: the disconnect function will not do anything because the state is CLIENT_STATE_CONNECTING, which implies that the device is not yet connected.

Thus, I suggest you stick to the 1st solution that I proposed above. While not clean, it will work and should not cause any race conditions as it acquires a mutex before changing the value of the state. To harden the code a bit, I suggest checking the return value of the function aws_iot_mqtt_set_client_state to make sure that it is SUCCESS. That way you should be able to spot any corner cases which we may have missed.
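As a rough illustration of that hardening, here is a self-contained sketch of the recovery flow. It is a model, not SDK code: all `Sim`/`sim_` names and error values are hypothetical stand-ins, and it assumes the state setter only applies the change when the current state equals the expected one passed as its second argument, which is what the call shape in the first solution implies.

```c
/* Stand-in types (NOT the SDK headers); values are illustrative. */
typedef enum {
    SIM_STATE_INITIALIZED = 1,
    SIM_STATE_CONNECTING = 2,
    SIM_STATE_CONNECTED_IDLE = 3
} SimClientState;

typedef enum {
    SIM_SUCCESS = 0,
    SIM_ALREADY_CONNECTED_ERROR = -1,
    SIM_UNEXPECTED_STATE_ERROR = -2
} SimError;

typedef struct {
    SimClientState state;
} SimClient;

/* Assumed behaviour: change the state only if it is currently `expected`. */
SimError sim_set_client_state(SimClient *client,
                              SimClientState expected,
                              SimClientState next)
{
    if (client->state != expected) {
        return SIM_UNEXPECTED_STATE_ERROR;
    }
    client->state = next;
    return SIM_SUCCESS;
}

/* Simplified connect: refuses unless the client is in the initialized state,
 * otherwise pretends the broker accepted the connection. */
SimError sim_connect(SimClient *client)
{
    if (client->state != SIM_STATE_INITIALIZED) {
        return SIM_ALREADY_CONNECTED_ERROR;
    }
    client->state = SIM_STATE_CONNECTED_IDLE;
    return SIM_SUCCESS;
}

/* Workaround flow: if connect is rejected, reset a stuck CONNECTING state,
 * check that the reset really succeeded, and only then retry. */
SimError sim_connect_with_recovery(SimClient *client)
{
    SimError rc = sim_connect(client);
    if (rc != SIM_ALREADY_CONNECTED_ERROR) {
        return rc;
    }

    rc = sim_set_client_state(client, SIM_STATE_CONNECTING,
                              SIM_STATE_INITIALIZED);
    if (rc != SIM_SUCCESS) {
        /* The client was not stuck in CONNECTING; do not force anything. */
        return rc;
    }
    return sim_connect(client);
}
```

The sketch retries immediately after the reset; waiting for the next scheduled connect attempt instead works the same way. The important part is that a failed reset is reported rather than ignored, so a client that is genuinely connected is never forced back to the initialized state.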

Once more, I strongly suggest that you switch to the latest code base whenever possible. I understand that if this is being used in a product, it may not be easy to switch. But here is some documentation of the new MQTT library coreMQTT for your reference: https://freertos.org/Documentation/api-ref/coreMQTT/docs/doxygen/output/html/index.html. We will help you through the transition should you need any help.

Regards, Aniruddha

SwapnaBaviskarEmr commented 1 year ago

Hello Aniruddha,

Thank you for the clarification on aws_iot_mqtt_attempt_reconnect() and for sending the link to the documentation on the new library. Regarding moving to the latest stack: this code is in production now, so it would be difficult and time-consuming to make this major change immediately. In the near future, when we get a chance to make major feature changes in the code, we will definitely consider switching to the latest SDK.

For now, as you have suggested, a quick workaround would be good.
We will use the 1st solution to change the client state to CLIENT_STATE_INITIALIZED and will also check the return value of the function aws_iot_mqtt_set_client_state().

I have a question about the following. You mentioned: “Thus, I suggest you stick to the 1st solution that I proposed above. While not clean, it will work and should not cause any race conditions as it acquires a mutex before changing the value of the state”.

I have just checked the code below (aws_iot_mqtt_client.c), and the compile option _ENABLE_THREAD_SUPPORT_ is not enabled.

[image: code in aws_iot_mqtt_client.c under the _ENABLE_THREAD_SUPPORT_ compile option]

Hence the function aws_iot_mqtt_client_lock_mutex() will not be called before changing the client state. Would this cause a problem? Does the AWS stack absolutely require _ENABLE_THREAD_SUPPORT_ to be enabled?

Please let me know whether it would cause an issue if _ENABLE_THREAD_SUPPORT_ is not enabled and the mutex is not acquired before changing the client state.

Thanks for your help and guidance.

Thanks and Regards, Swapna. readme.md

AniruddhaKanhere commented 1 year ago

Hello Swapna,

If you are calling all the APIs from a single thread/task, then you need not worry about the threading issue. But if you are calling these APIs from different threads, then there might be cause for concern.

Can you give me pseudocode showing how you do these steps using the APIs?

1) connect to broker
2) subscribe to topic
3) publish to a different topic
4) receive an incoming publish

Do not share your business logic. I just need to see how you do all these things and to know whether you do them from a single thread or not, so that I can better advise you.

Regards, Aniruddha

SwapnaBaviskarEmr commented 1 year ago

Hello Aniruddha,

As per my understanding, the APIs are currently called from a single task, mqtt_adapter. I will check further and also provide pseudocode for your review.

Thanks and Regards, Swapna.

SwapnaBaviskarEmr commented 1 year ago

Hello Aniruddha,

Actually, I have found another issue, regarding the maximum number of topics. The application currently subscribes to a maximum of 10 topics, and while switching from the firewalled network (Router 2) to the non-firewalled network (Router 1), 4 registration topics are subscribed as well, so we need 4 additional topics in this case. After changing MAX_TOPICS in the application code from 10 to 14, the thermostat is able to connect to the server.

I have a question: when we change MAX_TOPICS in the application code from 10 to 14, do we need to change the following define in the AWS SDK config file aws_iot_config.h?

#define AWS_IOT_MQTT_NUM_SUBSCRIBE_HANDLERS 10 ///< Maximum number of topic filters the MQTT client can handle at any given time. This should be increased appropriately when using Thing Shadow

Should we change the above define to 14?

#define AWS_IOT_MQTT_NUM_SUBSCRIBE_HANDLERS 14 ///< Maximum number of topic filters the MQTT client can handle at any given time. This should be increased appropriately when using Thing Shadow

Please let me know.

Thanks for your help.

Thanks and Regards, Swapna.

AniruddhaKanhere commented 1 year ago

Hello Swapna,

Yes, if you are subscribing to 14 topics, then you would need to increase the AWS_IOT_MQTT_NUM_SUBSCRIBE_HANDLERS to 14 as well. But, if you are just publishing (and NOT subscribing) to those 14 topics that you mentioned above, then you need not increase this.

Keep in mind that this will take up additional memory (flash): conservatively, ~20 bytes per additional entry in that array.
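To illustrate why the two limits need to agree, here is a small self-contained sketch of a fixed-size handler table. It is a simplified model and not the SDK's implementation: `APP_MAX_TOPICS`, `HandlerSlot`, `HandlerTable` and `table_subscribe` are hypothetical names, and the macro is redefined locally only so the example compiles on its own.

```c
#include <stddef.h>

/* Redefined here only for the sketch; in a real build this comes from
 * aws_iot_config.h. */
#define AWS_IOT_MQTT_NUM_SUBSCRIBE_HANDLERS 14

/* Hypothetical name for the application-side topic limit. */
#define APP_MAX_TOPICS 14

/* Fail the build if the application can subscribe to more topic filters
 * than the handler table has room for. */
_Static_assert(APP_MAX_TOPICS <= AWS_IOT_MQTT_NUM_SUBSCRIBE_HANDLERS,
               "application topic limit exceeds subscribe handler slots");

typedef struct {
    const char *topicFilter; /* NULL marks a free slot */
} HandlerSlot;

typedef struct {
    HandlerSlot slots[AWS_IOT_MQTT_NUM_SUBSCRIBE_HANDLERS];
} HandlerTable;

/* Stores the filter in the first free slot.
 * Returns the slot index, or -1 when every slot is already in use. */
int table_subscribe(HandlerTable *table, const char *topicFilter)
{
    for (size_t i = 0; i < AWS_IOT_MQTT_NUM_SUBSCRIBE_HANDLERS; i++) {
        if (table->slots[i].topicFilter == NULL) {
            table->slots[i].topicFilter = topicFilter;
            return (int)i;
        }
    }
    return -1;
}
```

With a table like this, the 15th subscription is rejected however large the application-side limit is, so a compile-time check of this kind catches the mismatch before it shows up on a device.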

Let me know if that helps.

Regards, Aniruddha

ActoryOu commented 11 months ago

Hi @SwapnaBaviskarEmr, it's been a while since the last post. I'd like to know if @AniruddhaKanhere's comment helped you. Let us know if there is anything we can help with.

Thanks.

joshzarr commented 9 months ago

Hi @SwapnaBaviskarEmr, as there was no response, I am closing out the issue. Please feel free to open it again if the problem is not resolved.