aws / aws-iot-device-sdk-arduino-yun

SDK for connecting to AWS IoT from an Arduino Yún.
Apache License 2.0
163 stars 77 forks source link

Sketch stops updating AWS IoT after some period of time #55

Open eczajk1 opened 7 years ago

eczajk1 commented 7 years ago

I have been working with a sketch that reads from some different sensors and reports values to AWS using Thing Shadows. Things seem to work up until some point when the sketch seems to lose its ability to do updates. I have had a hard time pinpointing the issue, so I tried to use a slight modification of ThingShadowEcho example, where the loop function looks like this:

void loop() {

  if(success_connect) {

    if(myClient.yield()) {
      #ifdef AWS_IOT_DEBUG
        Serial.println(F("Yield failed."));
      #endif
    }

    char JSON_buf[150];
    String payload = "{\"state\":{\"reported\":{";
    float current_stage = (rand() % 150 + 1) / 10.0;
    payload += "\"stage\": " + String(current_stage, 1);
    payload += "}}}";
    payload.toCharArray(JSON_buf, 150);

    #ifdef AWS_IOT_DEBUG
      Serial.println("stage: " + String(current_stage, 1));
    #endif

    IoT_Error_t shadowUpdateError = myClient.shadow_update(AWS_IOT_MY_THING_NAME, JSON_buf, strlen(JSON_buf), NULL, 5);

    delay(1000);
  }
}

Things mostly work. Over the period of several hours I do get a handful (9) 'Yield failed.` errors. It is hard for me to tell from the logs, but I believe the shadow updates stopped working somewhere in the vicinity shown below.

Can you offer any pointers on why this would be happening? Am I just trying to do updates too often? If so, is there a recommended minimum interval?

DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:AWSIoTPythonSDK.core.protocol.mqttCore:Try to put a publish request 41 in the TCP stack.
DEBUG:AWSIoTPythonSDK.core.protocol.mqttCore:Publish request 41 succeeded.
DEBUG:comm.serialCommunicationServer:Updated serialCommunicationServer internal protocolMessageQueue by inserting a new message. Size: 1
DEBUG:AWSIoTPythonSDK.core.shadow.deviceShadow:shadow message clientToken: centuryeng_headquarters_3_0_37_vnidc
DEBUG:comm.serialCommunicationServer:Send through serial to remote client: SU T Size: 4
DEBUG:AWSIoTPythonSDK.core.shadow.deviceShadow:Token is in the pool. Type: accepted
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: z
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:Updated serialCommunicationServer internal protocolMessageQueue by inserting a new message. Size: 1
DEBUG:comm.serialCommunicationServer:Send through serial to remote client: Z T Size: 3
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: y
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:No more messages for yield. Exiting writeToExternalYield.
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: z
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:Updated serialCommunicationServer internal protocolMessageQueue by inserting a new message. Size: 1
DEBUG:comm.serialCommunicationServer:Send through serial to remote client: Z T Size: 3
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: y
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:No more messages for yield. Exiting writeToExternalYield.
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: z
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:Updated serialCommunicationServer internal protocolMessageQueue by inserting a new message. Size: 1
DEBUG:comm.serialCommunicationServer:Send through serial to remote client: Z T Size: 3
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: y
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:No more messages for yield. Exiting writeToExternalYield.
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: z
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:Updated serialCommunicationServer internal protocolMessageQueue by inserting a new message. Size: 1
DEBUG:comm.serialCommunicationServer:Send through serial to remote client: Z T Size: 3
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: y
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:No more messages for yield. Exiting writeToExternalYield.
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: z
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:Updated serialCommunicationServer internal protocolMessageQueue by inserting a new message. Size: 1
DEBUG:comm.serialCommunicationServer:Send through serial to remote client: Z T Size: 3
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: y
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:No more messages for yield. Exiting writeToExternalYield.
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: z
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:Updated serialCommunicationServer internal protocolMessageQueue by inserting a new message. Size: 1
DEBUG:comm.serialCommunicationServer:Send through serial to remote client: Z T Size: 3
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: y
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:No more messages for yield. Exiting writeToExternalYield.
DEBUG:comm.serialCommunicationServer:Clear internal list. Size: 0
DEBUG:comm.serialCommunicationServer:Accept-timer starts, with acceptTimeout: 10 second(s).
DEBUG:comm.serialCommunicationServer:1 lines to be received. Loop begins.
DEBUG:comm.serialCommunicationServer:Received: 1/1 Message is: z
DEBUG:comm.serialCommunicationServer:Finish reading from remote client. Accept-timer ends.
DEBUG:comm.serialCommunicationServer:Updated serialCommunicationServer internal protocolMessageQueue by inserting a new message. Size: 1
DEBUG:comm.serialCommunicationServer:Send through serial to remote client: Z T Size: 3
liuszeng commented 7 years ago

Hi @eczajk1 ,

Thank you very much for providing the information.

From the log you provide, it seems that there is no new incoming messages for the yield call to dispatch, which means there is no incoming messages from the subscription (shadow/non-shadow). Can you print out the received response on 32u4 from AR9331 when you are experiencing the yield failure? I would like to exclude the possibility of a serial1 communication issue.

Thanks, Liusu

eczajk1 commented 7 years ago

I am not certain what you are asking for here. Here is a snippet from the Serial monitor:

stage: 7.1
stage: 1.9
stage: 1.1
stage: 1.5
stage: 2.8
Yield failed.
stage: 4.4
stage: 4.4
stage: 12.6
stage: 0.8
stage: 3.8

The AR9331 log output pasted previously was logged using this guidance this message.

liuszeng commented 7 years ago

Hi @eczajk1 ,

What I mean is the output from the Serial1 connection from AR9331 to ATmega32u4. You can find the details instructions here: https://github.com/aws/aws-iot-device-sdk-arduino-yun/issues/47

Add a Serail.println(rw_buf) in the SDK after this line and see if there is any extra output in your Arduino IDE Serial monitor.

Basically I would like to know what gets transmitted over the Serial1 connection to ATmega32u4 that results in a yield failure.

Thanks, Liusu

eczajk1 commented 7 years ago

Thanks! But, I'm not seeing anything extra:

Y F: No messages.
stage: 14.2

Z T

Y F: No messages.
stage: 5.7

Z T

Y F: No messages.
stage: 9.1

Z T

Y F: No messages.
stage: 0.4

Z
Yield failed.
stage: 14.5

Z T

Y F: No messages.
stage: 13.3

Z T

Y F: No messages.
stage: 10.9

Z T

Y F: No messages.
stage: 13.6

Z T

Y F: No messages.
stage: 3.2
liuszeng commented 7 years ago

Hi @eczajk1 ,

Thank you very much for providing the information.

If you take a look at the lines near where "Yield failed" happens, you can find the following:

Z Yield failed. stage: 14.5

The first inter-chip protocol seems to be broken, resulting in a yield failure. SDK logic details can be found here: https://github.com/aws/aws-iot-device-sdk-arduino-yun/blob/master/AWS-IoT-Arduino-Yun-Library/aws_iot_mqtt.cpp#L373

This could be caused by byte loss in serial1 communication. Provided that yield is a frequently invoked call. What we can do here is just simply retry "contacting" the linux chip.

Apart from this, what is more interesting is that, on successful yield attempts, there is no further messages routed back to 32u4. From the logs on Python runtime that you previously attached, there is no explicit incoming messages from the cloud being captured as well. I would suspect that the client might get disconnect or the subscriptions some how get lost.

To further detect the issue, you can utilize the AWS IoT Lifecycle events to detect if the client is connected/disconnected/subscribed. I would like to confirm if there is any silent disconnect happened when you were experiencing the issue.

Thanks, Liusu

eczajk1 commented 7 years ago

What we can do here is just simply retry "contacting" the linux chip.

Are you saying that this is a technique that could be implemented in the Sketch itself?


I subscribed to the Lifecycle Events using the AWS IoT MQTT client. But, so far, I have not seen any unexpected disconnected events even while Yield failed errors are logged. I also saw this other one:

SU T

Z T

Y 1 0 JSON-267

Y F: No messages.
stage: 1.3

SUFF: Unknown error.

Z T

Y F: No
stage: 7.4
ssages.
u

SU T

Z T

Y F: No messages.
stage: 8.4
eczajk1 commented 6 years ago

I have tried listening on on Lifecycle events several times in the last week. But I am still experiencing the same issue while not seeing ANY lifecycle events showing up. I only see them when the sketch is first started.

Any advice on actions I can take to further diagnose this issue?

hedeshianaws commented 6 years ago

@eczajk1 Are you still facing this issue?

eczajk1 commented 6 years ago

Yes, I was not able to resolve it.