Azure / iot-edge-v1

Azure IoT Edge
http://azure.github.io/iot-edge/
Other
525 stars 258 forks source link

Expose device SDK's "event_send_timeout_secs" setting in iothub module config #380

Open villepalo opened 7 years ago

villepalo commented 7 years ago

I have a simple iot-module which sends lots of data to IotHub -module. IotHub uses AMPQ protocol. During two hours connection break, my iot-module sends about 100 000 messages to IotHub module. When connection is back to normal, almost all of the messages are lost. During little shorter connection breaks (little less than one hour), all messages are sent without any problems.

Log file attached.

I think, this line could be most telling: Error: Time:Tue Sep 5 13:29:50 2017 File:/home/user/path/iot-edge/deps/iot-sdk-c/iothub_client/src/iothubtransport_amqp_common.c Func:on_amqp_connection_state_changed Line:686 Transport received an ERROR from the amqp_connection (state changed AMQP_CONNECTION_STATE_OPENED -> AMQP_CONNECTION_STATE_ERROR); it will be flagged for connection retry. Info: Transport state changed from AMQP_TRANSPORT_STATE_CONNECTED to AMQP_TRANSPORT_STATE_RECONNECTION_REQUIRED

or maybe these: Error: Time:Tue Sep 5 13:35:36 2017 File:/home/user/path/iot-edge/deps/iot-sdk-c/iothub_client/src/iothubtransport_amqp_common.c Func:IoTHubTransport_AMQP_Common_Device_DoWork Line:1046 Failed performing DoWork for device 'test-device' (device failed to start or stop within expected timeout) Error: Time:Tue Sep 5 13:35:36 2017 File: /home/user/path/iot-edge/deps/iot-sdk-c/iothub_client/src/iothubtransport_amqp_device.c Func:device_do_work Line:1019 Device 'test-device' twin messenger did not complete starting within expected timeout (60) Error: Time:Tue Sep 5 13:35:36 2017 File:/home/user/path/iot-edge/deps/iot-sdk-c/iothub_client/src/iothubtransport_amqp_common.c Func:IoTHubTransport_AMQP_Common_Device_DoWork Line:1058 Failed performing DoWork for device 'test-device' (device reported state 6; number of previous failures: 0)

bugreport.txt

villepalo commented 7 years ago

Talking to your self is sign of madness, they say and maybe they are right, but here I go again:

It seems that root cause of this is that the event_send_timeout_secs of TELEMETRY_MESSENGER_INSTANCE is set to 300 seconds (DEFAULT_EVENT_SEND_TIMEOUT_SECS). When I have something like 100 000 messages in queue azure iothub service starts to throttle incoming messsages. That's why it might take more than 5 minutes to send all of the messages to azure hub, hence timeouts. Does this make sense?

So I have to set event_send_timeout_secs to zero to prevent this kind of behavior.

darobs commented 7 years ago

Hello @villepalo

I'll be glad to provide evidence you're not just talking to yourself!

Could you provide me with some version information? A tag or commit ID would be fine. We are using the Azure IoT Device C SDK in the iothub module, and we will start working with them to figure out the problem. (Although I think you have a very good take on the problem.)

villepalo commented 7 years ago

Hi @darobs

Thanks for reply. I'm using current master, that is 231d33871f4a4d3a648c94aa7c940a053421f1ff.

I already tested that event_send_timeout_secs was the culprit of the problem. So some kind of mechanism to set event_send_timeout_secs value to zero via IotHub module is needed or then the default value should be set to zero.

This is probably just AMQP issue, not sure about it. But quick n' dirty fix below fixes the problem for me. I'm not sure if this is the way it should be handled? ` // PERSONALITY_create

if (IoTHubClient_SetOption(result->iothubHandle, "event_send_timeout_secs", &timeout)) { LogError("unable to IoTHubClient_SetOption"); IoTHubClient_Destroy(result->iothubHandle); STRING_delete(result->deviceName); STRING_delete(result->deviceKey); free(result); result = NULL; }
`