Closed gerfen closed 2 years ago
Based on a conversation I had with @Mandur I have rolled the edgeHub and edgeAgent bits back to 1.0.9 from 1.0.9.5. Those components started and all of my devices were able to reconnect. I'll need to wait for another 4 to 5 hours to see if the system has stabilized. If I am still facing the same issue, I'll try rolling back to 1.0.8.
Moving back to version 1.0.9 for edgeHub and edgeAgent has improved things but devices are still dropping and not reconnecting albeit at a much lower rate. I'm going to try version 1.0.8 to see if things can be improved further.
@ronniesa Do you have any other suggestions for how to resolve this issue? It's resulting in false positives. My service sends out notifications when devices are not transmitting data. We send notifications if we haven't heard form a device three consecutive times in a row. The reason we send out these notifications is to alert the Nursery operators of possible power loss. At this time of year the nursery can lose an entire greenhouse very quickly if the heat to the greenhouse is lost. The upshot of the current situation is we're waking my customs up in the middle of the night when there isn't a situation that needs to be handled. This makes for some grumpy customers. :(
The problem seems to be the deviceclient not able to renew the token and seems to be an issue with IoT Edge but we need to further test. Going a couple of releases down may solve the issue so 1.0.8 or 1.0.7.
An alternative you can set this "KeepAliveTimeout": 60 desired property twin of the lora devices. This cause the device client to be garbage collected after 60 second so it will not need to renew the token but create a brand new connection when the new message will arrive. You have a bit more latency. https://github.com/Azure/iotedge-lorawan-starterkit/blob/dev/Docs/quickstart.md#optional-device-properties
I just rolled back to version 1.0.8.5. I'll need to wait a couple of hours to see how this version performs.
Thanks for the KeepAliveTimeout suggestion. I'll try that if rolling back to 1.0.8 and 1.0.7 do not resolve the issue. I'll report my results as I learn more.
Another quick trick you can force to skip IoT Edge altogether so you can set an env variable on the LoraWanNetworkSrvModule setting in the portal ENABLE_GATEWAY to false this will force to skip IoT Edge altogether so if you are not processing messages on the Edge through routing it could be a solution.
This is not documented as we use if for test but it could be a quick fix until we discover the route cause.
Hmm. Would this impact decoding the messages? Currently I decode on the edge, read message from the IoT Hub queue and persist the readings to a database in an Azure function. Would disabling the processing at the edge still work in this scenario?
Everything should work fine in your scenario. It does not have any effect on the decoding part. Only if you route the message to another module in the the edge by changing the routing. Using this trick will allow us to determine if is a device SDK problem or an IoT Edge problem.
Sounds good. I will try this tomorrow and will report back. Thanks for the suggestion!
I had a change in plans this afternoon so I was able to try setting ENABLE_GATEWAY to false on the LoRaWanNetworkSrvModule. So far everything seems to be functional -- I'm seeing messages from my devices in my web app. I do have a question...
In the Azure IoT Hub portal I only see one connected device which is the gateway. I believe this is to be expected but I a just wanted to confirm?
I'll keep an eye on the system overnight and will report back tomorrow. Is there any thing in particular I should be looking for to help diagnose if we're facing a device SDK or IoT Edge problem?
Correct you will not see the devices connected to the edge as we are now skipping it and go directly to iot hub. If this work stable is iot edge causing the issue, if this does not work is the device sdk...
Setting ENABLE_GATEWAY to false has stabilized the system! π
It appears the issue is in the IoT edge bits. What are the downsides of using this configuration?
Thanks for reporting as this is a confirmation that the issue in is edgeHub.
The code path is basically 100% the same as ENABLE_GATEWAY only affects the connection string so everything else stay the same. As it skip edgeHub and go directly to IoT Hub the following limitations apply:
Our test pipeline do not test "ENABLE_GATEWAY" false so there is a bit of a risk but it should be minimal as the code path, as said, is the same other than the connection string.
Thanks @ronniesa. I was worried it might have an impact on sending cloud to device message. That's not super important at the moment but will be long term.
We need to fix this issue. First we will check starting from which version of edgeHub the issue arise, then as quick fix use that version and then work with the edge team to see where the problem is, as is not related to our module.
We did not catch this issue with our end to end tests because our pipeline tests run for 1h30min so not enough to see the issue. @Mandur we could add a test for this case by lowering the token validity to 30min or so but is not that straight forward to add. Ideally edge team should have a test like this.
c2d messages should not be affected by ENABLE_GATEWAY false
For the record, I tried the following edge API versions... 1.0.8.5, 1.09 and 1.0.9.5. Of those 1.0.9.5 seems to be the most broken - all devices disconnect after the 4 to 5 hours period. For versions 1.0.9 and 1.0.8.5, devices would disconnect at a much slower rate and would not reconnect after they disconnected. Prior to this I was using 1.0.7. I believe that this version was exhibiting the same behavior as 1.0.9 and 1.0.8.5.
Hopefully this info helps to narrow down the issue. Let me know if I can provide any more information.
I've got an update on this issue. Some of my devices are still disconnecting after a long period of time. In this case the time was about a little shy of 2 months.
I have tried to clear the redis cache and then reboot the gateway but the devices will not reconnect unless I manually cycle their power.
I see lots of these log messages:
2021-04-21 17:58:08.983 02194E9D: querying the registry for device
2021-04-21 17:58:09.041 02194E9D: querying the registry for devices by devAddr 02194E9D found 1 devices
2021-04-21 17:58:09.041 02194E9D: with devAddr 02194E9D check MIC failed
2021-04-21 17:58:09.041 02194E9D: processing time: 00:00:00.0591996
2021-04-21 17:58:09.971 Statistic: {"stat":{"time":"2021-04-21 17:58:09 GMT","rxnb":11,"rxok":11,"rxfw":11,"ackr":100.0,"dwnb":0,"txnb":0}}
2021-04-21 17:58:11.476 Physical dataUp {"rxpk":[{"tmst":364253044,"chan":6,"rfch":1,"freq":903.500000,"stat":1,"modu":"LORA","datr":"SF10BW125","codr":"4/5","lsnr":11.2,"rssi":-71,"size":24,"data":"QIycoQKAhQYGZ6Zv8ZC7pE4Fie+8iNdD"}]}
Let me know if I can provide any more info to help resolve the issue.
Thanks!
@Mandur - here are the edgeHub logs[1] before and after I rebooted the gateway. I'll be on-site for another three hours or so if you need me to collect any more information.
[1] edgeHub logs
<6> 2021-04-21 13:26:11.447 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 13:28:11.838 +00:00 [INF] - Starting refresh of device scope identities cache
<6> 2021-04-21 13:31:11.449 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 13:36:11.449 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 13:41:11.445 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 13:46:11.449 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 13:51:11.450 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 13:53:09.076 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
<6> 2021-04-21 13:53:09.077 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint iothub and 0 messages from message store.
<6> 2021-04-21 13:56:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 14:01:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 14:06:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 14:10:51.001 +00:00 [INF] - New token received on the Cbs link
<6> 2021-04-21 14:10:51.001 +00:00 [INF] - Token updated for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 14:10:51.017 +00:00 [INF] - New token received on the Cbs link
<6> 2021-04-21 14:10:51.017 +00:00 [INF] - Token updated for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 14:11:11.445 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 14:11:21.896 +00:00 [INF] - Starting periodic operation Get EdgeHub config...
<6> 2021-04-21 14:11:22.027 +00:00 [INF] - Obtained edge hub config from module twin
<6> 2021-04-21 14:11:22.028 +00:00 [INF] - Successfully completed periodic operation Get EdgeHub config
<6> 2021-04-21 14:11:22.194 +00:00 [INF] - Updated reported properties for blyn-yh-gateway-south/$edgeHub
<6> 2021-04-21 14:16:11.447 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 14:21:11.446 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 14:23:39.077 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
<6> 2021-04-21 14:23:39.078 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint iothub and 0 messages from message store.
<6> 2021-04-21 14:26:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 14:28:21.665 +00:00 [INF] - Starting refresh of device scope identities cache
<6> 2021-04-21 14:31:11.450 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 14:36:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 14:41:11.449 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 14:46:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 14:51:11.449 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 14:54:09.081 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
<6> 2021-04-21 14:54:09.082 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint iothub and 0 messages from message store.
<6> 2021-04-21 14:56:11.447 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 15:01:11.447 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 15:01:51.998 +00:00 [INF] - New token received on the Cbs link
<6> 2021-04-21 15:01:51.999 +00:00 [INF] - Token updated for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 15:01:52.016 +00:00 [INF] - New token received on the Cbs link
<6> 2021-04-21 15:01:52.016 +00:00 [INF] - Token updated for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 15:06:07.844 +00:00 [INF] - Starting compaction of stores
<6> 2021-04-21 15:06:07.844 +00:00 [INF] - Starting compaction of store DeviceScopeCache
<6> 2021-04-21 15:06:07.845 +00:00 [INF] - Starting compaction of store default
<6> 2021-04-21 15:06:07.845 +00:00 [INF] - Starting compaction of store twins
<6> 2021-04-21 15:06:07.845 +00:00 [INF] - Starting compaction of store EdgeTwin
<6> 2021-04-21 15:06:07.845 +00:00 [INF] - Starting compaction of store ProductInfo
<6> 2021-04-21 15:06:07.846 +00:00 [INF] - Starting compaction of store iothub
<6> 2021-04-21 15:06:07.846 +00:00 [INF] - Starting compaction of store checkpoints
<6> 2021-04-21 15:06:07.846 +00:00 [INF] - Starting compaction of store messages
<6> 2021-04-21 15:06:11.451 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 15:11:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 15:11:22.028 +00:00 [INF] - Starting periodic operation Get EdgeHub config...
<6> 2021-04-21 15:11:22.097 +00:00 [INF] - Obtained edge hub config from module twin
<6> 2021-04-21 15:11:22.097 +00:00 [INF] - Successfully completed periodic operation Get EdgeHub config
<6> 2021-04-21 15:11:22.321 +00:00 [INF] - Updated reported properties for blyn-yh-gateway-south/$edgeHub
<6> 2021-04-21 15:16:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 15:21:11.449 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 15:24:39.094 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
<6> 2021-04-21 15:24:39.095 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint iothub and 0 messages from message store.
<6> 2021-04-21 15:26:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 15:28:28.272 +00:00 [INF] - Starting refresh of device scope identities cache
<6> 2021-04-21 15:31:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 15:36:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 15:41:11.447 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 15:46:11.447 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 15:51:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 15:52:53.002 +00:00 [INF] - New token received on the Cbs link
<6> 2021-04-21 15:52:53.003 +00:00 [INF] - Token updated for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 15:52:53.019 +00:00 [INF] - New token received on the Cbs link
<6> 2021-04-21 15:52:53.020 +00:00 [INF] - Token updated for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 15:55:09.101 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
<6> 2021-04-21 15:55:09.101 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint iothub and 0 messages from message store.
<6> 2021-04-21 15:56:11.451 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 16:01:11.449 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 16:06:11.450 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 16:11:11.452 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 16:11:22.096 +00:00 [INF] - Starting periodic operation Get EdgeHub config...
<6> 2021-04-21 16:11:22.144 +00:00 [INF] - Obtained edge hub config from module twin
<6> 2021-04-21 16:11:22.144 +00:00 [INF] - Successfully completed periodic operation Get EdgeHub config
<6> 2021-04-21 16:11:22.300 +00:00 [INF] - Updated reported properties for blyn-yh-gateway-south/$edgeHub
<6> 2021-04-21 16:16:11.447 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 16:21:11.449 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 16:25:39.105 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
<6> 2021-04-21 16:25:39.106 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint iothub and 0 messages from message store.
<6> 2021-04-21 16:26:11.451 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 16:28:40.512 +00:00 [INF] - Starting refresh of device scope identities cache
<6> 2021-04-21 16:31:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 16:36:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 16:41:11.447 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 16:43:54.016 +00:00 [INF] - New token received on the Cbs link
<6> 2021-04-21 16:43:54.016 +00:00 [INF] - Token updated for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 16:46:11.447 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 16:51:11.447 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 16:56:09.108 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
<6> 2021-04-21 16:56:09.110 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint iothub and 0 messages from message store.
<6> 2021-04-21 16:56:11.449 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 17:01:11.451 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 17:06:07.841 +00:00 [INF] - Starting compaction of stores
<6> 2021-04-21 17:06:07.842 +00:00 [INF] - Starting compaction of store DeviceScopeCache
<6> 2021-04-21 17:06:07.842 +00:00 [INF] - Starting compaction of store default
<6> 2021-04-21 17:06:07.843 +00:00 [INF] - Starting compaction of store twins
<6> 2021-04-21 17:06:07.843 +00:00 [INF] - Starting compaction of store EdgeTwin
<6> 2021-04-21 17:06:07.843 +00:00 [INF] - Starting compaction of store ProductInfo
<6> 2021-04-21 17:06:07.844 +00:00 [INF] - Starting compaction of store iothub
<6> 2021-04-21 17:06:07.844 +00:00 [INF] - Starting compaction of store checkpoints
<6> 2021-04-21 17:06:07.844 +00:00 [INF] - Starting compaction of store messages
<6> 2021-04-21 17:06:11.451 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 17:11:11.448 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 17:11:22.147 +00:00 [INF] - Starting periodic operation Get EdgeHub config...
<6> 2021-04-21 17:11:22.201 +00:00 [INF] - Obtained edge hub config from module twin
<6> 2021-04-21 17:11:22.201 +00:00 [INF] - Successfully completed periodic operation Get EdgeHub config
<6> 2021-04-21 17:11:22.355 +00:00 [INF] - Updated reported properties for blyn-yh-gateway-south/$edgeHub
<6> 2021-04-21 17:16:11.449 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 17:21:11.449 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 17:26:11.447 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 17:26:39.114 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
<6> 2021-04-21 17:26:39.115 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint iothub and 0 messages from message store.
<6> 2021-04-21 17:29:00.131 +00:00 [INF] - Starting refresh of device scope identities cache
<6> 2021-04-21 17:31:11.446 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 17:34:54.998 +00:00 [INF] - New token received on the Cbs link
<6> 2021-04-21 17:34:54.998 +00:00 [INF] - Token updated for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:34:55.015 +00:00 [INF] - New token received on the Cbs link
<6> 2021-04-21 17:34:55.015 +00:00 [INF] - Token updated for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:36:11.445 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 17:41:11.450 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 17:46:11.449 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 17:51:07.114 +00:00 [INF] - Termination requested, initiating shutdown.
<6> 2021-04-21 17:51:07.116 +00:00 [INF] - Stopping the protocol heads...
<6> 2021-04-21 17:51:07.118 +00:00 [INF] - Closing protocol heads - (AMQP)
<6> 2021-04-21 17:51:07.199 +00:00 [INF] - Closing link Events for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:51:07.211 +00:00 [INF] - Protocol heads stopped.
<6> 2021-04-21 17:51:07.217 +00:00 [INF] - Shutdown complete.
<6> 2021-04-21 17:51:07.217 +00:00 [INF] - Waiting for cleanup to finish
<6> 2021-04-21 17:51:07.217 +00:00 [INF] - Done with cleanup. Shutting down.
<6> 2021-04-21 17:51:07.231 +00:00 [INF] - Closing link TwinSending for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:51:07.232 +00:00 [INF] - Closing link TwinReceiving for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:51:07.233 +00:00 [INF] - Closing link MethodSending for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:51:07.234 +00:00 [INF] - Closing link MethodReceiving for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
2021-04-21 17:52:05 Starting Edge Hub
2021-04-21 17:52:05.902 +00:00 Edge Hub Main()
[04/21/2021 05:52:09.435 PM] Found intermediate certificates: [CN=iotedged workload ca:07/03/2021 07:05:45],[CN=Test Edge Device CA:07/03/2021 07:05:45],[CN=Test Edge Owner CA:07/03/2021 07:05:45]
<6> 2021-04-21 17:52:14.030 +00:00 [INF] - Created persistent store at /tmp/edgeHub
<6> 2021-04-21 17:52:14.272 +00:00 [INF] - Initializing Edge Hub
<6> 2021-04-21 17:52:14.273 +00:00 [INF] -
ββββββ βββββββββββ ββββββββββ ββββββββ
βββββββββββββββββββ βββββββββββββββββββ
ββββββββ βββββ βββ βββββββββββββββββ
ββββββββ βββββ βββ βββββββββββββββββ
βββ βββββββββββββββββββββββ βββββββββββ
βββ βββββββββββ βββββββ βββ βββββββββββ
βββ βββββββ βββββββββ βββββββββββββββ βββββββ ββββββββ
βββββββββββββββββββββ ββββββββββββββββββββββββ ββββββββ
ββββββ βββ βββ ββββββ βββ ββββββ ββββββββββ
ββββββ βββ βββ ββββββ βββ ββββββ βββββββββ
ββββββββββββ βββ βββββββββββββββββββββββββββββββββ
βββ βββββββ βββ βββββββββββββββ βββββββ ββββββββ
<6> 2021-04-21 17:52:14.279 +00:00 [INF] - Version - 1.0.8.5.28262167 (b5de730c6020b852f9636b7117138de6c6fced36)
<6> 2021-04-21 17:52:14.279 +00:00 [INF] - OptimizeForPerformance=False
<6> 2021-04-21 17:52:14.290 +00:00 [INF] - Loaded server certificate with expiration date of "2021-07-03T07:05:45.0000000+00:00"
<6> 2021-04-21 17:52:14.387 +00:00 [INF] - Created new message store
<6> 2021-04-21 17:52:14.387 +00:00 [INF] - Started task to cleanup processed and stale messages
<6> 2021-04-21 17:52:15.049 +00:00 [INF] - Created DeviceConnectivityManager with connected check frequency 00:05:00 and disconnected check frequency 00:02:00
<6> 2021-04-21 17:52:16.302 +00:00 [INF] - Created device scope identities cache
<6> 2021-04-21 17:52:16.323 +00:00 [INF] - Starting refresh of device scope identities cache
<6> 2021-04-21 17:52:16.383 +00:00 [INF] - Initialized storing twin manager
<6> 2021-04-21 17:52:16.546 +00:00 [INF] - Initializing configuration
<6> 2021-04-21 17:52:16.634 +00:00 [INF] - New device connection for device blyn-yh-gateway-south/$edgeHub
<6> 2021-04-21 17:52:16.957 +00:00 [INF] - Attempting to connect to IoT Hub for client blyn-yh-gateway-south/$edgeHub via AMQP...
<6> 2021-04-21 17:52:17.329 +00:00 [INF] - Starting periodic operation Get EdgeHub config...
<6> 2021-04-21 17:52:17.429 +00:00 [INF] - Started operation Get EdgeHub config
<6> 2021-04-21 17:52:17.431 +00:00 [INF] - Initialized edge hub configuration
<6> 2021-04-21 17:52:17.484 +00:00 [INF] - Starting timer to authenticate connections with a period of 300 seconds
<6> 2021-04-21 17:52:17.705 +00:00 [INF] - Scheduling server certificate renewal for "2021-07-03T07:03:15.0004539Z".
<6> 2021-04-21 17:52:17.713 +00:00 [INF] - Starting protocol heads - (AMQP)
<6> 2021-04-21 17:52:17.731 +00:00 [INF] - Starting AMQP head
<6> 2021-04-21 17:52:18.113 +00:00 [INF] - Started AMQP head
<6> 2021-04-21 17:52:19.975 +00:00 [INF] - Exiting disconnected state
<6> 2021-04-21 17:52:19.986 +00:00 [INF] - Received device connected callback
<6> 2021-04-21 17:52:20.002 +00:00 [INF] - New token received on the Cbs link
<6> 2021-04-21 17:52:20.042 +00:00 [INF] - Device connected to cloud, processing subscriptions for connected clients.
<6> 2021-04-21 17:52:20.045 +00:00 [INF] - Processing subscriptions for client blyn-yh-gateway-south/$edgeHub.
<6> 2021-04-21 17:52:20.106 +00:00 [INF] - Entering connected state
<6> 2021-04-21 17:52:20.119 +00:00 [INF] - Cloud connection for blyn-yh-gateway-south/$edgeHub is True
<6> 2021-04-21 17:52:20.140 +00:00 [INF] - Connection status for blyn-yh-gateway-south/$edgeHub changed to ConnectionEstablished
<6> 2021-04-21 17:52:20.150 +00:00 [INF] - Client blyn-yh-gateway-south/$edgeHub connected to cloud, processing existing subscriptions.
<6> 2021-04-21 17:52:20.169 +00:00 [INF] - Created cloud proxy for client blyn-yh-gateway-south/$edgeHub via AMQP, with client operation timeout 20 seconds.
<6> 2021-04-21 17:52:20.203 +00:00 [INF] - Initialized cloud proxy 20edd3fa-c30c-44cd-82ac-f7b6063acb63 for blyn-yh-gateway-south/$edgeHub
<6> 2021-04-21 17:52:20.211 +00:00 [INF] - Created cloud connection for client blyn-yh-gateway-south/$edgeHub
<6> 2021-04-21 17:52:20.574 +00:00 [INF] - Client blyn-yh-gateway-south/LoRaWanNetworkSrvModule in device scope authenticated locally.
<6> 2021-04-21 17:52:20.624 +00:00 [INF] - New device connection for device blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:52:20.689 +00:00 [INF] - Bind device proxy for device blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:52:20.693 +00:00 [INF] - Initialized device listener in the AMQP protocol head for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:52:20.727 +00:00 [INF] - Opened link Events for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:52:20.978 +00:00 [INF] - Opened link TwinSending for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:52:20.988 +00:00 [INF] - Opened link TwinReceiving for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:52:21.173 +00:00 [INF] - Attempting to connect to IoT Hub for client blyn-yh-gateway-south/LoRaWanNetworkSrvModule via AMQP...
<6> 2021-04-21 17:52:21.377 +00:00 [INF] - Updated reported properties for blyn-yh-gateway-south/$edgeHub
<6> 2021-04-21 17:52:21.471 +00:00 [INF] - Cloud connection for blyn-yh-gateway-south/LoRaWanNetworkSrvModule is True
<6> 2021-04-21 17:52:21.471 +00:00 [INF] - Connection status for blyn-yh-gateway-south/LoRaWanNetworkSrvModule changed to ConnectionEstablished
<6> 2021-04-21 17:52:21.471 +00:00 [INF] - Client blyn-yh-gateway-south/LoRaWanNetworkSrvModule connected to cloud, processing existing subscriptions.
<6> 2021-04-21 17:52:21.471 +00:00 [INF] - Created cloud proxy for client blyn-yh-gateway-south/LoRaWanNetworkSrvModule via AMQP, with client operation timeout 20 seconds.
<6> 2021-04-21 17:52:21.472 +00:00 [INF] - Initialized cloud proxy b414f156-e1f1-4910-a7fc-0523b9f5a070 for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:52:21.472 +00:00 [INF] - Created cloud connection for client blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:52:21.657 +00:00 [INF] - Updated reported properties for blyn-yh-gateway-south/$edgeHub
<6> 2021-04-21 17:52:22.048 +00:00 [INF] - Created cloud endpoint iothub with max batch size 10 and fan-out factor of 10.
<6> 2021-04-21 17:52:22.082 +00:00 [INF] - Obtained edge hub config from module twin
<6> 2021-04-21 17:52:22.337 +00:00 [INF] - Opened link MethodSending for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:52:22.343 +00:00 [INF] - Opened link MethodReceiving for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 17:52:22.633 +00:00 [INF] - Set the following 1 route(s) in edge hub
<6> 2021-04-21 17:52:22.633 +00:00 [INF] - route: FROM /* INTO $upstream
<6> 2021-04-21 17:52:22.642 +00:00 [INF] - Updated message store TTL to 7200 seconds
<6> 2021-04-21 17:52:22.644 +00:00 [INF] - Updated the edge hub store and forward configuration
<6> 2021-04-21 17:52:22.649 +00:00 [INF] - Successfully completed periodic operation Get EdgeHub config
<6> 2021-04-21 17:52:26.817 +00:00 [INF] - Updated reported properties for blyn-yh-gateway-south/$edgeHub
<6> 2021-04-21 17:52:51.403 +00:00 [INF] - Starting refresh of device scope identities cache
<6> 2021-04-21 17:57:17.509 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 18:02:17.485 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 18:07:17.488 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 18:12:17.485 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 18:17:17.485 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 18:22:14.416 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
<6> 2021-04-21 18:22:14.444 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint iothub and 0 messages from message store.
<6> 2021-04-21 18:22:17.488 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 18:27:17.485 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 18:32:17.485 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 18:37:17.484 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 18:42:17.486 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 18:43:20.021 +00:00 [INF] - New token received on the Cbs link
<6> 2021-04-21 18:43:20.024 +00:00 [INF] - Token updated for blyn-yh-gateway-south/LoRaWanNetworkSrvModule
<6> 2021-04-21 18:47:17.487 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 18:52:17.488 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 18:52:22.649 +00:00 [INF] - Starting periodic operation Get EdgeHub config...
<6> 2021-04-21 18:52:22.726 +00:00 [INF] - Obtained edge hub config from module twin
<6> 2021-04-21 18:52:22.740 +00:00 [INF] - Successfully completed periodic operation Get EdgeHub config
<6> 2021-04-21 18:52:22.907 +00:00 [INF] - Updated reported properties for blyn-yh-gateway-south/$edgeHub
<6> 2021-04-21 18:52:44.449 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
<6> 2021-04-21 18:52:44.452 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint iothub and 0 messages from message store.
<6> 2021-04-21 18:53:41.096 +00:00 [INF] - Starting refresh of device scope identities cache
<6> 2021-04-21 18:57:17.488 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 19:02:17.483 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 19:07:17.484 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 19:12:17.486 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 19:17:17.484 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 19:22:17.486 +00:00 [INF] - Reauthenticating connected clients
<6> 2021-04-21 19:23:14.455 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
<6> 2021-04-21 19:23:14.457 +00:00 [INF] - Cleaned up 0 messages from queue for endpoint iothub and 0 messages from message store.
<6> 2021-04-21 19:27:17.484 +00:00 [INF] - Reauthenticating connected clients
@gerfen I have some time to investigate the mic issue now. Could we have a quick sync when you have time?
Sure. Let me know the times that are convenient for you. You are 9 hours a head of me.
--Michael
From: Mandur @.> Sent: Tuesday, May 11, 2021 1:44:45 AM To: Azure/iotedge-lorawan-starterkit @.> Cc: Michael Gerfen @.>; Mention @.> Subject: Re: [Azure/iotedge-lorawan-starterkit] Version 1.0.5 - Devices disconnecting from gateway after three or four hours (#286)
@gerfenhttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fgerfen&data=04%7C01%7C%7C306f64148c564d6a489308d9145907f2%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637563194883211263%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=aGcuWONWaEklt24Pf3%2B66TpZRdFAqb9erxygNVzP5nw%3D&reserved=0 I have some time to investigate the mic issue now. Could we have a quick sync when you have time?
β You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAzure%2Fiotedge-lorawan-starterkit%2Fissues%2F286%23issuecomment-838088533&data=04%7C01%7C%7C306f64148c564d6a489308d9145907f2%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637563194883221248%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qP55eIb2r70atAk7yV%2Bc4uD%2BRjrAYNTohDbsR%2BIFC5M%3D&reserved=0, or unsubscribehttps://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FABIJFZGAXWW3SFHDTAAKAW3TNDU73ANCNFSM4XFOEJPA&data=04%7C01%7C%7C306f64148c564d6a489308d9145907f2%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637563194883221248%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=8yR2mxcp0edcda4zzqSgC2eBCpSyQWDIHhfcNToI46s%3D&reserved=0.
As discussed via IM, the mic issue faced here is by design. There is 16 bits in the payload for the frame counter, once they are exhausted we start rejecting the mic, indicating we can't process the message. Our suggestion would be to use the 32 bits frame counter by adding the Add twin desired property "Supports32BitFCnt": true to your device to allow 32 bits to be used to count frames. As indicated here
@Mandur Adding the "Supports32BitFCnt" twin property on all of my 32 bit devices and sending a direct method command to the network server on my gateway to clear the cache has allowed the majority of my devices to reconnect. I have a handful of devices which have not reconnected but they may have been inadvertently turned off or need fresh batteries.
My device tree is lighting up!
Doing a quick calculation I think it would be ~20,248 years before I see a disconnect issue for a 32 bit counter. Thanks to you and the team for chasing this issue down for me.
I'd like to leave this issue open for a week or so until we see the devices remain connected.
In general a Lora device should re join the network from time to time maybe once a month or so, this resets the fcnt on both sides. The 32bit fcnt is an opt in setting. Happy that everything is up and running.
I've recently upgraded to IoT Edge 1.2 and 1.0.6 of the iotedge-lorawan-starterkit. As part of the upgrade I set ENABLE_GATEWAY = true and unfortunately I'm still seeing my devices disconnect after three or four hours. I can live with ENABLE_GATEWAY = false but I thought I would let the team know that the issue still persists.
Please let me know what logs I can retrieve if any.
It seems the fix provided solved the issue, closing it. Feel free to reopen
I just upgraded to version 1.0.5. I used the instructions from here. After the upgrade, all of my devices reconnected and everything functioned as expected. However after three to four hours all of the devices associated to the gateway disconnected and would not reconnect until I rebooted the gateway. The devices reconnected until three to four hours later when they disconnected again.
I found that by setting the MaxConnectedClients environment variable on iotedgehub module I can get the devices to reconnect by presumably forcing the iotedge module to restart. The graph below shows the devices disconnecting and then reconnecting after I change the environment variable:
Expected Behavior
The devices should automatically reconnect without intervention.
Current Behavior
After three to four hours the devices disconnect and do not reconnect automatically.
Steps to Reproduce
Context (Environment)
Device (Host) Operating System
Ubuntu 16.04
Architecture
AMD64
Container Operating System
Linux
LoRaWAN Module Version
Docker
Logs
Looking through the logs I don't see any indication why the devices disconnect and do not reconnect. I have attached a zip file of all of the logs from the gateway prior to forcing the iotedgemodule to restart. gateway.zip
Additional Information
It feels like this might be an Azure IoT Hub issue. Please let me know if I can provide more information.