Closed gerfen closed 4 years ago
Do you have the log of the LoRaWanPktFwdModule that starts at around 17:02 until 17:06? Because the LoRaWanNetworkSrvModule is sending two downstream messages to the LoRaWanPktFwdModule and would be nice to see if the LoRaWanPktFwdModule is sending them out on the radio.
Unfortunately I do not. I can recreate the issue again tomorrow morning my time and do a better job of collecting the logs.
Once rebooted it always work until the next reboot? We use the upstream message or the join to store on wich channel/freq we need to send the downstream message. We store this in memory in the gateway and in the twin of the device so when the gateway restarts it will find it again. So it could be a bug is this logic. Can you check the reported twin of the Class C device and post it here (without the keys :-))
Can you also post what is the 2 window freq etc.. for the class C so we can compare to where we send: UnconfirmedDataDown {"txpk":{"imme":true,"data":"YIMxbgIAAgA30V/ru0Wd3RsuNpVB7ZdH","tmst":0,"size":24,"freq":923.3,"rfch":0,"modu":"LORA","datr":"SF10BW125","codr":"4/5","powe":14,"ipol":true}}
Yes, it seems to always work until the next reboot. Here is my redacted twin for one of the Class C devices
{
"deviceId": "1114611693080229",
"etag": "AAAAAAAAAAQ=",
"deviceEtag": "NDI3MzY0MjY3",
"status": "enabled",
"statusUpdateTime": "0001-01-01T00:00:00",
"connectionState": "Connected",
"lastActivityTime": "2019-09-05T06:55:22.3246739",
"cloudToDeviceMessageCount": 0,
"authenticationType": "sas",
"x509Thumbprint": {
"primaryThumbprint": null,
"secondaryThumbprint": null
},
"version": 975,
"properties": {
"desired": {
"AppEUI": "",
"AppKey": "",
"ClassType": "C",
"GatewayID": "evolution-gateway",
"RX2DataRate": "10",
"SensorDecoder": "http://decoder/api/DecodeUrsaLinkUc1114",
"$metadata": {
"$lastUpdated": "2019-08-10T13:51:34.3364708Z",
"$lastUpdatedVersion": 4,
"AppEUI": {
"$lastUpdated": "2019-08-10T13:51:34.3364708Z",
"$lastUpdatedVersion": 4
},
"AppKey": {
"$lastUpdated": "2019-08-10T13:51:34.3364708Z",
"$lastUpdatedVersion": 4
},
"ClassType": {
"$lastUpdated": "2019-08-10T13:51:34.3364708Z",
"$lastUpdatedVersion": 4
},
"GatewayID": {
"$lastUpdated": "2019-08-10T13:51:34.3364708Z",
"$lastUpdatedVersion": 4
},
"RX2DataRate": {
"$lastUpdated": "2019-08-10T13:51:34.3364708Z",
"$lastUpdatedVersion": 4
},
"SensorDecoder": {
"$lastUpdated": "2019-08-10T13:51:34.3364708Z",
"$lastUpdatedVersion": 4
}
},
"$version": 4
},
"reported": {
"AppSKey": "",
"NwkSKey": "",
"DevAddr": "023637F8",
"FCntDown": 3,
"FCntUp": 50,
"DevEUI": "1114611693080229",
"NetId": "010000",
"DevNonce": "C872",
"RX2DataRate": 10,
"Region": "US915",
"$metadata": {
"$lastUpdated": "2019-09-05T05:40:22.8806691Z",
"AppSKey": {
"$lastUpdated": "2019-09-04T18:56:34.1412893Z"
},
"NwkSKey": {
"$lastUpdated": "2019-09-04T18:56:34.1412893Z"
},
"DevAddr": {
"$lastUpdated": "2019-09-04T18:56:34.1412893Z"
},
"FCntDown": {
"$lastUpdated": "2019-09-05T05:40:22.8806691Z"
},
"FCntUp": {
"$lastUpdated": "2019-09-05T05:40:22.8806691Z"
},
"DevEUI": {
"$lastUpdated": "2019-09-04T18:56:34.1412893Z"
},
"NetId": {
"$lastUpdated": "2019-09-04T18:56:34.1412893Z"
},
"DevNonce": {
"$lastUpdated": "2019-09-04T18:56:34.1412893Z"
},
"RX2DataRate": {
"$lastUpdated": "2019-09-04T18:56:34.1412893Z"
},
"Region": {
"$lastUpdated": "2019-08-01T21:14:57.7356452Z"
}
},
"$version": 971
}
},
"capabilities": {
"iotEdge": false
},
"deviceScope": "ms-azure-iot-edge://*******************"
Here is how my device is configured:
Seems all right. Not sure but it could be a frame counter bug. Can you check the device fcnt down before the reboot of the gateway and then check what fcnt down we use after the reboot of the gateway. On the log this fcnt down should be higher than the one in the device: 2019-09-04 17:03:12.536 1114611693065671: [class-c] down frame counter: 6
after a join the frame counter get reset to 0 and this could explain why a re join fixes the issue.
@gerfen Would it be possible for you to provide to the following information so we can have a better picture of the problem?
Sorry for the long list :)
Sure. I'll post the results ASAP.
--Michael
From: Francisco Beltrao notifications@github.com Sent: Thursday, September 5, 2019 8:30:20 AM To: Azure/iotedge-lorawan-starterkit iotedge-lorawan-starterkit@noreply.github.com Cc: Michael Gerfen gerfen@hotmail.com; Mention mention@noreply.github.com Subject: Re: [Azure/iotedge-lorawan-starterkit] Class C devices cannot receive direct messages after gateway reboot (#214)
@gerfenhttps://github.com/gerfen Would it be possible for you to provide to the following information so we can have a better picture of the problem?
Sorry for the long list :)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/Azure/iotedge-lorawan-starterkit/issues/214?email_source=notifications&email_token=ABIJFZAEAYC5QQW5F3T4XVLQIEQYZA5CNFSM4ITUNHZ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD57SFDA#issuecomment-528425612, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABIJFZBJBQ6ED336PDKGZBLQIEQYZANCNFSM4ITUNHZQ.
@fbeltrao Here are the results of the message and log capture. Please let me know if there is any other info I can provide. Thanks!
@gerfen thanks! And none of the messages sent after the restart were received by the device?
@fbeltrao That's correct no messages were received by the device after the restart. Interestingly, the logs are quite a bit smaller in the non-receive case so hopefully that points to the issue. :)
@gerfen I think we have found the issue and I think we have a bug on how we apply the 2nd windows DR when we read from the twin instead of the in mem cache.
this before the reboot and works: UnconfirmedDataDown {"txpk":{"imme":true,"data":"YPg3NgIACgA3BvCwKoDkbZkJ9p4eQE85","tmst":0,"size":24,"freq":923.3,"rfch":0,"modu":"LORA","datr":"SF10BW500","codr":"4/5","powe":14,"ipol":true}}
this after the reboot and does not work: UnconfirmedDataDown {"txpk":{"imme":true,"data":"YPg3NgIADwA35UT5dTi24xWRcUzy6Fv8","tmst":0,"size":24,"freq":923.3,"rfch":0,"modu":"LORA","datr":"SF10BW125","codr":"4/5","powe":14,"ipol":true}}
The second one should also be SF10BW500.
We will investigate more to see what is the root cause of this. And yes this bug is US only....
@ronniesa. Thanks for the update.
--Michael
From: ronniesa notifications@github.com Sent: Sunday, September 8, 2019 11:56:45 AM To: Azure/iotedge-lorawan-starterkit iotedge-lorawan-starterkit@noreply.github.com Cc: Michael Gerfen gerfen@hotmail.com; Mention mention@noreply.github.com Subject: Re: [Azure/iotedge-lorawan-starterkit] Class C devices cannot receive direct messages after gateway reboot (#214)
@gerfenhttps://github.com/gerfen I think we have found the issue and I think we have a bug on how we apply the 2nd windows DR when we read from the twin instead of the in mem cache.
this before the reboot and works: UnconfirmedDataDown {"txpk":{"imme":true,"data":"YPg3NgIACgA3BvCwKoDkbZkJ9p4eQE85","tmst":0,"size":24,"freq":923.3,"rfch":0,"modu":"LORA","datr":"SF10BW500","codr":"4/5","powe":14,"ipol":true}}
this after the reboot and does not work: UnconfirmedDataDown {"txpk":{"imme":true,"data":"YPg3NgIADwA35UT5dTi24xWRcUzy6Fv8","tmst":0,"size":24,"freq":923.3,"rfch":0,"modu":"LORA","datr":"SF10BW125","codr":"4/5","powe":14,"ipol":true}}
The second one should also be SF10BW500.
We will investigate more to see what is the root cause of this. And yes this bug is US only....
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/Azure/iotedge-lorawan-starterkit/issues/214?email_source=notifications&email_token=ABIJFZHXNW4LMMBNM7SLT3LQIVDG3A5CNFSM4ITUNHZ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6FW56A#issuecomment-529231608, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABIJFZBM3VHRFIJFEDXJKGDQIVDG3ANCNFSM4ITUNHZQ.
Hello @gerfen Thanks for the testing and support, we have some ideas regarding the bug, however we would need additional information to figure out exactly what is causing it. Would it possible to reproduce the issue with the network server set to debug verbosity? To do so, set the LOG_LEVEL environment variables of the network server to "Debug".
@Mandur Sure. I can do that in a couple of hours.
--Michael
From: Mandur notifications@github.com Sent: Tuesday, September 10, 2019 9:32:12 AM To: Azure/iotedge-lorawan-starterkit iotedge-lorawan-starterkit@noreply.github.com Cc: Michael Gerfen gerfen@hotmail.com; Mention mention@noreply.github.com Subject: Re: [Azure/iotedge-lorawan-starterkit] Class C devices cannot receive direct messages after gateway reboot (#214)
Hello @gerfenhttps://github.com/gerfen Thanks for the testing and support, we have some ideas regarding the bug, however we would need additional information to figure out exactly what is causing it. Would it possible to reproduce the issue with the network server set to debug verbosity? To do so, set the LOG_LEVEL environment variables of the network server to "Debug".
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/Azure/iotedge-lorawan-starterkit/issues/214?email_source=notifications&email_token=ABIJFZAPNYAZK3QNW4PYBTDQI7DYZA5CNFSM4ITUNHZ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6LW5ZI#issuecomment-530018021, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABIJFZDZ5LYDVOVBFH2HKB3QI7DYZANCNFSM4ITUNHZQ.
@Mandur I'm confused. It seems like I already have the log level set to Debug...
##################################
# LoRaWanNetworkSrvModule settings
##################################
NET_SRV_VERSION=1.0.7
NET_SRV_LOG_LEVEL=Debug
NET_SRV_LOGTO_HUB=false
NET_SRV_LOGTO_UDP=false
NET_SRV_LOG_TO_UDP_ADDRESS=AzureDevOpsAgent
Should I set it to some other setting, like Verbose?
Debug is correct but we know from the output where is causing the issue. Let you know once we have a quick fix
As instructed by @Mandur, I updated RX2DataRate in my twins desired properties to be an int instead of a string and rebooted my gateway. The issue remains after doing so - I see the C2D messages but the message does not reach the Class C devices. For quite a while I saw no activity from the devices in the LoRaWanNetworkSrvModule logs. I also confirmed there were no errors in the LoRaWANPktFwdModule. Then about three or four minutes later I saw this in the logs...
2019-09-25 16:21:44.147 1114611693065671: device in cache
2019-09-25 16:21:44.148 1114611693065671: device is not our device, ignore message
2019-09-25 16:21:44.148 1114611693065671: processing time: 00:00:00.0409915
2019-09-25 16:21:51.955 Physical dataUp {"rxpk":[{"tmst":562080908,"chan":3,"rfch":0,"freq":902.900000,"stat":1,"modu":"LORA","datr":"SF10BW125","codr":"4/5","lsnr":11.0,"rssi":-25,"size":25,"data":"QAQPgQIABgA3sgBQln1WINufRL0ZFa+jVw=="}]}
2019-09-25 16:21:51.955 1114611693080229: device in cache
2019-09-25 16:21:51.955 1114611693080229: device is not our device, ignore message
2019-09-25 16:21:51.958 1114611693080229: processing time: 00:00:00.0009937
I sent a C2D message to the devices and saw this in the logs...
2019-09-25 16:20:59.539 1114611693080229: received cloud to device message from direct method: {"devEUI":"1114611693080229","fport":"55","confirmed":false,"payload":"CurtainDown","MessageId":"a8225ff8fb394927b4d1b5d2f51b0b59"}
2019-09-25 16:20:59.539 1114611693080229: [class-c] down frame counter: 2
2019-09-25 16:20:59.540 1114611693080229: UnconfirmedDataDown {"txpk":{"imme":true,"data":"YAQPgQIAAgA3Hpf2jS5TUm3WH+584qEN","tmst":0,"size":24,"freq":923.3,"rfch":0,"modu":"LORA","datr":"SF10BW125","codr":"4/5","powe":14,"ipol":true}}
2019-09-25 16:20:59.540 UDP: sending message with ID 8AF1, to 172.18.0.1:51504
2019-09-25 16:20:59.541 UDP: message sent with ID 8AF1
2019-09-25 16:20:59.542 UDP: packet with id 8AF1 successfully transmitted by the aggregator
2019-09-25 16:21:00.240 1114611693065671: received cloud to device message from direct method: {"devEUI":"1114611693065671","fport":"55","confirmed":false,"payload":"CurtainDown","MessageId":"f9f1ab9b00df4a3788a082e87d976f36"}
2019-09-25 16:21:00.240 1114611693065671: [class-c] down frame counter: 2
2019-09-25 16:21:00.241 1114611693065671: UnconfirmedDataDown {"txpk":{"imme":true,"data":"YDkTpAIAAgA3lzxMQzdVf0rtaSmrfVyz","tmst":0,"size":24,"freq":923.3,"rfch":0,"modu":"LORA","datr":"SF10BW125","codr":"4/5","powe":14,"ipol":true}}
2019-09-25 16:21:00.241 UDP: sending message with ID 9160, to 172.18.0.1:51504
2019-09-25 16:21:00.242 UDP: message sent with ID 9160
2019-09-25 16:21:00.243 UDP: packet with id 9160 successfully transmitted by the aggregator
followed shortly there after by this...
2019-09-25 16:23:02.426 Statistic: {"stat":{"time":"2019-09-25 16:23:02 GMT","rxnb":0,"rxok":0,"rxfw":0,"ackr":100.0,"dwnb":0,"txnb":0}}
2019-09-25 16:23:08.321 1114611693080229: received cloud to device message from direct method: {"devEUI":"1114611693080229","fport":"55","confirmed":false,"payload":"CurtainUp","MessageId":"9df21b9819874ea3ac7e7ac1a260d663"}
2019-09-25 16:23:08.322 1114611693080229: [class-c] down frame counter: 3
2019-09-25 16:23:08.322 1114611693080229: UnconfirmedDataDown {"txpk":{"imme":true,"data":"YAQPgQIAAwA31gomOr4wjhLXBOaNew==","tmst":0,"size":22,"freq":923.3,"rfch":0,"modu":"LORA","datr":"SF10BW125","codr":"4/5","powe":14,"ipol":true}}
2019-09-25 16:23:08.322 UDP: sending message with ID 94A1, to 172.18.0.1:51504
2019-09-25 16:23:08.323 UDP: message sent with ID 94A1
2019-09-25 16:23:08.324 UDP: packet with id 94A1 successfully transmitted by the aggregator
2019-09-25 16:23:09.442 1114611693065671: received cloud to device message from direct method: {"devEUI":"1114611693065671","fport":"55","confirmed":false,"payload":"CurtainUp","MessageId":"f3bf9362c6d0489b89acbef6122956ed"}
2019-09-25 16:23:09.442 1114611693065671: [class-c] down frame counter: 3
2019-09-25 16:23:09.443 1114611693065671: UnconfirmedDataDown {"txpk":{"imme":true,"data":"YDkTpAIAAwA3XXQit3m8JWpn2G4FPQ==","tmst":0,"size":22,"freq":923.3,"rfch":0,"modu":"LORA","datr":"SF10BW125","codr":"4/5","powe":14,"ipol":true}}
2019-09-25 16:23:09.443 UDP: sending message with ID 6495, to 172.18.0.1:51504
2019-09-25 16:23:09.444 UDP: message sent with ID 6495
2019-09-25 16:23:09.446 UDP: packet with id 6495 successfully transmitted by the aggregator
Recycling the power on the devices, waiting for them to rejoin and send a message allowed C2D messages to start working again.
Let me know if I can provide any more information.
@gerfen we found and fixed the bug. We are preparing a new release v1.0.2 that includes the fix and the updated IoT Edge version. Tentative date is for Monday.
@gerfen we have published 1.0.2 release and it should fix your problem. Remember to update the function too: https://github.com/Azure/iotedge-lorawan-starterkit/blob/master/Docs/upgrade.md#upgrade-lorawan-to-a-new-version
This version also fix the direct method for class C by using edge 1.0.7.
@ronniesa Thanks for the support on this issue. I meant to close it months ago. Closing now.
I am seeing an issue where my class C devices are not able to receive direct messages after I reboot my gateway. The only way I am able to get them to receive messages again is to cycle the power on the devices so that they rejoin.
I'm currently still on the 1.0 release but I am using 1.0.7 of the Azure Edge components. As a reminder I am working with US 915 frequencies.
I've included the logs I see from the LoRaWanNetworkSrvModule [1]. I don't see anything telling in the LoRaWanPktFwdModule logs.
I'm not really sure how to debug this issue further. Any advise would be greatly appreciated.
[1] LoRaWanNetworkSrvModule logs
[2] LoRaWanPktFwdModule logs