brocaar / chirpstack-network-server

ChirpStack Network Server is an open-source LoRaWAN network-server.
https://www.chirpstack.io
MIT License
1.5k stars 546 forks source link

Class-C Downlinks being sent to 'gateway/0000000000000000/tx' with no tx details #351

Closed sa-wilson closed 6 years ago

sa-wilson commented 6 years ago

What happened?

I'm attempting to send a downlink message to a class-c device (that has joined and transmitted once quite recently) via MQTT. The app server is processing the JSON and queuing a down-link, which is then being sent immediately, but with the following JSON structure, to the gateway/0000000000000000/tx MQTT endpoint.

{"token":47484,"txInfo:{"mac":"0000000000000000","immediately":false,"frequency":0,"power":0,"dataRate":{"modulation":""},"codeRate":"","iPol":null,"board":0,"antenna":0},"phyPayload":"YA8iAPyKBgADAAgAcAMh/wARZOmWdRtygfM="}

This results in no downlink being sent to the device.

What version are your using?

lora-app-server 2.0.1 loraserver 2.0.2

LoRa App Server Log Output

Aug 28 10:19:08 loraserver.local lora-app-server[25430]: time="2018-08-28T10:19:08+10:00" level=info msg="handler/mqtt: data-down payload received" topic=application/26/device/deadc0de006500ff/tx
Aug 28 10:19:08 loraserver.local lora-app-server[25430]: time="2018-08-28T10:19:08+10:00" level=info msg="finished client unary call" grpc.code=OK grpc.method=GetNextDownlinkFCntForDevEUI grpc.service=ns.NetworkServerService grpc.time_ms=2.238 span.kind=client system=grpc
Aug 28 10:19:08 loraserver.local lora-app-server[25430]: time="2018-08-28T10:19:08+10:00" level=info msg="finished client unary call" grpc.code=OK grpc.method=CreateDeviceQueueItem grpc.service=ns.NetworkServerService grpc.time_ms=5.203 span.kind=client system=grpc
Aug 28 10:19:08 loraserver.local lora-app-server[25430]: time="2018-08-28T10:19:08+10:00" level=info msg="downlink device-queue item handled" confirmed=false dev_eui=deadc0de006500ff f_cnt=8

LoRaServer Log Output

Aug 28 10:19:08 loraserver.local loraserver[3085]: time="2018-08-28T10:19:08+10:00" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GetNextDownlinkFCntForDevEUI grpc.service=ns.NetworkServerService grpc.start_time="2018-08-28T10:19:08+10:00" grpc.time_ms=1.433 peer.address="[::1]:43238" span.kind=server system=grpc
Aug 28 10:19:08 loraserver.local loraserver[3085]: time="2018-08-28T10:19:08+10:00" level=info msg="device-queue item created" dev_eui=deadc0de006500ff f_cnt=8
Aug 28 10:19:08 loraserver.local loraserver[3085]: time="2018-08-28T10:19:08+10:00" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=CreateDeviceQueueItem grpc.service=ns.NetworkServerService grpc.start_time="2018-08-28T10:19:08+10:00" grpc.time_ms=4.783 peer.address="[::1]:43238" span.kind=server system=grpc
Aug 28 10:19:08 loraserver.local loraserver[3085]: time="2018-08-28T10:19:08+10:00" level=info msg="device-queue deleted" id=47
Aug 28 10:19:08 loraserver.local loraserver[3085]: time="2018-08-28T10:19:08+10:00" level=info msg="pending mac-command block set" cid=LinkADRReq commands=2 dev_eui=deadc0de006500ff
Aug 28 10:19:08 loraserver.local loraserver[3085]: time="2018-08-28T10:19:08+10:00" level=info msg="backend/gateway: publishing tx packet" qos=0 topic=gateway/0000000000000000/tx
Aug 28 10:19:08 loraserver.local loraserver[3085]: time="2018-08-28T10:19:08+10:00" level=info msg="device-session saved" dev_addr=fc00220f dev_eui=deadc0de006500ff

JSON Sent to LoRa App Server

{
    "fPort": 100,
    "reference": "testtt",
    "confirmed": false,
    "object": {
        "str": "adc"
    }
}

Payload Encoder

function Encode(fPort, obj) {
  var bytes = [];
  if (fPort == 100) {
    for (var i = 0; i < obj.str.length; ++i) {
      bytes.push(obj.str.charCodeAt(i));
    }
  }
  return bytes;
}

Update:

I've been looking into the code and it seems like this may be happening in LoRaServer, during (DeviceSession) GetDownlinkGatewayMAC called by getDataTXInfoForRX2, in turn called by HandleScheduleNextQueueItem.

It seems like the gateway is not getting found, and it may be returning on line 320 (in 2.0.2 code, 322 in master) of internal/downlink/data/data.go, but not short circuiting the loop in HandleScheduleNextQueueItem, and not reporting an error in the logs.

Re-posted here as requested, instead of in lora-app-server.

brocaar commented 6 years ago

Thanks! Does this happen every time? I'll see if I can reproduce your issue today.

sa-wilson commented 6 years ago

It happens every time with this device, Re-Joining doesn't appear to do anything, and the device can be heard by 3-6 gateways.

This is my only class-c device at the moment, but I can work on getting another one up an running if you have issues reproducing the issue.

brocaar commented 6 years ago

I'm testing now with a Class-C device, but I'm unable to re-produce your issue. Could you post the output of:

loraserver --config path/to/loraserver.toml print-ds DEVEUI

# example
# loraserver --config loraserver.toml print-ds 0202020202020202
sa-wilson commented 6 years ago

I've rejoined it again, trying to test/get more logs regarding #326, but I haven't tested whether downlink messages don't work this session. I'm out of the office now, and the device is offline, so I can't re-power it until the morning.

{
    "MACVersion": "1.0.2",
    "DeviceProfileID": "df3b3ca5-7120-4b9b-98b4-560778e72f14",
    "ServiceProfileID": "9b5b2157-fc19-4399-9fb1-a439f9c95c3a",
    "RoutingProfileID": "6d5db27e-4ce2-4b2b-b5d7-91f069397978",
    "DevAddr": "fc0023d5",
    "DevEUI": "deadc0de006500ff",
    "JoinEUI": "0000000000000000",
    "FNwkSIntKey": "b65f377aaf3454d8e03e5d234c855f10",
    "SNwkSIntKey": "b65f377aaf3454d8e03e5d234c855f10",
    "NwkSEncKey": "b65f377aaf3454d8e03e5d234c855f10",
    "AppSKeyEvelope": null,
    "FCntUp": 1,
    "NFCntDown": 1,
    "AFCntDown": 0,
    "ConfFCnt": 0,
    "SkipFCntValidation": false,
    "RXWindow": 0,
    "RXDelay": 1,
    "RX1DROffset": 0,
    "RX2DR": 8,
    "RX2Frequency": 923300000,
    "TXPowerIndex": 0,
    "DR": 2,
    "ADR": false,
    "MinSupportedTXPowerIndex": 0,
    "MaxSupportedTXPowerIndex": 0,
    "MaxSupportedDR": 5,
    "NbTrans": 1,
    "EnabledChannels": null,
    "EnabledUplinkChannels": [
        0,
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9,
        10,
        11,
        12,
        13,
        14,
        15,
        16,
        17,
        18,
        19,
        20,
        21,
        22,
        23,
        24,
        25,
        26,
        27,
        28,
        29,
        30,
        31,
        32,
        33,
        34,
        35,
        36,
        37,
        38,
        39,
        40,
        41,
        42,
        43,
        44,
        45,
        46,
        47,
        48,
        49,
        50,
        51,
        52,
        53,
        54,
        55,
        56,
        57,
        58,
        59,
        60,
        61,
        62,
        63,
        64,
        65,
        66,
        67,
        68,
        69,
        70,
        71
    ],
    "ExtraUplinkChannels": {},
    "ChannelFrequencies": null,
    "UplinkHistory": [
        {
            "FCnt": 0,
            "MaxSNR": 8,
            "TXPowerIndex": 0,
            "GatewayCount": 5
        }
    ],
    "UplinkGatewayHistory": {
        "98208edef4600103": {}
    },
    "LastDevStatusRequested": "0001-01-01T00:00:00Z",
    "LastDownlinkTX": "2018-08-28T17:17:01.48155728+10:00",
    "BeaconLocked": false,
    "PingSlotNb": 0,
    "PingSlotDR": 0,
    "PingSlotFrequency": 0,
    "RejoinRequestEnabled": false,
    "RejoinRequestMaxCountN": 0,
    "RejoinRequestMaxTimeN": 0,
    "RejoinCount0": 0,
    "PendingRejoinDeviceSession": null
}
sa-wilson commented 6 years ago

I just rejoined and tested the device again, device session details are pretty much the same except for session keys, device address, etc. Downlinks still not working. I'll try out another device soon.

brocaar commented 6 years ago

I think I have found the origin of this issue:

var scheduleNextQueueItemTasks = []func(*dataContext) error{
        ...
    checkLastDownlinkTimestamp,
    forClass(classC,
        getDataTXInfoForRX2,
    ),
    forClass(classB,
        checkBeaconLocked,
        setTXInfoForClassB,
    ),
    setRemainingPayloadSize,
        ...
}

Theoretically, a non Class-B or Class-C device should never trigger this, but somehow your TX parameters are not set, which indicates to me that nor the forClass(classC, ... or forClass(classB, ... is executed.

I'll add a case so that in case non of the above is true, a warning or error is returned.

brocaar commented 6 years ago

It could be that due to an earlier error, your device-profile cache is out of sync. Could you try the following?

From the CLI on your server:

redis-cli

Then inside the Redis console:

del lora:ns:dp:df3b3ca5-7120-4b9b-98b4-560778e72f14

So what happens:

By removing the cache from Redis, it will be re-fetched from the PostgreSQL database the next time and the device-profile should be up-to-date then.

(note that the df3b3ca..... key maps to the device-profile ID of your device).

sa-wilson commented 6 years ago

That looks like it was the issue. I've deleted the relevant key in redis and the downlinks are now working correctly.

How much of the information in redis is safe to delete and get loraserver to regenerate? I noticed that after I deleted the cached device profile that #326 didn't exhibit itself and the device joined first time (these devices rejoin after they lose power), so I'm wondering if that is related to a bad cached profile, or other cached information, as well. We had some flaky power in the past with our server, so junk may have been written at some point.

brocaar commented 6 years ago

That could have been caused it. When you would completely flush Redis, you would loose all device activations. These activations are stored in:

So when you want to clean data from Redis, but want to maintain the device-sessions, make sure you keep these keys.