AmedeeBulle / pyrak811

RAK811 Python 3 library for Raspberry Pi
Apache License 2.0
49 stars 24 forks source link

Missing down-link messages #27

Closed kamran890 closed 2 years ago

kamran890 commented 2 years ago

Hi,

Thanks for such a great work.

Issue: Down-link messages are being missed at random. For instance, if I send 10 down-link messages from gateway, few messages will not be received at device end. If I run a separate python script to just receive the data from serial port by serial.readline() in a loop, it works and receives all data. I have looked into _read_loop in serial.py, but unable to understand what could be the issue.

Lora Module: RAK811 WisNode -> https://www.thethingsnetwork.org/docs/devices/rak811-wisnode-lora-module/ OS: Ubuntu -> https://ubuntu.com/raspberry-pi Hardware Raspberry Pi 3

Thanks

AmedeeBulle commented 2 years ago

I haven't experienced such issue, but I don't extensively use downlinks. I did P2P testing during days without loosing a message...

Which doesn't meant there is no bug in the library ;)

Can you run your code with the log level set to DEBUG? We might see if/why we drop messages.

I'll do more testing on my side but it won't be before next weekend.

kamran890 commented 2 years ago

Thanks for your reply @AmedeeBulle. Sure, I will run code with log level set to DEBUG and will update.

kamran890 commented 2 years ago

I am using OTAA mode. In an example code, I can see, a send method is being called in a loop, which in turns calls a function _process_events which is responsible for processing down-link events.

My question is if we are calling _process_events after every minute. Will it be able to recieve all down-link messages sent from gateway during that interval?

Also, let's say I don't want to send up-link message, and would like to receive only the down-links messages from gateway, then I suppose I will have to call _process_events direclty in a loop? Is it supposed to miss down-link messages if there is delay in that loop or it should receive all messages?

AmedeeBulle commented 2 years ago

Downlinks are only sent by the gateway as a reply to an uplink. If you never send uplinks you won't receive anything.

As far as I remember, in the past, you could only send one downlink per uplink. I just did a quick test from The Things Stack console and it seems that all queued downlinks are delivered when an uplink is received. If this is the case, this is the reason why you are loosing downlinks as the library expects only one (or none) - I'll verify and fix that.

On a side note, if you are using The Things Stack Community Edition, you will rapidly exceed the fair use policy which limits the downlinks to 10 per day... LoraWan is designed to handle many uplinks and few downlinks.

kamran890 commented 2 years ago

I have run the code in DEBUG mode.

Test: Send 3 Down-link messages from server to node through gateway. Verify if the 3 messages are received at node side.

Result: 2 messages are received on node side. 1 message is missed.

11 10 00 0a 06 00: Received
11 10 00 0a 05 00: Missed
11 10 00 0a 04 00: Received

Node logs:

2022-01-28 12:02:53,555 - raklib.rak811.serial - DEBUG - Received: >at+recv=10,-24,7,6:1110000a0600<
2022-01-28 12:02:53,557 - lib.transmitter_receiver - INFO - Downlink message: b'\x11\x10\x00\n\x06\x00'
2022-01-28 12:02:55,615 - raklib.rak811.serial - DEBUG - Received: >OK <
2022-01-28 12:02:55,616 - lib.transmitter_receiver - INFO - Uplink message: 0x12111000076000
2022-01-28 12:02:55,619 - raklib.rak811.serial - DEBUG - Sending: >at+send=lora:11:12111000076000\r\n<
2022-01-28 12:02:55,622 - raklib.rak811.serial - DEBUG - Received: >at+recv=0,-20,6,0<
2022-01-28 12:02:56,793 - raklib.rak811.serial - DEBUG - Received: >OK <
2022-01-28 12:02:56,799 - raklib.rak811.serial - DEBUG - Received: >at+recv=0,-20,6,0<
2022-01-28 12:02:56,801 - raklib.rak811.rak811_v3 - DEBUG - Downlink available
2022-01-28 12:02:56,802 - lib.transmitter_receiver - INFO - Downlink message: b''

2022-01-28 12:03:37,328 - raklib.rak811.serial - DEBUG - Received: >at+recv=10,-25,7,6:1110000a0400<
2022-01-28 12:03:37,330 - lib.transmitter_receiver - INFO - Downlink message: b'\x11\x10\x00\n\x04\x00'
2022-01-28 12:03:39,387 - raklib.rak811.serial - DEBUG - Received: >OK <
2022-01-28 12:03:39,388 - lib.transmitter_receiver - INFO - Uplink message: 0x12111000074000
2022-01-28 12:03:39,390 - raklib.rak811.serial - DEBUG - Sending: >at+send=lora:11:12111000074000\r\n<
2022-01-28 12:03:39,395 - raklib.rak811.serial - DEBUG - Received: >at+recv=0,-19,6,0<
2022-01-28 12:03:40,549 - raklib.rak811.serial - DEBUG - Received: >OK <
2022-01-28 12:03:40,555 - raklib.rak811.serial - DEBUG - Received: >at+recv=0,-19,5,0<
2022-01-28 12:03:40,557 - raklib.rak811.rak811_v3 - DEBUG - Downlink available
2022-01-28 12:03:40,558 - lib.transmitter_receiver - INFO - Downlink message: b''

Gateway logs:

2022/01/28 04:03:40Uplink12 11 10 00 07 40 00
2022/01/28 04:03:39Ack
2022/01/28 04:03:37Downlink11 10 00 0a 04 00

2022/01/28 04:03:07Downlink11 10 00 0a 05 00

2022/01/28 04:02:56Uplink12 11 10 00 07 60 00
2022/01/28 04:02:55Ack
2022/01/28 04:02:54Downlink11 10 00 0a 06 00
kamran890 commented 2 years ago

Following is the use case, I am working on:

Send Down-link message from server, which is a command to trigger something on node. In response of Down-link message, send an up-link message.

Server: MQTT Broker Gateway: RAK7258 which has builtin LoRa Network and Application Server Node: RAK811 wisnode

kamran890 commented 2 years ago

Here, how I am using the lib.

while 1:
    while lora_transreceiver.is_downlink():
        data = lora_transreceiver.get_downlink()["data"]
        logger.info(f"Downlink message: {data}")
        if data != b'':
            handle_downlink_data(
                codecs.encode(data, "hex").decode("utf-8")
            )
    if not uplink_buffer.empty():
        data = uplink_buffer.get()
        logger.info(f"Uplink message: {data}")
        lora_transreceiver.send(data)
    else:
        try:
            lora_transreceiver.process_events(timeout=0.1)
        except:
            pass

Note: lora_transreceiver: A wrapper built on top of pyrak811. handle_downlink_data: A function to handle the down-link and process the task based on command. uplink_buffer: A buffer which contains the response of the down-link message after performing a task.

AmedeeBulle commented 2 years ago

This confirms what I suspected, the gateway flushes all downlinks when the uplink is received and the library will only process the first one.

Will work on a fix.

AmedeeBulle commented 2 years ago

I have implemented some changes, but I am a bit puzzled by you environment.

I cannot reproduce this situation with The Things Stack: even if when queue multiple messages, they are delivered one-by-one each time an uplink is received -- this is compliant with Class A LoraWan networks.

Is the Node log above complete? The serial._read_loop runs in a separate thread and print everything it gets. If the missed message is not seen by the read loop, I doubt it has been received by the module at all... Also the line 2022-01-28 12:03:37,328 - raklib.rak811.serial - DEBUG - Received: >at+recv=10,-25,7,6:1110000a0400< is printed 40 secs after the last uplink, that doesn't sound right to me. Are you running a Class C LoraWan network?

Reading the Rak7258, it seems that by using the MQTT bridge, you are by-passing the LoRa Network Server, so might introduce non-compliant downlinks...

AmedeeBulle commented 2 years ago

In case you run your device in Class C, can you test this development build?

I have added otaa_v3_class_c.py as simple example on how to handle downlinks in this scenario.

There is still a possibility to loose downlinks: the device cannot send and receive at the same time; if a downlink is sent while the device is transmitting, the downlink might be lost, but this is how Class C works -- see An In-depth Look at LoRaWAN® Class C Devices

kamran890 commented 2 years ago

Yes, I am running the device in Class C. Sure, I will test with the rak811 0.8.5.dev5, and will update.

Thanks for sharing the link. Yes, I can see that the receive window gets closed during transmit. However, as far as I remember, I also performed testing with no up-link message, and the down-links got missed in that case too. I will test with the demo script you provided for Class C devices and will update further. Thanks!

AmedeeBulle commented 2 years ago

I never considered Class C before which explains all the troubles.

Some refactoring of the Rak811Serial class would make the code more straightforward (by recognizing downlinks upfront), but it is common to both module versions and the V2 modules have a quite different message flow...

I have made a lot of tests in Class C mode, and the library now processes all messages that the module receives. If you can confirm it works for you as well (no hurry ;) ) I'll merge the changes and release.

kamran890 commented 2 years ago

Thanks, I will test it over the weekend. Well, in-fact I think the issue may not be in processing of message but in receiving the message as you mentioned earlier. The missed messages aren't seen by read loop.

kamran890 commented 2 years ago

I have run the demo script and it is still missing down-links.

Please note, for testing, I have just ignored exception in send method where it says LoRa busy. I have also ignored exception in receive_p2p where it gets Invalid event i.e. >OK <

Following are the node/gateway logs where send function is enabled.

Node logs:

Joining
DEBUG:rak811.serial:Sending: >at+join\r\n<
DEBUG:rak811.serial:Received: >OK Join Success<
Joined in 5.29 secs
Sending initial Hello packet
DEBUG:rak811.serial:Sending: >at+send=lora:1:48656c6c6f\r\n<
DEBUG:rak811.serial:Received: >OK <
DEBUG:rak811.serial:Received: >at+recv=0,-38,6,0<
DEBUG:rak811.rak811_v3:Downlink available
Packet sent in 1.35 secs
Entering wait loop
You can send downlinks from the TTN console
Waiting for downlinks...
DEBUG:rak811.serial:Received: >at+recv=10,-40,7,6:1110000a0100<
DEBUG:rak811.rak811_v3:Message available
Downlink received 1110000a0100
DEBUG:rak811.serial:Received: >OK <
DEBUG:rak811.serial:Received: >at+recv=10,-35,6,6:1110000a0200<
DEBUG:rak811.serial:Received: >OK <
DEBUG:rak811.serial:Received: >at+recv=10,-40,7,6:1110000a0300<
Sending back results
DEBUG:rak811.serial:Sending: >at+send=lora:1:1110000a0100\r\n<
DEBUG:rak811.serial:Received: >ERROR: 80<
lora send exception: [Errno 80] LoRa busy
Packet sent in 0.19 secs
Waiting for downlinks...
DEBUG:rak811.serial:Received: >OK <
DEBUG:rak811.serial:Received: >at+recv=0,-28,6,0<
lora receive exception: [Errno 998] Invalid event received
Waiting for downlinks...
DEBUG:rak811.serial:Received: >at+recv=10,-39,5,6:1110000a0500<
DEBUG:rak811.rak811_v3:Message available
Downlink received 1110000a0500
DEBUG:rak811.serial:Received: >OK <
DEBUG:rak811.serial:Received: >at+recv=10,-31,6,6:1110000a0600<
DEBUG:rak811.serial:Received: >OK <
DEBUG:rak811.serial:Received: >at+recv=10,-40,6,6:1110000a0700<
DEBUG:rak811.serial:Received: >OK <
DEBUG:rak811.serial:Received: >at+recv=0,-32,6,0<
Sending back results
DEBUG:rak811.serial:Sending: >at+send=lora:1:1110000a0500\r\n<
DEBUG:rak811.serial:Received: >OK <
DEBUG:rak811.serial:Received: >at+recv=0,-35,6,0<
DEBUG:rak811.rak811_v3:Downlink available
Packet sent in 1.35 secs
Waiting for downlinks...

Gateway logs:

2022/02/05 16:23:41Uplink11 10 00 0a 05 00
2022/02/05 16:23:38Ack
2022/02/05 16:23:37Ack
2022/02/05 16:23:36Downlink11 10 00 0a 08 00
2022/02/05 16:23:35Downlink11 10 00 0a 07 00
2022/02/05 16:23:35Ack
2022/02/05 16:23:34Downlink11 10 00 0a 06 00
2022/02/05 16:23:33Downlink11 10 00 0a 05 00
2022/02/05 16:23:12Ack
2022/02/05 16:23:10Ack
2022/02/05 16:23:08Downlink11 10 00 0a 03 00
2022/02/05 16:23:08Ack
2022/02/05 16:23:08Downlink11 10 00 0a 02 00
2022/02/05 16:23:06Downlink11 10 00 0a 01 00
2022/02/05 16:22:59Uplink48 65 6c 6c 6f
2022/02/05 16:22:54Join

Missed message: 11 10 00 0a 08 00

Following are the node/gateway logs where send function is commented out.

Node logs:

Joining
DEBUG:rak811.serial:Sending: >at+join\r\n<
DEBUG:rak811.serial:Received: >OK Join Success<
Joined in 5.29 secs
Sending initial Hello packet
DEBUG:rak811.serial:Sending: >at+send=lora:1:48656c6c6f\r\n<
DEBUG:rak811.serial:Received: >OK <
DEBUG:rak811.serial:Received: >at+recv=0,-24,6,0<
DEBUG:rak811.rak811_v3:Downlink available
Packet sent in 1.36 secs
Entering wait loop
You can send downlinks from the TTN console
Waiting for downlinks...
DEBUG:rak811.serial:Received: >at+recv=10,-44,7,6:1110000a0101<
DEBUG:rak811.rak811_v3:Message available
Downlink received 1110000a0101
Waiting for downlinks...
DEBUG:rak811.serial:Received: >OK <
DEBUG:rak811.serial:Received: >at+recv=10,-36,5,6:1110000a0201<
lora receive exception: [Errno 998] Invalid event received
Waiting for downlinks...
DEBUG:rak811.serial:Received: >OK <
DEBUG:rak811.serial:Received: >at+recv=10,-39,7,6:1110000a0401<
lora receive exception: [Errno 998] Invalid event received
Waiting for downlinks...
DEBUG:rak811.serial:Received: >OK <
DEBUG:rak811.serial:Received: >at+recv=0,-30,5,0<
lora receive exception: [Errno 998] Invalid event received
Waiting for downlinks...

Gateway logs:

2022/02/05 16:26:59Ack
2022/02/05 16:26:57Ack
2022/02/05 16:26:56Downlink11 10 00 0a 03 01
2022/02/05 16:26:56Downlink11 10 00 0a 04 01
2022/02/05 16:26:55Ack
2022/02/05 16:26:54Downlink11 10 00 0a 02 01
2022/02/05 16:26:54Downlink11 10 00 0a 01 01
2022/02/05 16:26:43Uplink48 65 6c 6c 6f
2022/02/05 16:26:38Join

Missed message: 11 10 00 0a 03 01

kamran890 commented 2 years ago

As we can see above, the missed message isn't even received by read loop. As per my understanding, that loop is running in a separate thread and independent of main thread. right?

AmedeeBulle commented 2 years ago

As we can see above, the missed message isn't even received by read loop. As per my understanding, that loop is running in a separate thread and independent of main thread. right?

Right, read loop is running all the time and prints all what it gets from the RAK module. If the read loop misses a message, it means that the module missed it as well.

Please note, for testing, I have just ignored exception in send method where it says LoRa busy. I have also ignored exception in receive_p2p where it gets Invalid event i.e. >OK <

Catching LoRa busy on send makes sense, the module might receive data when we try to send, I should add that in the sample code. I am a bit puzzled with the >OK < string. From your logs it seems that you get an OK reply after each at+recv. It is not the case with my module (send commented out as well):

Waiting for downlinks...
DEBUG:rak811.serial:Received: >at+recv=1,-56,8,2:3334<
DEBUG:rak811.rak811_v3:Message available
Downlink received 3334
Waiting for downlinks...
DEBUG:rak811.serial:Received: >at+recv=1,-57,8,2:3334<
DEBUG:rak811.rak811_v3:Message available
Downlink received 3334
Waiting for downlinks...
DEBUG:rak811.serial:Received: >at+recv=1,-57,8,2:3334<
DEBUG:rak811.rak811_v3:Message available
Downlink received 3334
Waiting for downlinks...

What is your module firmware version? I have:

$ rak811v3 version
V3.0.0.14.H
kamran890 commented 2 years ago

Thanks!

Firmware version: V3.0.0.14.H

Well, that >OK < was there due to the fact that I was sending the confirmed down-links. So, I realized that, with confirmed down-links, the node was sending the ACK for each down-link, thus closing the receive window, which resulted in higher probability of missing next down-link.

kamran890 commented 2 years ago

So now with unconfirmed down-links, the missing of messages would be very rare. However there can still be missing messages, because the server/gateway don't know about the status of receive window of node, and it can send a message at a time when the receive window of node is closed. Wondering how we can resolve it? Do you think a re-send feature can be a good attempt?

AmedeeBulle commented 2 years ago

The main ideas of LoRaWan are: most of the traffic is uplink and non guaranteed delivery...

Depending on what you are trying to achieve, pure LoRa in P2P mode might be preferable, but still you will have to handle packet lost.

To answer your question: you can miss a packet in multiple ways:

If you need a reliable communication, you will need to handle retries in a way or another... (As you mentioned, using confirmed transmission has its own problems, as it drastically increases airtime...)

kamran890 commented 2 years ago

Right! @AmedeeBulle Thanks for your support.

AmedeeBulle commented 2 years ago

I have released v0.8.5 with the improvements discussed in this thread. Closing the issue, feel free to reopen if I have missed anything.