CongducPham / LowCostLoRaGw

Low-cost LoRa IoT & gateway with SX12XX (SX1261/62/68; SX1272/76/77/78/79; SX1280/81), RaspberryPI and Arduino boards

Missing packets in upload #319

Open 65sc02 opened 2 years ago

65sc02 commented 2 years ago

Dear Prof. Pham, I am so grateful for you making this single channel gateway publicly available! Great work!!

I am using it to receive messages from 20 LoRa sensor nodes that monitor irrigation in an agricultural area in the context of a German water reclamation project (https://www.nutzwasser.org). At the moment all sensor nodes and the gateway are still being tested inside my house, so the distance is minimal and receive strength is maximal (SNR=8 RSSI=-57).

I noticed that not all transmitted packets showed up on TTN, and I am in the process of narrowing this down to achieve reliable communication. I looked in the log file (post-processing.log) and indeed some messages don't show up there at all. In order to isolate the problem further, I set up a second RPI with your GW-SW right next to the first one and found that sometimes messages are received on both GWs, sometimes only on one of the two GWs, and sometimes not at all. If a message was received on the secondary GW but not on the primary GW, I can be sure that it was sent and that the primary GW missed it. The antennas of both GWs are ca. 50cm apart, so I can assume identical RF conditions for both.

Each sensor node makes a measurement and then sends data via LoRa once per hour. In order to not block each other, each sensor node has a disjoint 1 minute time slice to send data (the actual transmission is done within 2 seconds). They are controlled by a high precision RTC (DS3231). So, during the first 20 minutes of each hour I should receive one message per minute. I am attaching log file snippets from the primary gw (nw.log) and my secondary gw (hh.log -- it does not use the MQTT feature). Attached: hh.log, nw.log
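The slot scheme described above can be sketched as follows. This is only an illustration: the 0-based `slot_index` numbering and the assumption that each node transmits at the very start of its slice are hypothetical and not taken from the actual firmware; the 60 s slice and 2 s transmission time come from the post.

```python
SLOT_SECONDS = 60  # one-minute slice per node (from the post)
TX_SECONDS = 2     # a transmission finishes within about 2 s (from the post)
NODE_COUNT = 20    # number of sensor nodes (from the post)


def tx_window(slot_index):
    """Return (start, end) of a transmission in seconds after the top of the hour.

    Assumption: the node transmits at the very start of its slice, and
    slot_index is a hypothetical 0-based slot number.
    """
    start = slot_index * SLOT_SECONDS
    return start, start + TX_SECONDS


def windows_overlap(a, b):
    """True if two half-open (start, end) intervals share any time."""
    return a[0] < b[1] and b[0] < a[1]


def overlapping_pairs(node_count=NODE_COUNT):
    """List every pair of slots whose transmission windows overlap."""
    windows = [tx_window(i) for i in range(node_count)]
    return [
        (i, j)
        for i in range(node_count)
        for j in range(i + 1, node_count)
        if windows_overlap(windows[i], windows[j])
    ]


if __name__ == "__main__":
    print("window of the last node:", tx_window(NODE_COUNT - 1))
    print("overlapping pairs:", overlapping_pairs())
```

With perfectly synchronized clocks the list of overlapping pairs is empty, which is the property the one-minute slices are meant to guarantee.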

They both cover the same time interval, but show different readings:

hh.log:
sensor_P33 at 2022-03-24T09:16:02.003938
sensor_P39 at 2022-03-24T09:18:07.264226
sensor_P41 at 2022-03-24T09:19:02.221223

nw.log:
sensor_P30 at 2022-03-24T09:15:01.108945
sensor_P41 at 2022-03-24T09:19:02.899383

There should have been sensor_P30, sensor_P33, sensor_P35, sensor_P39, sensor_P41 seen at both GWs. The primary GW (nw.log) is missing sensor_P33, sensor_P35, sensor_P39 and the secondary GW (hh.log) is missing sensor_P30, sensor_P35. Both received sensor_P41.
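The comparison above can be reproduced with a few set operations. This is a minimal sketch: the sensor names are copied from the two log excerpts, while the function names and the dictionary layout are made up for illustration and are not part of the gateway software.

```python
# Sensors that should have reported in the interval (from the post).
expected = {"sensor_P30", "sensor_P33", "sensor_P35", "sensor_P39", "sensor_P41"}

# Sensors that actually appear in each gateway log (from the excerpts).
seen = {
    "nw.log": {"sensor_P30", "sensor_P41"},
    "hh.log": {"sensor_P33", "sensor_P39", "sensor_P41"},
}


def missing_per_gateway(expected, seen):
    """For each log, return the sorted expected sensors that never appeared."""
    return {gw: sorted(expected - sensors) for gw, sensors in seen.items()}


def missed_everywhere(expected, seen):
    """Return the sorted sensors that no gateway received at all."""
    return sorted(expected - set().union(*seen.values()))


if __name__ == "__main__":
    for gw, missing in missing_per_gateway(expected, seen).items():
        print(f"{gw} is missing: {', '.join(missing)}")
    print("missed by both:", ", ".join(missed_everywhere(expected, seen)))
```

Sensors that only one gateway missed are the interesting ones here, because the other gateway proves they were transmitted.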

Do you have any advice for me where to look?

Thank you so much for your help Helmut

CongducPham commented 2 years ago

Hi, thank you for your feedback. Did you try with sensors and gw separated by more distance, like several meters apart from each other? Distances that are too short can cause some issues. There may also be some collisions if the clocks are not synchronized. Regards,

65sc02 commented 2 years ago

Dear Pham, GWs and sensors are separated by

I can also move all the sensors further away (like 2 floors downstairs). It would be great, if this was the root cause ;-) However, it does not really explain why one gateway sees, e.g., sensor_P33, while the other GW does not see it. I would assume both would be equally overpowered. From your experience: is the observed signal strength of SNR=8 RSSI=-57 too high?

RTC clocks are synchronized (via NTP yesterday). Their drift should be way below 30 seconds per year. One can see from the time stamps in the log files that they are indeed right on time -- I can exclude collisions with high confidence. And if there were a collision, it would again affect both GWs in the same way.
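A rough back-of-the-envelope check of the drift argument, as a sketch only. It takes the 30 s per year figure from the post as an upper bound, assumes drift grows linearly, and assumes the worst case of two neighbouring nodes drifting in opposite directions; the function names are hypothetical.

```python
DAYS_PER_YEAR = 365
DRIFT_S_PER_YEAR = 30.0  # upper bound quoted in the post
SLOT_SECONDS = 60        # one-minute slice per node
TX_SECONDS = 2           # transmission time within a slice


def worst_case_offset(days_since_sync, drift=DRIFT_S_PER_YEAR):
    """Largest offset (s) between two clocks drifting in opposite directions."""
    return 2 * drift * days_since_sync / DAYS_PER_YEAR


def days_until_possible_overlap(drift=DRIFT_S_PER_YEAR):
    """Days until neighbouring transmissions could first touch.

    The idle gap between two consecutive transmissions is the slice length
    minus the transmission time, i.e. 58 s with the numbers above.
    """
    guard = SLOT_SECONDS - TX_SECONDS
    return guard * DAYS_PER_YEAR / (2 * drift)


if __name__ == "__main__":
    print(f"offset one day after sync: {worst_case_offset(1):.2f} s")
    print(f"days until overlap is possible: {days_until_possible_overlap():.0f}")
```

Under these assumptions the offset one day after an NTP sync is a fraction of a second, and it would take close to a year without re-synchronization before adjacent slices could collide.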

How do you communicate with the LoRa radio (in my case an RFM95W which is seen as "SX1272/76 configured as LR-BS. Waiting RF input for transparent RF-serial bridge"): via interrupt or via polling? Can I switch on some very low level debugging to see if the radio receives anything? How to do that, which file to look at?

Thanks so much Helmut

CongducPham commented 2 years ago

Distances should be ok then. I meant the synchronization at the devices. What kind of devices are you using? microcontrollers? Raspberry as well? How do you make sure that they are synchronized if you are using microcontrollers?

65sc02 commented 2 years ago

The sensor nodes are ESP32 based. They use a "Paxcounter", which has the LoRa modem/radio and the ESP32 (and a display as a bonus): http://www.lilygo.cn/prod_view.aspx?TypeId=50003&Id=1271&FId=t3:50003:3

I re-programmed it with my application program, using LMIC as the LoRaWAN software. Here is a picture of the 4 nodes with 12V power supply: [image]

In the picture you see the round blue "Chrono Dot" which is the DS3231 RTC with backup battery.

The ESP32 also supports WiFi, so initially (or when I upload a new version of the SW using OTA update via WiFi) it has a WiFi connection and sets the RTC with the proper time that it gets via NTP. It then switches WiFi off, but I can re-enable it via a long press on my button (e.g., if I need to do a SW upgrade in the field). Without WiFi (normal operation) the ESP32 synchronizes its software clock once per hour from the RTC.

Bye Helmut

CongducPham commented 2 years ago

Hi, OK, I see. If you are using LMIC, the normal behavior is to use 3 channels for uplink. As the gw is single channel, you will miss 2 messages out of every 3 uplinks. Uplink frequencies are picked in a round-robin fashion.
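The effect can be illustrated with a small simulation. This is a sketch under stated assumptions: the three frequencies are the usual EU868 default channels (868.1, 868.3 and 868.5 MHz, written in kHz), which the thread does not state explicitly, and the strict round-robin order follows the description above rather than the exact LMIC channel selection logic.

```python
from itertools import cycle, islice

# Assumed EU868 default uplink channels in kHz (not stated in the thread).
NODE_CHANNELS_KHZ = [868100, 868300, 868500]
GATEWAY_CHANNEL_KHZ = 868100  # a single-channel gateway listens on one frequency


def received_uplinks(node_channels, gateway_channel, n_uplinks):
    """Return the indices of uplinks sent on the gateway's channel.

    The node is modelled as cycling through node_channels in strict
    round-robin order, one channel per uplink.
    """
    hops = islice(cycle(node_channels), n_uplinks)
    return [i for i, channel in enumerate(hops) if channel == gateway_channel]


if __name__ == "__main__":
    three = received_uplinks(NODE_CHANNELS_KHZ, GATEWAY_CHANNEL_KHZ, 9)
    one = received_uplinks([GATEWAY_CHANNEL_KHZ], GATEWAY_CHANNEL_KHZ, 9)
    print("3 channels, received uplinks:", three)
    print("1 channel, received uplinks:", one)
```

With three channels only every third uplink reaches the gateway; restricting the node to the gateway's frequency lets all of them through, at least as far as channel selection is concerned.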

You can change your application code to only use one channel: see for instance https://github.com/CongducPham/LMIC_low_power/blob/0084f8adea9295bc8fa8b96c4e7818571c65707d/Arduino_LoRa_LMIC_ABP_temp/Arduino_LoRa_LMIC_ABP_temp.ino#L844

If you want OTAA with LMIC, you may want to change the LMIC code so that only 1 frequency is used for the join message; see for instance: https://github.com/CongducPham/LMIC_low_power/blob/0084f8adea9295bc8fa8b96c4e7818571c65707d/lmic/src/lmic/lmic.c#L695

Hope that helps,

65sc02 commented 2 years ago

> Hi, ok I see, if you are using LMIC, the normal behavior is to use 3 channels for uplink, as the gw is single channel, then you will miss 2 messages every 3 uplinks. Uplink frequencies are picked in a round robin fashion.

Yes, I found that out too (the hard way ;-). I already changed the LMIC code to only use 1 frequency. The fact that the other GW receives the message also indicates that the sender is using the correct frequency. I just tested and got through, in quick succession, runs of 8, then 6, and then 1 consecutive LoRa messages from the same sender. That was a good idea, but I really think this is not it; I have that possible pitfall covered.

> if you want OTAA with LMIC

So far I have been using ABP. It is a bit more cumbersome to set up, but less complicated afterwards. For only 20 sensor nodes ABP is still practical. This way the gateway does not have to transmit at all.

Do you have any other ideas what could be going wrong? Have you ever noticed missing packets?

Bye Helmut

CongducPham commented 2 years ago

So right now I have no better idea. Yes, I've noticed that some messages can be dropped sometimes, but if it is too frequent, it is not normal. I do have sensors sending messages every hour and could easily get all messages without any losses for weeks. Regards,

65sc02 commented 2 years ago

It is definitely much more frequent here :-( So something must be wrong. I will dig into it and come back here if I find something. Thanks a lot for your help!! Talk to you later Helmut