Closed a-andreyev closed 7 years ago
Hello,
Which version of lora_gateway and packet_forwarder are you using?
I've tried the current git versions: v4.0.1 (d0226ea) and v4.0.0 (c05eb0e)
How is the iC880A connected to the RPi? You can try to activate debug logs in the HAL (libloragw/library.cfg), by setting DEBUG_HAL to 1.
You can also test the robustness of your SPI connection with the util_spi_stress application provided with the HAL (with the -t4 option). Let it run for a few hours to ensure you have no errors.
Thank you for the tips!
iC880A is connected to RPi2 via SPI pins directly.
I've tried util_spi_stress for a few minutes: no errors. I will try it with the driver debug flag and the -t4 option today for a longer time.
I've decided to put off the util_spi_stress run and started the debug build (DEBUG_HAL set to 1) for several hours with a real node and packet_forwarder.
And I've found that at some moment the SX1301 time (PPS) value stopped changing. It is exactly the moment when the packet forwarder stopped receiving packets and the logs started showing a large time offset.
Should I move to the lora_gateway issues thread, or is there something more I should check?
hmmm, it seems that the SX1301 stops working for some reason... A similar issue was seen some time ago and was due to concurrent access to the SX1301 through SPI between 2 pkt fwd threads. This has been fixed by adding a mutex in the thread_timersync when getting the sx1301 counter.
Maybe running the test with DEBUG_SPI set to 1 could help, though it will be very verbose.
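The mutex fix mentioned above can be illustrated with a minimal sketch. This is not the packet forwarder's actual code: `fake_sx1301_counter`, `raw_read_counter` and `concentrator_lock` are hypothetical stand-ins, and the real read would be an SPI transaction done through the HAL.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the concentrator's internal counter. */
static uint32_t fake_sx1301_counter = 0;

/* One lock shared by every thread that talks to the concentrator. */
static pthread_mutex_t concentrator_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical raw read; in a real gateway this would be an SPI transfer
 * that must not be interleaved with another thread's transfer. */
static uint32_t raw_read_counter(void)
{
    return fake_sx1301_counter++;
}

/* All threads read the counter through this wrapper, so only one
 * access to the (simulated) bus is in flight at any time. */
uint32_t read_counter_locked(void)
{
    uint32_t value;

    pthread_mutex_lock(&concentrator_lock);
    value = raw_read_counter();
    pthread_mutex_unlock(&concentrator_lock);
    return value;
}
```

Note that a mutex like this only protects threads inside one process; it does nothing against a second process driving the same chip.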
I have seen something similar between a packet forwarder (calling lgw_start) and another binary using the lora_gateway (calling lgw_connect).
In fact the second binary was resetting the sx1301 by calling lgw_connect(false, ...). So everything seemed to be fine except the page register, which was set back to 0 (_in the sx1301 registers but not in the lgw_regpage global variable, as the two binaries have two different memory spaces_).
Finally, when the packet forwarder wanted to get the TIMESTAMP register (register 70, page 2) it was reading the CORR5_DETECT_EN register (register 70, page 0). By the way, this register has a value similar to 0x7E000000 = 2113929216...
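To make the failure mode concrete, here is a small model of a paged register file with a per-process cached page. It is only a sketch under assumptions: the types and functions (`fake_chip_t`, `fake_process_t`, `chip_reset`, `read_reg70`) are hypothetical, the page count is arbitrary, and the real HAL logic is more involved.

```c
#include <stdint.h>

#define NUM_PAGES 3 /* hypothetical; only pages 0 and 2 matter here */

/* Simulated chip: address 70 means something different on each page. */
typedef struct {
    uint8_t  page;             /* page currently selected inside the chip */
    uint32_t reg70[NUM_PAGES]; /* value found at address 70 on each page */
} fake_chip_t;

/* Each process keeps its own cached copy of the selected page. */
typedef struct {
    uint8_t cached_page;
} fake_process_t;

/* A reset (for example from a second process) puts the chip back on
 * page 0, but cannot update the cache held by the first process. */
void chip_reset(fake_chip_t *chip)
{
    chip->page = 0;
}

/* Read address 70, switching page only if the cache says it is needed. */
uint32_t read_reg70(fake_process_t *proc, fake_chip_t *chip, uint8_t wanted_page)
{
    if (proc->cached_page != wanted_page) {
        chip->page = wanted_page;
        proc->cached_page = wanted_page;
    }
    return chip->reg70[chip->page];
}
```

In this model, once the first process has cached page 2, a `chip_reset` from elsewhere makes its next "page 2" read silently return the page 0 value, which matches the constant 2113929216 reading described above.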
A solution to this issue can be to change the software architecture and to use a dedicated daemon to drive the LoRa stack. This is something very common. The wpa_supplicant for the Wi-Fi is an example:
The data bus used in the wpa_supplicant can be dbus or an internal control interface.
Yes, the current lora_gateway library is definitely not designed to have concurrent processes using it.
@jmlemetayer, thank you for the response! Am I right that if I have no other binaries using lora_gateway, then your use case does not apply to me?
Today I've discovered that the sx1301 stopped working and started sending the 2113929216 value from the moment I plugged my laptop into a socket in the same room and turned it on to view the logs. So it looks like a hardware problem to me. I'm trying to solve it and will update the status of the issue.
Thank you, @mcoracin, for your help and your project!
You're welcome. I'll close the issue for now, you can reopen it if needed.
Just a note to say: I haven't resolved the hardware issue where the sx1301 stops; it's somehow connected with the power sockets in my room. I've created a software patch that waits for several drift values larger than 60000 ms (I made the threshold a constant in the timersync thread) and then restarts my packet forwarder with the reset script. It's not great at all, but it works.
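The original patch is not available, so the following is only a guess at the shape of such a check. It assumes "several" means a few consecutive samples; `MAX_BAD_SAMPLES`, `drift_watchdog_t` and `drift_watchdog_update` are hypothetical names, and only the 60000 ms threshold comes from the thread.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_DRIFT_MS    60000L /* threshold mentioned in the thread */
#define MAX_BAD_SAMPLES 3      /* hypothetical: how many in a row to tolerate */

typedef struct {
    int bad_samples; /* consecutive samples above the threshold */
} drift_watchdog_t;

/* Feed one drift sample; returns true once enough consecutive samples
 * exceeded the threshold, meaning the caller should restart. */
bool drift_watchdog_update(drift_watchdog_t *wd, long drift_ms)
{
    if (labs(drift_ms) > MAX_DRIFT_MS) {
        wd->bad_samples++;
    } else {
        wd->bad_samples = 0; /* a sane sample clears the streak */
    }
    return wd->bad_samples >= MAX_BAD_SAMPLES;
}
```

Requiring a streak rather than a single sample avoids restarting on one isolated outlier.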
I also encountered a similar problem: the sx1301 did not work and the PPS value read did not change. Is this problem solved?
I encountered this issue.
No Rx after large time offset happened. Reboot will make the RX come back, but still no TX. Power cycle perhaps needed.
@a-andreyev Do you still have your reset script? Would you mind sharing it?
Thanks.
Hello, @pauldeng. Unfortunately, I don't have the script anymore.
The logic was probably to check for suspicious repeating values (SX1301 time (PPS)) in src/lora_pkt_fwd.c and to exit the app with an error, then to restart it with systemd (I used a systemd script to handle the startup). I'm not sure; this comment should describe it better, but I haven't worked with the project for a long time and don't remember the details.
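A check for a repeating counter value might look roughly like the sketch below. It is a reconstruction under assumptions, not the lost patch: `stall_detector_t`, `stall_detector_update` and `MAX_REPEATS` are hypothetical names and values.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_REPEATS 5 /* hypothetical: identical readings tolerated in a row */

typedef struct {
    uint32_t last_value; /* last counter value seen */
    int      repeats;    /* how many times it repeated since then */
    bool     has_value;  /* false until the first sample arrives */
} stall_detector_t;

/* Feed one counter reading; returns true when the value has repeated
 * MAX_REPEATS times, which suggests the counter is stuck. */
bool stall_detector_update(stall_detector_t *sd, uint32_t counter)
{
    if (sd->has_value && counter == sd->last_value) {
        sd->repeats++;
    } else {
        sd->repeats = 0;
        sd->last_value = counter;
        sd->has_value = true;
    }
    return sd->repeats >= MAX_REPEATS;
}
```

When this returns true, the process could exit with a non-zero status so that a systemd unit configured with `Restart=on-failure` (an assumption about the unit file) starts it again.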
Anyway, it was a hardware issue in my case, and I was able to reproduce it by adding an additional device (like a laptop) to the power socket. I once heard from friends an electrical term that could explain the effect, but unfortunately I don't remember it (something about high impedance, or crosstalk, not sure).
Hi @a-andreyev ,
Thanks for the additional info. I will discuss this with the manufacturer.
It seems to be a very rare case, as not many people have reported it here over the years.
In my case, the chip still cannot Tx after a Linux system reboot. I will check again to see if a power cycle fixes it.
Thanks again.
Hello! I'm using the git version of packet_forwarder and lora_gateway with a DIY RPi2 (Arch Linux ARM distro) and an iC880A-based gateway (without GPS), and an RFM95W-based transmitter with OTAA and a private server. It looks like after random periods of time I receive no packets at all, and this can only be fixed by restarting the packet forwarder (my systemd lora_pkt_fwd service also contains the reset_lgw.sh script from lora_gateway with a custom pin). According to the logs, the packet loss is accompanied by a time offset larger than 60000 ms.
Could you help me with resolving the issue? Should I look at the lora_gateway code, or does it look like a hardware problem?