Lora-net / lora_gateway

Driver/HAL to build a gateway using a concentrator board based on Semtech SX1301 multi-channel modem and SX1257/SX1255 RF transceivers.

SX1301 stops receiving packets after a few minutes. #141

Closed andypandy99 closed 3 years ago

andypandy99 commented 5 years ago

Hello.

We have a Laird RG186-M2 with an SX1301 attached to a Linux Pi. We have over 50 of these in the field, and one of them hangs from time to time, roughly every 3-4 minutes. We get no error whatsoever when we continuously call lgw_receive(): the module returns 0 packets and NO error. Between hangs we receive packets as usual, and no faulty packets or anything else are reported, so we have nothing to go on... The concentrator gets stuck in this mode and only a reset seems to bring it back online.

We have swapped in concentrators from other sites, where they had been running nonstop for over a year, and they also hang at this site. We also moved the concentrator to another site, and it does not hang there. We strongly suspect that some other radio system is blocking or hanging our concentrator, but we have no control over that hardware. The site runs a system called Minola. I don't think it uses LoRa, but it might be using the 868 MHz band.

One thing we noticed after hours of searching is that the SX1301 register LGW_MODEM_STATUS (#define LGW_MODEM_STATUS 274) switches from 0 to 1 when it locks up. We can read all the registers, so the SPI link seems to work. We have no idea what this register means, but the only thing that gets the module working again is a hardware reset, which we do through a reset pin on the RG186-M2.
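Anyone hitting the same symptom can use a host-side watchdog to automate the hardware reset. The code below is a minimal sketch. It assumes the host already polls lgw_receive() in a loop and reads LGW_MODEM_STATUS after each poll (for example with the HAL's lgw_reg_r()). The rx_watchdog_t type, its functions and the threshold are hypothetical names for this illustration, not part of the HAL. The register can legitimately be 1 while a packet is being demodulated. For that reason, the sketch only requests a reset after several consecutive polls where the register stays at 1 and no packets come back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side watchdog state (not part of the HAL). */
typedef struct {
    unsigned stuck_polls; /* consecutive polls: MODEM_STATUS == 1 and 0 packets */
    unsigned threshold;   /* stuck polls tolerated before asking for a reset */
} rx_watchdog_t;

void rx_watchdog_init(rx_watchdog_t *wd, unsigned threshold) {
    wd->stuck_polls = 0;
    wd->threshold = threshold;
}

/* Feed the result of one lgw_receive() poll plus the LGW_MODEM_STATUS value
 * read right after it. Returns true when the caller should pulse the
 * concentrator's hardware reset pin. */
bool rx_watchdog_update(rx_watchdog_t *wd, int32_t modem_status, int nb_pkt) {
    if (nb_pkt > 0 || modem_status == 0) {
        wd->stuck_polls = 0; /* packets flowing or modem idle: not stuck */
        return false;
    }
    if (++wd->stuck_polls >= wd->threshold) {
        wd->stuck_polls = 0; /* start counting again after the reset */
        return true;
    }
    return false;
}
```

With a 100 ms poll interval, a threshold of 50 polls would request a reset after about 5 s with the register stuck at 1 and no traffic. A sensible value depends on the longest packet airtime expected at the site.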

Questions: How can the module just stop receiving packets? What does the modem status register indicate? Can we increase any buffers or anything else so the SX1301 does not hang? Are there others out there who have experienced this before?

Thanks!

mcoracin commented 5 years ago

Hello, so far we have only seen a similar issue under very high traffic load with long payloads, while running performance tests. This should never occur in normal usage.

The modem status register is just a debug register indicating that modem_0 of the SX1301 has detected a LoRa signal. You say that when the problem occurs, the LGW_MODEM_STATUS register switches from 0 to 1. Does it switch back to 0, or does it stay at 1 until you reset? There is no buffer that can be increased. But it would be interesting to understand the environment in which this happens, since you say it is site-dependent. Would it be possible for you to do a spectral scan at this site? What is the LoRa traffic like there? Is it particularly heavy?

Also, when the problem occurs, could you read the register LGW_DATA_MGMT_STATUS?
◦ bit LSB+0: indicates if the buffer memory is full
◦ bit LSB+1: indicates if an override of the FIFO has been detected
◦ bit LSB+2: indicates if an overflow of the FIFO has been detected
◦ bit LSB+3: indicates if the buffer memory is not empty
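The code below is a minimal C sketch that decodes the raw LGW_DATA_MGMT_STATUS value into the four flags listed in this comment. It assumes the value has already been read into an int32_t (for example with lgw_reg_r()). The struct and function names are illustrative, not HAL identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags of LGW_DATA_MGMT_STATUS, bit positions as listed in the comment. */
typedef struct {
    bool mem_full;      /* bit LSB+0: buffer memory is full */
    bool fifo_override; /* bit LSB+1: FIFO override detected */
    bool fifo_overflow; /* bit LSB+2: FIFO overflow detected */
    bool mem_not_empty; /* bit LSB+3: buffer memory is not empty */
} data_mgmt_status_t;

/* Decode a raw register value (e.g. obtained with lgw_reg_r()). */
data_mgmt_status_t decode_data_mgmt_status(int32_t raw) {
    data_mgmt_status_t s;
    s.mem_full      = (raw >> 0) & 1;
    s.fifo_override = (raw >> 1) & 1;
    s.fifo_overflow = (raw >> 2) & 1;
    s.mem_not_empty = (raw >> 3) & 1;
    return s;
}
```

For example, a raw value of 0x6 decodes to fifo_override and fifo_overflow set, with mem_full and mem_not_empty clear.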

Best regards, Michael

andypandy99 commented 5 years ago

Hi.

The modem status does not switch back to 0. It stays at 1 until reset. I will check whether we can do a spectral scan. The LoRa traffic at the site is very low: we have approx. 60 end nodes, each sending a 30-byte packet every 3 hours. We use our own private network (not a public LoRa network), but in the logs we also see other LoRa packets, for a total of about 1 packet every other minute. The other LoRa packets are longer, sometimes up to 240 bytes. I would say this site is one of our quietest LoRa sites.

I will update our gateway firmware so we can read LGW_DATA_MGMT_STATUS, and I will get back with that data ASAP.

Question: if MGMT_STATUS indicates buffer full or one of the other states, can we clear it somehow without a reset?

Thanks for the reply!

Best regards Andreas

mcoracin commented 5 years ago

Hi Andreas,

You can try the attached test patch to see whether you are hitting a FIFO lock issue. The patch checks the LGW_DATA_MGMT_STATUS register. It then uses a trick to free the modem without resetting the board: it makes LoRa packets undetectable for a moment through the LGW_CORR_NUM_SAME_PEAK setting. Once the FIFO is unlocked, it restores the default setting.

buffer_full_test.patch.txt
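The patch itself is not reproduced in this thread, but the sequence Michael describes can be sketched as follows. The code checks LGW_DATA_MGMT_STATUS, temporarily changes LGW_CORR_NUM_SAME_PEAK so LoRa packets cannot be detected, then restores the setting once the FIFO is unlocked. This is not the patch. Several parts are assumptions:

- the register access interface and the REG_* identifiers
- treating the override/overflow bits (mask 0x6) as the lock condition
- the "blocking" peak value

The in-memory fake backend exists only so the logic can run without hardware, and it assumes the lock flags clear once detection is blocked.

```c
#include <stdint.h>

/* Hypothetical register IDs; in the HAL these would correspond to
 * LGW_DATA_MGMT_STATUS and LGW_CORR_NUM_SAME_PEAK. */
enum { REG_DATA_MGMT_STATUS, REG_CORR_NUM_SAME_PEAK, REG_COUNT };

/* Hypothetical register access interface (0 = success). */
typedef struct {
    int (*read)(void *ctx, int reg, int32_t *val);
    int (*write)(void *ctx, int reg, int32_t val);
    void *ctx;
} reg_io_t;

/* Assumption: override (LSB+1) or overflow (LSB+2) set means "locked". */
#define FIFO_LOCK_MASK 0x6

/* Returns 1 if a lock was found and cleared, 0 if there was no lock,
 * -1 on I/O error or if the lock did not clear within max_polls reads. */
int try_unlock_fifo(const reg_io_t *io, int32_t blocking_peak, int max_polls) {
    int32_t status, saved_peak;

    if (io->read(io->ctx, REG_DATA_MGMT_STATUS, &status) != 0) return -1;
    if ((status & FIFO_LOCK_MASK) == 0) return 0;

    /* Save the current detection setting, then block LoRa detection. */
    if (io->read(io->ctx, REG_CORR_NUM_SAME_PEAK, &saved_peak) != 0) return -1;
    if (io->write(io->ctx, REG_CORR_NUM_SAME_PEAK, blocking_peak) != 0) return -1;

    int result = -1;
    for (int i = 0; i < max_polls; i++) {
        /* Real code would sleep a few milliseconds between reads. */
        if (io->read(io->ctx, REG_DATA_MGMT_STATUS, &status) != 0) break;
        if ((status & FIFO_LOCK_MASK) == 0) { result = 1; break; }
    }

    /* Always restore the original setting, even on timeout. */
    if (io->write(io->ctx, REG_CORR_NUM_SAME_PEAK, saved_peak) != 0) return -1;
    return result;
}

/* In-memory fake backend for exercising the logic without hardware:
 * the lock clears after reads_until_clear status reads while
 * detection is blocked. */
typedef struct {
    int32_t regs[REG_COUNT];
    int32_t blocking_peak;
    int reads_until_clear;
} fake_regs_t;

static int fake_read(void *ctx, int reg, int32_t *val) {
    fake_regs_t *f = ctx;
    if (reg == REG_DATA_MGMT_STATUS &&
        f->regs[REG_CORR_NUM_SAME_PEAK] == f->blocking_peak &&
        f->reads_until_clear > 0 && --f->reads_until_clear == 0) {
        f->regs[REG_DATA_MGMT_STATUS] = 0;
    }
    *val = f->regs[reg];
    return 0;
}

static int fake_write(void *ctx, int reg, int32_t val) {
    ((fake_regs_t *)ctx)->regs[reg] = val;
    return 0;
}
```

The value that actually blocks detection on the SX1301 should be taken from the real patch. The key design point is that the original LGW_CORR_NUM_SAME_PEAK value is always written back, even when the unlock times out.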

Best regards, Michael

andypandy99 commented 5 years ago

Hello.

We have now tested the patch along with debugging of the DATA_MGMT_STATUS register. The patch works, so we now clear the FIFO instead of rebooting the whole module. Thanks for that! It saves us a couple of seconds.

In a period of 30 minutes we had about 15 FIFO errors, and they all look the same (override and overflow). One of the fifteen debug lines:

2019-06-28 11:25:29.770: LORA: lgw_receive:1224: ERROR: SX1301 FIFO LOCKED: 0x6 (mem_full:0, mem_not_empty:0, fifo_override:1 fifo_overflow:1)

In this period we had 6 verified LoRa packets, 1 of which was our own. 1 packet (not ours) had a MIC checksum error.
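The sketch below shows how the logged value maps to the flags. It rebuilds a message in the same format from a raw LGW_DATA_MGMT_STATUS value, using the bit layout Michael gave earlier in the thread (LSB+0 mem_full, LSB+1 override, LSB+2 overflow, LSB+3 mem_not_empty). The function name is hypothetical. The real message is printed by the test patch, whose source is not shown here.

```c
#include <stdint.h>
#include <stdio.h>

/* Rebuild a "FIFO LOCKED" message from a raw LGW_DATA_MGMT_STATUS value,
 * with bit LSB+0 = mem_full, LSB+1 = override, LSB+2 = overflow,
 * LSB+3 = mem_not_empty. Returns snprintf's result. */
int format_fifo_lock_msg(char *buf, size_t len, int32_t raw) {
    return snprintf(buf, len,
                    "SX1301 FIFO LOCKED: 0x%X (mem_full:%d, mem_not_empty:%d, "
                    "fifo_override:%d fifo_overflow:%d)",
                    (unsigned)raw,
                    (int)(raw & 1),
                    (int)((raw >> 3) & 1),
                    (int)((raw >> 1) & 1),
                    (int)((raw >> 2) & 1));
}
```

Here 0x6 is binary 0110, so only the override and overflow bits are set, which matches the logged line.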

We will try to do a spectral scan at the site as soon as possible to find anything useful.

What options do we have? Disable specific channels? Wait for a better SX1301? Disable or filter anything else? What was the solution for the issue you mentioned under high traffic?

Best regards Andreas

rbaldwin13 commented 3 years ago

In an effort to improve our customer support experience and in recognition that our support backlog on GitHub has historically exceeded the capacity of our engineering team, we have taken the difficult decision to focus on the most contemporary issues reported and to close all others without confirmation of resolution.

Our belief is that issues which have remained unresolved and unaltered for extended periods of time are less likely to continue to pose a significant problem to the user than when they were originally filed. More contemporary issues however may still be relevant and hence are more appropriate to prioritize.

For those users who remain interested in resolution of a reported issue that was closed, we are encouraging usage of our developer portal forums [https://forum.lora-developers.semtech.com/] and commercial support portal [https://semtech.force.com/ldp/ldp_support] as the preferred avenues to receive support. We will continue to monitor the GitHub issue trackers as well, but want to encourage all users to take advantage of the increased community presence on the developer portal. For commercial customers, we highly recommend using the commercial support portal which is uniquely tailored to service such support requests.

RuiMPCosta commented 7 months ago

Hello, I am using an SX1301 and I have a similar issue. My gateway stops receiving packets and I get these errors:

ERROR: SX1301 FIFO LOCKED: 0x6 (mem_full:0, mem_not_empty:0, fifo_override:1 fifo_overflow:1)
INFO: SX1301 FIFO UNLOCKED
ERROR: SX1301 FIFO LOCK DETECTED

However, when these errors appear, the SX1301 disables the LDOs, such as Radio Enable A, etc. The board I am using is this one: https://ww1.microchip.com/downloads/en/DeviceDoc/40001827A. I am only using the radio gateway board with a Raspberry Pi. I am also using my own network, and in particular I am only running the Rx test. The startup output can be seen here:

SX1301_CONFIG