Lora-net / LoRaMac-node

Reference implementation and documentation of a LoRa network node.

Radio stuck in RF_RX_RUNNING after first uplink #308

Closed finnoliver closed 6 years ago

finnoliver commented 7 years ago

Hi,

I am having a problem with the SX1276 appearing to be stuck in RF_RX_RUNNING, and I suspect this is caused by timing issues. As a result, LoRaMAC stays stuck in LORAMAC_TX_RUNNING and no further packets are sent.

So my question is: in what order are the timers supposed to expire? At the moment I seem to have OnRxWindow1TimerEvent -> OnRxWindow2TimerEvent -> OnRxTimeout, with RxWindowSetup(...) in OnRxWindow2TimerEvent returning false.

Any thoughts?

djaeckle commented 7 years ago

Hi finnoliver,

thanks for the report. First, I would double-check that your radio is connected to the MCU properly. If the radio driver stays in RF_RX_RUNNING, it seems like it has missed an IRQ. In LoRa mode, you should get either the DIO0 or the DIO1 interrupt.

Is your node transmitting the first frame correctly?

finnoliver commented 7 years ago

Hi djaeckle, thanks for your answer. Yes, I double-checked the connections, verified that the MCU enters the correct interrupt by simulating a high level on the pins, and watched the DIO outputs on the oscilloscope during operation. Looking at the code, I was wondering whether there should be an RxTimeout on DIO1 after the transmission of the data frame (which is in fact transmitted correctly) when no downlink is received?

The program seems to get stuck right after the RX1 window, in OnRxWindow2TimerEvent(). In RxWindowSetup() the stack queries the radio status, which leads to RxSlot staying "0" and therefore LoRaMacFlags.Bits.MacDone never going to "1" in OnRadioRxTimeout().

finnoliver commented 7 years ago

Should there be an RxTimeout interrupt on DIOx after RX1?

djaeckle commented 7 years ago

Hi finnoliver,

yes. If you want to receive in LoRa mode, the radio should signal an RX timeout with an interrupt on DIO1 once the sync timeout has elapsed.
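To make the ordering question concrete, here is a minimal sketch of a trace checker. It assumes a Class A uplink with no downlink, where the expected order is TxDone, RX1 timer, RxTimeout, RX2 timer, RxTimeout. The event labels and the function name are hypothetical and are not identifiers from the stack; the trace itself would have to be recorded by your own debug code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event labels for a debug trace; not identifiers from the stack. */
typedef enum
{
    EV_TX_DONE,
    EV_RX1_TIMER,
    EV_RX_TIMEOUT,
    EV_RX2_TIMER
} trace_event_t;

/*
 * Compares a recorded trace with the order assumed for a Class A uplink
 * without downlink. Returns -1 if the trace matches exactly, otherwise the
 * index of the first event that differs (or is missing).
 */
int first_out_of_order( const trace_event_t *trace, size_t len )
{
    static const trace_event_t expected[] = {
        EV_TX_DONE, EV_RX1_TIMER, EV_RX_TIMEOUT, EV_RX2_TIMER, EV_RX_TIMEOUT
    };
    const size_t n = sizeof( expected ) / sizeof( expected[0] );

    assert( trace != NULL || len == 0 );

    for( size_t i = 0; i < len; i++ )
    {
        if( ( i >= n ) || ( trace[i] != expected[i] ) )
        {
            return ( int )i;
        }
    }
    /* A shorter trace is reported at the position of the first missing event. */
    return ( len == n ) ? -1 : ( int )len;
}
```

Fed with the order reported in this thread (TxDone, RX1 timer, RX2 timer, RxTimeout), the checker points at index 2: the RX2 timer fired before the RX1 timeout was seen.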

mirzafahad commented 7 years ago

Doesn't this (https://github.com/Lora-net/LoRaMac-node/blob/develop/src/mac/LoRaMac.c#L1175) solve the problem?

ericAtlantis commented 6 years ago

Hi djaeckle, I have the same problem as finnoliver, with one difference: my transmission completes successfully, I receive the DIO0 interrupt and enter TxDone, but the LoRaMAC state stays stuck in LORAMAC_TX_RUNNING. As a result I cannot move on to the next step and receive the downlink. Where should I look for answers?

mluis1 commented 6 years ago

@ericAtlantis Have you been able to solve your issue? Can we close the issue?

ericAtlantis commented 6 years ago

Hi mluis1, my LORAMAC_TX_RUNNING problem has been resolved, but I cannot get to RxDone to process the data, because DIO1 always reports RxTimeout ....

mluis1 commented 6 years ago

@ericAtlantis

Have you checked that the timings are correct? If the reception windows aren't aligned with the transmission from the server, you will always get RxTimeout interrupts.

Something you could try is to increase the tolerated timing error by adding the code below to your MAC layer initialization code. By default the MAC implementation tolerates a +/-10 ms error on the system timer. You may try increasing this value to 100 ms to check whether it works, and then reduce it until it stops working.

                mibReq.Type = MIB_SYSTEM_MAX_RX_ERROR;
                mibReq.Param.SystemMaxRxError = 100;
                LoRaMacMibSetRequestConfirm( &mibReq );

Once you have a working system you may need to improve the precision of your RTC driver implementation.
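To see why a larger tolerated error helps, here is a rough sketch of how the receive window length grows with it. The formula is modelled on the stack's region-common RX window computation as I understand it and may differ between versions; the function names are hypothetical, and integer microseconds are used instead of the stack's own types.

```c
#include <assert.h>
#include <stdint.h>

/* Symbol duration in microseconds for a LoRa spreading factor and bandwidth. */
uint32_t symbol_time_us( uint8_t sf, uint32_t bandwidth_hz )
{
    assert( ( bandwidth_hz > 0 ) && ( sf <= 12 ) );
    return ( uint32_t )( ( ( uint64_t )1000000 << sf ) / bandwidth_hz );
}

/*
 * Number of symbols the receiver listens for a preamble, assuming
 *   symbols = ceil( ( ( 2 * minRxSymbols - 8 ) * tSymbol + 2 * maxRxError ) / tSymbol )
 * and never fewer than minRxSymbols.
 */
uint32_t rx_window_symbols( uint32_t t_symbol_us, uint8_t min_rx_symbols,
                            uint32_t max_rx_error_ms )
{
    assert( t_symbol_us > 0 );

    int64_t span_us = ( 2 * ( int64_t )min_rx_symbols - 8 ) * ( int64_t )t_symbol_us
                      + 2 * ( int64_t )max_rx_error_ms * 1000;
    if( span_us < 0 )
    {
        span_us = 0;
    }

    /* Integer ceiling division avoids pulling in the math library. */
    uint32_t symbols = ( uint32_t )( ( span_us + t_symbol_us - 1 ) / t_symbol_us );

    return ( symbols < min_rx_symbols ) ? min_rx_symbols : symbols;
}
```

Assuming DR_5 maps to SF7 at 125 kHz, as in EU868, the symbol time is 1024 us. With 6 minimum symbols, a 10 ms tolerated error then gives a 24-symbol window and 100 ms gives 200 symbols, so the radio listens much longer and is more forgiving of timer drift, at the cost of extra energy.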

As we don't know which modifications you have made or how, it is very difficult to help you. The provided information isn't enough to understand what the potential issue is.

Our advice would be to verify that all the hardware connections are correct. You could also try to compare your platform modifications against one of the platforms provided by this project.

aplastiras commented 6 years ago

Hi everyone,

the reason I am writing in this thread is that I have exactly the same problem as @finnoliver.

I use the B-L072Z-LRWAN1 evaluation board and I have completed the port for the STM32L072 MCU with the Semtech SX1276. Both are included in the CMWX1ZZABZ-091 module. For testing I use two B-L072Z-LRWAN1 boards. I also set SystemMaxRxError = 20 and MinRxSymbols = 5.

The gateway is an IMST iC880A-SPI with an RPI 2B running the loraserver from https://www.loraserver.io/.

The device joins the network successfully (DR_5) and transmits data to the gateway and to loraserver. The node transmits packets for a while (the duration varies) but then suddenly fails and prints the message below forever:

[14:12:10:247] ###### ===== MCPS-Request ==== ######
[14:12:10:247] STATUS : Busy

This happens when an RX1 window fails, the interrupt on the DIO1 pin is raised, and the SX1276OnDio1Irq callback is executed. By putting some debug printfs in specific places (such as SX1276OnDio1Irq and the LoRaMacMcpsRequest routine) I noticed that at this point LoraMacState = LORAMAC_TX_RUNNING and SX1276.Settings.State = RF_RX_RUNNING. This causes the system to get stuck in this state.

The normal behavior for the node in this situation, according to other builds (i-cube_lrwan), would be to try to start an ADR negotiation to find better signal quality at a different data rate.

As @mluis1 suggested in his last reply, I tested many different combinations of MinRxSymbols and SystemMaxRxError to see if anything improved, but nothing changed.

I also tested with two other cloud loraservers. The behavior is exactly the same.

I have had this problem for weeks now and I am trying to find a solution in order to move on. I would really appreciate it if someone could give any guidance on what to check or correct.

If any detailed information is needed, at any level, I can post it as soon as possible.

Thank you in advance.

aplastiras commented 6 years ago

"Hi mluis1 My LoRaMAC status LORAMAC_TX_RUNNING status has been resolved, but I can not process the data to RxDon because it reflects the DIO1 status (Rxtimeout) ...."

@ericAtlantis Please let us know how you resolved the LORAMAC_TX_RUNNING issue

ly1243667342 commented 6 years ago

"Hi mluis1 My LoRaMAC status LORAMAC_TX_RUNNING status has been resolved, but I can not process the data to RxDon because it reflects the DIO1 status (Rxtimeout) ...."

"@ericAtlantis Please let us know how you resolved the LORAMAC_TX_RUNNING issue"

I have also run into this issue. How can it be reproduced?

ly1243667342 commented 6 years ago

@aplastiras
Can you provide a way to reproduce this? I have the same issue as you, but it only occurs occasionally, so I cannot analyze it quickly.

mluis1 commented 6 years ago

We don't know which modifications have been made to the original code. Thus, it is hard to provide a solution as we can't reproduce the issue.

In one of the previous messages it was said that printf function calls had been added to interrupt handlers. This can potentially break normal operation. In addition, it is not advisable to make such calls in interrupt handlers. An interrupt handler should take as little time as possible to execute.

The use of printf in the firmware must be carefully considered, and calls should be made only when no critical operations are running. Printing on the UART may take a long time, which could break the MAC layer operation. This is one of the reasons why a 921600 bits/s baud rate is used for printf operations in the provided examples. Another way to improve the printf operations while using a lower data rate (e.g. 115200 bits/s) would be to use the UART peripheral in DMA mode.
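One common way to keep printf out of interrupt handlers is to record a small event code in the handler and print it later from the main loop. The following is a minimal sketch of that idea, not code from this project; all names are hypothetical, and it assumes a single interrupt producer and a single main-loop consumer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVENT_LOG_SIZE 16u /* one slot stays empty, so 15 events fit */

/* Single-producer (ISR) / single-consumer (main loop) ring buffer. */
typedef struct
{
    volatile uint8_t head; /* written only by the interrupt handler */
    volatile uint8_t tail; /* written only by the main loop */
    uint8_t events[EVENT_LOG_SIZE];
} event_log_t;

void event_log_init( event_log_t *log )
{
    assert( log != NULL );
    log->head = 0;
    log->tail = 0;
}

/* Call from the interrupt handler: stores one byte and returns immediately. */
bool event_log_push( event_log_t *log, uint8_t code )
{
    uint8_t next = ( uint8_t )( ( log->head + 1u ) % EVENT_LOG_SIZE );

    if( next == log->tail )
    {
        return false; /* buffer full: drop the event rather than block */
    }
    log->events[log->head] = code;
    log->head = next;
    return true;
}

/* Call from the main loop, where a slow printf is harmless. */
bool event_log_pop( event_log_t *log, uint8_t *code )
{
    if( log->tail == log->head )
    {
        return false; /* nothing recorded */
    }
    *code = log->events[log->tail];
    log->tail = ( uint8_t )( ( log->tail + 1u ) % EVENT_LOG_SIZE );
    return true;
}
```

The main loop would then drain the buffer with something like `while( event_log_pop( &log, &code ) ) { printf( "event %u\r\n", code ); }` at a point where no reception window is about to open.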

We have tested the ClassA example on a B-L072Z-LRWAN1 platform and it has behaved as expected so far. Since we started running this example this morning, more than 1800 frames have been sent. Please see the log below.

###### ===== UPLINK FRAME 1889 ==== ######

CLASS       : A

TX PORT     : 2
TX DATA     : UNCONFIRMED
00

DATA RATE   : DR_5
U/L FREQ    : 867500000
TX POWER    : 0
CHANNEL MASK: 00FF

###### ===== MCPS-Request ==== ######
STATUS      : OK

###### ===== MCPS-Confirm ==== ######
STATUS      : OK

###### ===== UPLINK FRAME 1890 ==== ######

CLASS       : A

TX PORT     : 2
TX DATA     : UNCONFIRMED
00

DATA RATE   : DR_5
U/L FREQ    : 867300000
TX POWER    : 0
CHANNEL MASK: 00FF

###### ===== MCPS-Request ==== ######
STATUS      : OK

###### ===== MCPS-Confirm ==== ######
STATUS      : OK

###### ===== UPLINK FRAME 1891 ==== ######

CLASS       : A

TX PORT     : 2
TX DATA     : UNCONFIRMED
00

DATA RATE   : DR_5
U/L FREQ    : 868300000
TX POWER    : 0
CHANNEL MASK: 00FF

###### ===== MCPS-Indication ==== ######
STATUS      : OK

###### ===== DOWNLINK FRAME 468 ==== ######
RX WINDOW   : 1
RX PORT     : 1
RX DATA     :
01

DATA RATE   : DR_5
RX RSSI     : -55
RX SNR      : 7

###### ===== MCPS-Request ==== ######
STATUS      : OK

###### ===== MCPS-Confirm ==== ######
STATUS      : OK

###### ===== UPLINK FRAME 1892 ==== ######

CLASS       : A

TX PORT     : 2
TX DATA     : UNCONFIRMED
01

DATA RATE   : DR_5
U/L FREQ    : 867900000
TX POWER    : 0
CHANNEL MASK: 00FF

###### ===== MCPS-Request ==== ######
STATUS      : OK

###### ===== MCPS-Confirm ==== ######
STATUS      : OK

###### ===== UPLINK FRAME 1893 ==== ######

CLASS       : A

TX PORT     : 2
TX DATA     : UNCONFIRMED
01

DATA RATE   : DR_5
U/L FREQ    : 868100000
TX POWER    : 0
CHANNEL MASK: 00FF

ly1243667342 commented 6 years ago

@mluis1 @aplastiras

I think I have solved this issue. I modified two places.

First, in the function ProcessRadioRxDone(), remove (comment out) the line below:

    //if( MacCtx.AckTimeoutTimer.IsRunning == false)

Second, in the function LoRaMacProcess(), add the two lines marked "added":

// MAC proceeded a state and is ready to check
if( MacCtx.MacFlags.Bits.MacDone == 1 )
{
    // An error occurs during receiving
    if( ( MacCtx.MacState & LORAMAC_RX_ABORT ) == LORAMAC_RX_ABORT )
    {
        MacCtx.MacState &= ~LORAMAC_RX_ABORT;
        MacCtx.MacState &= ~LORAMAC_TX_RUNNING;
    }

    // An error occurs during transmitting
    if( IsRequestPending( ) > 0 )
    {
           ........
    }
    else                                        // added
        MacCtx.MacState &= ~LORAMAC_TX_RUNNING; // added

I am not sure whether this will cause other issues. I have tested for 6 hours and all is going well. I hope you can warn me if there is a problem, thanks.