TheThingsNetwork / arduino-device-lib

Arduino Library for TTN Devices
MIT License
206 stars 96 forks source link

Implement error handling/timeout on readLine from LoRa module #242

Open savnik opened 5 years ago

savnik commented 5 years ago

I sometimes experience that the LoRa module(RN2483) does not send anything back after a sendBytes. The current implementation will wait forever until a return signal is send. This will result in deadlock unless a WDT is externally used. On many MCU's the WDT will have less than 8s count which is not always enough for a message to send.

https://github.com/TheThingsNetwork/arduino-device-lib/blob/23c933f7c3434fb8fc32a46855e8376ce271feee/src/TheThingsNetwork.cpp#L360-L369

I propose that a timeout is implemented to handle if there is no response from RN2483 within reasonable time.

johanstokking commented 5 years ago

Yeah I am aware of this issue. It's not just a matter of adding the timeout, that's why it was never implemented, but I agree that a deadlock is not nice either. The problem is that if you return early, the library doesn't know the state of the RN module anymore. If the next command would work, the first response may be some previous response that's in a buffer, instead of the expected response. Also, the RN module should return a response all the time – otherwise I think there's an issue with the RN module.

In any case, it makes sense to put a timeout on the serial. If 0 is returned on readBytesUntil(), it means a timeout, according to the always scarce Arduino documentation. But then, the application needs to be informed, and it should result in resetting the RN module via its reset pin.

savnik commented 5 years ago

Yeah I see your point about returning too early. RN module as you say should work 100% of the time, however sometimes we see it locks up, not knowing why or what triggered it. I would agree that fixing the root cause would be the best solution, but unfortunately we cant access the raw firmware of the RN module to figure out what goes wrong. So then we though about how to handle this error.

You may argue that if the response time is over a certain value the RN module is likely in a faulty state.

If the function is returning a zero, as you suggest, it will give the application the possibility to handle the error which is an enhancement to now. Have you been working on such a solution already?

savnik commented 5 years ago

I see two approaches to handle the timeout.

  1. Using milis() to measure the time spend in the while loop.

    size_t TheThingsNetwork::readLine(char *buffer, size_t size, uint8_t timeout = 10000) // Default timeout vallue TBD
    {
    size_t read = 0;
    uint8_t start_time = milis();
    while (read == 0 && milis()-start_time<timeout)
    {
    read = modemStream->readBytesUntil('\n', buffer, size);
    }
    if(milis()-start_time<timeout)){ // If timeout is activated return 0 to inform the application
    return 0;
    }
    else{
    buffer[read - 1] = '\0'; // set \r to \0
    return read;
    }
    }
  2. Using modemStream timeout. This assumes that somewhere we set modemStream->setTimeout(x). Then if x = 1000, then we will achive approximately 10s timeout as in option 1.

    size_t TheThingsNetwork::readLine(char *buffer, size_t size, uint8_t attempts = 10) // Default timeout attempts TBD
    {
    size_t read = 0;
    while (read == 0 && attempts>0)
    {
    read = modemStream->readBytesUntil('\n', buffer, size);
    attempts--;
    }
    if(attempts<=0){ // If timeout attempts is activated return 0 to inform the application
    return 0;
    }
    else{
    buffer[read - 1] = '\0'; // set \r to \0
    return read;
    }
    }
johanstokking commented 5 years ago

I think we have to use a timeout, as readBytesUntil() should simply not return otherwise. To my understanding, the only reason for it to return 0 is on timeout.

The timeout could be set in the application sketch by the user, so that's why it's expecting and looping over 0 response now. I agree this is not nice.

Instead, I think we need to have some invalid state marker in the class, to inform the application that the RN module needs to be reset.

Can you work on a PR?

savnik commented 5 years ago

I have created a PR. Let me know your thoughts about it.