arduino-libraries / MKRGSM

GNU Lesser General Public License v2.1
54 stars 50 forks source link

Infinite loop receiving data #10

Open FrancMunoz opened 6 years ago

FrancMunoz commented 6 years ago

Hi!

I found an issue related to an infinite loop reading data. The problem is caused by too much AT commands send to GSM modem without waiting 20ms as is specified in docs:

AT Command examples Page 10 Figure 1

The problem appears after calling repeatedly (in my case each minute) your client example. After some tries the board falls into an infinte loop caused by MODEM.ready() that ever returns 0 so GSMClient::available() never ends:

int GSMClient::available()
{
  if (_synch) {
    while (ready() == 0) {};
  } else if (ready() == 0) {
    return 0;
  }

...

Debugging the cliend I found that the las command received from the modem is:

+UUSORD

And found this blog entries:

Missing URC +UUSORD AT+USORD : \r\n between information text response and result code

So following the instructions on AT command examples, only adding a 20ms delay after all AT commands will sort the issue. So in Modem.cpp:

void ModemClass::send(const char* command)
{
  if (_lowPowerMode) {
    digitalWrite(_dtrPin, LOW);
  }

  **delay(20);**
  _uart->println(command);
  _uart->flush();
  _atCommandState = AT_COMMAND_IDLE;
  //Serial.print("R0, ");
  _ready = 0;
}

This solved the infinite loop. Finally I opted to read more data from modem instead of readin char by char.

I would like to contribute with an example using buffers instead of readng char by char, could I? How do i proceed?

I can confirm after a whole night the MKR hasn't failed anymore.

Hope this helps.

Julien8631 commented 6 years ago

Hello,

It seems i have the same infinite loop. I put the 20ms inModem.cpp. I read the response of Thingspeak sever to verify my data is received. My MKR is running since 1 hour.

I'm not able to change char by char to buffer. Could you give me the issue for the buffer ?

Thank you.

FrancMunoz commented 6 years ago

Hi @Julien8631 !

The buffer is not an issue it's simply reading optimization of available data. Here you have an example:

String datos="";
uint8_t buffer[256];
int avail;
int recibido=0;

avail=cliente.read(&buffer[0], 255); // -1 for final NULL String and use concat.
    while(avail>0 && recibido<1024) { 
        recibido+=avail; // limit received data to 1024 bytes this avoids wasting time or undesired locks
        buffer[avail]=0; // NULL terminates buffer to convert to string and concat.
        datos.concat(String((char *)buffer));
        avail=cliente.read(&buffer[0], 255); // Next loop.
}

This is pretty faster than reading char by char.

I would like to provide a full example but nobody replied my request... I guess they are as busy as we are.

Hope it helps!

akash73 commented 6 years ago

hi @FrancMunoz and @Julien8631 , we are looking at the issue and seeing your tips. we are testing now, and let you know. thanks

sandeepmistry commented 6 years ago

Hi @FrancMunoz and @Julien8631,

Is this issue related to https://github.com/arduino-libraries/MKRGSM/issues/11

Do you have a battery connected to the board?

In any case, if you had a sketch (preferably minimal) that demonstrates how to reproduce the issue. This would really help us speed up the process of investigating this and fixing the the library.

FrancMunoz commented 6 years ago

Hi! I've reproduced with and without battery (i've MRK powered through VIN with enough current not limited as USB) but when I detected the problem I plug a battery (as I've read) but this failed.

So, let me prepare a litle sketch to reproduce the problem and will post it here.

Thank you!

Julien8631 commented 6 years ago

Hello,

Sunday, my board worked perfectly all the day with messages each 60 seconds.

When I pluged a 1000mAh LiPo battery, it doesn't work. I understood i need a 1500 mAh or higher LiPo battery. I also detected the problem without battery. Saturday, The board worked 1 hour and the bug appeared.

If you want, i will also reproduce the bug and give you the sketch.

FrancMunoz commented 6 years ago

Hi @sandeepmistry and @Julien8631 , Sorry I didn't answered all your questions. This issue is not related to #11 , but MKR dead at same point: while loop with MODEM.ready(). That is the result but not the cause.

For further information please read page 11 of AT Command Examples from ublox (this document is realated to SARA-U2 series).

The page 11 is about best practices (pasted from there):

The DTE shall wait some time (the recommended value is at least 20 ms) after the reception of an AT command final response or URC before issuing a new AT command to give the module the opportunity to transmit the buffered URCs. Otherwise the collision of the URCs with the subsequent AT command is still possible.

I think that is causing the problem and why the proposed 20ms delay works.

My project is stable, there's no hangs nor errors in three days of continous working. All in all i'm preparing the example.

sandeepmistry commented 6 years ago

If you want, i will also reproduce the bug and give you the sketch.

@Julien8631 absolutely, the more ways we have to reproduce the issue, the better. Then we can also confirm any fixes once ready.

sandeepmistry commented 6 years ago

Ping, any sketches available to reproduce this condition?

FrancMunoz commented 6 years ago

Hi @sandeepmistry!

I will try to write a sketch to reproduce the issue. All in all if you get the actual example and loop it every minute you will find the issue we have. Let me find more time...

Thank you for your patience and your effort!

Julien8631 commented 6 years ago

Hi,

You will find the log of first bug and the sketch. I continue to find other bug.

Bug1.txt

GSM_read_battery_voltage_Thingspeak_AREF.txt

sandeepmistry commented 6 years ago

@FrancMunoz thanks for the feedback on pull request #14. I'll close this issue since from your testing #14 fixes it and the pull request has been merged.

FrancMunoz commented 6 years ago

Hi! It stills hanging in the same loop. It's true that it happens less and the MKR is working at least 24 hours before it hangs but usually after 24/36 hours of work it gets locked.

I'm still believing that it's because the libraries should have to wait 20ms between the last reception, as I suggested above in this thread.

I've made some testings and modified ModemClass using a more elegant solution than just adding 20ms delay:

void ModemClass::send(const char* command)
{
  if (_lowPowerMode) {
    digitalWrite(_dtrPin, LOW);
  }

  unsigned long dif=millis()-_uartMillis;
  if(dif<20) delay(20-dif);

  _uart->println(command);
  _uart->flush();
  _atCommandState = AT_COMMAND_IDLE;
  _ready = 0;
}

and

void ModemClass::poll()
{
  while (_uart->available()) {
    char c = _uart->read();
    _uartMillis=millis();
...

With this changes the delays it's only done when needed and with the needed ms to fit docs. And again It doesn't hang MKR (And watchdog doesn't have to work).

Using 1.6.18 and 1.1.2.

What do you think?

sandeepmistry commented 6 years ago

Do you have debug logs for this?

The main concern about have the delay is byte by byte socket reads become super slow.

FrancMunoz commented 6 years ago

Hi! It dies at the same point as I described in first comment GSMClient::available. About logs I don't have it. I'm sorry.

I know the main problem reading byte by byte is that becomes super slow. That's what I proposed to change the examples to avoid collapse SARA with too much AT commands and be more efficient reading data.

In my first approach I put a timeout to while loop. then changed to read more data instead read char by char. Maybe putting a timeout could be more reliable reading byte by byte and only when it fails would be a delay.

Will give you more information when I have.

Thank you!

lathoub commented 6 years ago

I have applied your fix as described above (using _uartMillis) and the MKRGSM seem to remain alive (albeit much much slower)

FrancMunoz commented 6 years ago

@lathoub , just use a buffered reading and it's faster: Above in this thread there's an example.

FrancMunoz commented 6 years ago

@sandeepmistry, it seems that bug it's not fixed, could you please re-open the issue? Thank you.

lathoub commented 6 years ago

+1 to reopen this issue @FrancMunoz @sandeepmistry This is an issue that will affect all TCP/UDP traffic

lathoub commented 6 years ago

Update: MKRGSM still freezes after a while, albeit a bit later than before - sigh.Please reopen this issue as this is a major show stopper

sandeepmistry commented 5 years ago

@FrancMunoz could you please open a pull request for what you suggested in https://github.com/arduino-libraries/MKRGSM/issues/10#issuecomment-372209661 along with the changes in #44 adding a 20ms for URC to come in should be better for byte by byte reads.

If you don't have time, let us know and we can open the pull request for this.

FrancMunoz commented 5 years ago

Sure, done in #49 , I've reviewed #44 so If I didn't forgot something all it has no conflicts.

sandeepmistry commented 5 years ago

@FrancMunoz now that #44 and #51 are merged do you think this can be closed or are there some other changes needed?