arduino-libraries / MKRGSM

GNU Lesser General Public License v2.1

MKRGSM library not reliable #66

Open alanoatwork opened 5 years ago

alanoatwork commented 5 years ago

I've posted this before, but I'll try again. My code is pretty simple: it checks for SMS messages and then replies using Hologram's network. I'm using the latest MKRGSM library, 1.3.1. I also have a 1400 mAh lithium battery connected, so I'm confident that I don't have a hardware issue related to modem current.

Here's my code:

#include <MKRGSM.h>
const char PINNUMBER[] = " ";
const char GPRS_APN[] = "hologram";
const char GPRS_LOGIN[] = " ";
const char GPRS_PASSWORD[] = " ";

String HOLOGRAM_DEVICE_KEY = "********";
String HOLOGRAM_TOPIC = "_SOCKETAPI_";

GSMClient client;
GPRS gprs;
GSM gsm(1);                                             // Enable debug
GSM_SMS sms;
GSMScanner scan;

char server[] = "cloudsocket.hologram.io";
int port = 9999;
boolean isSMSAvailable = false;
char sms_message[145];

void setup() {
  Serial.begin(115200);
  //while(!Serial);

  scan.begin();
  connectGSM();
}

void connectGSM() {
  boolean connected = false;

  while (!connected) {
    Serial.println("Begin GSM Access");

    if ((gsm.begin() == GSM_READY) &&
        (gprs.attachGPRS(GPRS_APN, GPRS_LOGIN, GPRS_PASSWORD) == GPRS_READY)) {
      connected = true;
      Serial.println("GSM Access Success");
      Serial.println(scan.getCurrentCarrier());
    } 
    else {
      Serial.println("Not connected");
      delay(1000);
    }
  }
}

void loop() {
  if (Serial.available()) {
    char c = Serial.read();
    if (c == 'e')
      MODEM.debug();
    if (c == 'd')
      MODEM.noDebug();
  }

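  // Get any new incoming txt messages
  int c;
  if (sms.available()) {
    size_t i = 0;
    // Copy the message, leaving room for the terminator so that a long
    // SMS cannot overrun sms_message
    while ((c = sms.read()) != -1) {
      if (i < sizeof(sms_message) - 1) {
        sms_message[i++] = (char)c;
      }
    }
    sms_message[i] = '\0';        // Terminate message
    isSMSAvailable = true;
    sms.flush();
  }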

  if(gsm.isAccessAlive()) {
    if(gprs.status() != GPRS_READY) {
      if(gprs.attachGPRS(GPRS_APN, GPRS_LOGIN, GPRS_PASSWORD) == GPRS_READY)
        Serial.println("GPRS ready!");
      else
        Serial.println("GPRS not ready!");
    }
  }
  else {
    Serial.println("Reconnect to GSM...");
    connectGSM();
  }

  // Send message back through hologram
  if(isSMSAvailable) {
    isSMSAvailable = false;

    if (client.connect(server, port)) {
      client.print("{\"k\":\"" + HOLOGRAM_DEVICE_KEY + "\",\"d\":\"");
      client.print(sms_message);
      client.println("\",\"t\":\""+HOLOGRAM_TOPIC+"\"}");
      client.stop();
    }
    else {
      MODEM.send("AT+USOER");
    }
  }

  delay(1000);
}

It takes anywhere from a few days to a week or more to exhibit the problem. My logs typically look like this, where the incoming SMS message, "Jdjd", gets received and then repeated back to me via Hologram's network.

OK
AT+CMGL="REC UNREAD"

+CMGL: 19,"REC UNREAD","+19495472010",,"18/11/14,03:53:24+00"
Jdjd

OK
AT+CMGD=19

OK
AT+CREG?

+CREG: 0,5

OK
AT+USOCR=6

+USOCR: 0

OK
AT+USOCO=0,"cloudsocket.hologram.io",9999

OK
AT+USOWR=0,21,"7B226B223A22433E383375242B57222C2264223A22"

+USOWR: 0,21

OK
AT+USOWR=0,4,"4A646A64"

+USOWR: 0,4

OK
AT+USOWR=0,20,"222C2274223A225F534F434B45544150495F227D"

+USOWR: 0,20

OK
AT+USOWR=0,2,"0D0A"

+USOWR: 0,2

OK
AT+USOCL=0

OK
AT+CMGL="REC UNREAD"

OK
AT+CREG?

+CREG: 0,5
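
The payload argument of each AT+USOWR line in the log above is hex-encoded, so it can be decoded offline to confirm what the sketch actually wrote to the socket. Here is a minimal standalone C++ sketch of that. The helper name `decodeHexPayload` is made up for illustration and is not part of MKRGSM.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of MKRGSM): turn the hex string from an
// AT+USOWR debug line back into the bytes that were written to the socket.
// Returns an empty string if the input has odd length or a non-hex character.
std::string decodeHexPayload(const std::string& hex) {
    if (hex.size() % 2 != 0) return "";
    auto nibble = [](char ch) -> int {
        if (ch >= '0' && ch <= '9') return ch - '0';
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int hi = nibble(hex[i]);
        int lo = nibble(hex[i + 1]);
        if (hi < 0 || lo < 0) return "";
        out.push_back(static_cast<char>(hi * 16 + lo));
    }
    return out;
}
```

Applied to the log, "4A646A64" decodes to Jdjd (the SMS text), "0D0A" is the CR LF added by println(), and the 20-byte write is the closing `","t":"_SOCKETAPI_"}` part of the JSON.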

However, after a week or so this happens:

OK
AT+CMGL="REC UNREAD"

+CMGL: 19,"REC UNREAD","+19495472010",,"18/11/20,19:51:00+00"
JDJDJ

OK
AT+CMGD=19

OK
AT+CREG?

+CREG: 0,5

OK
AT+USOCR=6

+USOCR: 0

OK
AT+USOCO=0,"cloudsocket.hologram.io",9999

ERROR

+UUSOCL: 0
AT+USOCL=0

ERROR
AT+CMGL="REC UNREAD"

OK
AT+CREG?

+CREG: 0,5

OK
AT+CMGL="REC UNREAD"

OK
AT+CREG?

+CREG: 0,5

From then on, the library is never able to recover.
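
Note that the failing log still shows +CREG: 0,5 after the socket connect fails. As a reading aid, here is a standalone C++ sketch that pulls the registration status out of such a line. The function names are hypothetical. The status meanings are the ones defined for +CREG in 3GPP TS 27.007 (1 = registered on the home network, 5 = registered while roaming).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: extract <stat> from a "+CREG: <n>,<stat>" response line.
// Returns -1 if the line does not look like a +CREG response.
int parseCregStat(const std::string& line) {
    const std::string prefix = "+CREG:";
    std::size_t start = line.find(prefix);
    if (start == std::string::npos) return -1;
    std::size_t comma = line.find(',', start + prefix.size());
    if (comma == std::string::npos || comma + 1 >= line.size()) return -1;
    char c = line[comma + 1];
    if (c < '0' || c > '9') return -1;
    return c - '0';
}

// 1 = registered (home network), 5 = registered (roaming).
bool isRegistered(int stat) {
    return stat == 1 || stat == 5;
}
```

For the line +CREG: 0,5 this yields status 5. A registration check alone would therefore not flag the failure in this log. The ERROR on the socket commands is the only visible symptom.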

Any help would be greatly appreciated. The code I submitted is a stripped-down version of my application. I've been wanting to release my application using the MKRGSM board, but I've been haunted by this issue. I see there are some asynchronous options that aren't well documented, and I'm reluctant to use that approach. Besides, shouldn't this library be rock solid as documented?

Rocketct commented 5 years ago

Hi @alanoatwork, I'm working on this issue. Could you test the same sketch with a timeout imposed, by adding gprs.setTimeout(180000); and gsm.setTimeout(180000); in connectGSM()? This should let the sketch come back to the loop() function after 3 minutes and retry, rather than leaving the SAMD waiting.

alanoatwork commented 5 years ago

I'll give this a try if that's what you're suggesting:

void connectGSM() {
  gprs.setTimeout(180000);
  gsm.setTimeout(180000);

  boolean connected = false;

  while (!connected) {
    Serial.println("Begin GSM Access");

    if ((gsm.begin() == GSM_READY) &&
        (gprs.attachGPRS(GPRS_APN, GPRS_LOGIN, GPRS_PASSWORD) == GPRS_READY)) {
      connected = true;
      Serial.println("GSM Access Success");
      Serial.println(scan.getCurrentCarrier());
    } 
    else {
      Serial.println("Not connected");
      delay(1000);
    }
  }
}

Rocketct commented 5 years ago

@alanoatwork Yes, exactly like this.

alanoatwork commented 5 years ago

Two weeks and so far so good. Why aren't these timeouts set by default so perhaps others don't experience the difficulty that I faced?

alanoatwork commented 5 years ago

After weeks of testing, the library still doesn't work correctly. I went a couple of weeks with no issue, so I turned off debug and let my application run. It ran for about a week before the board just stopped responding. So I enabled debug again and I'll let it run until it dies. It's super frustrating that the board isn't able to recover. I noticed that the lockups occur much sooner if I don't connect an external 1400 mAh lithium cell, but even with the cell connected lockups occur eventually. Does the async approach solve this issue? Can someone chime in here and explain the advantages and disadvantages of these approaches, and whether or not I should be exploring this option?

alanoatwork commented 5 years ago

It died within the first day. Here's my log...

OK
AT+CMGL="REC UNREAD"

OK
AT+CREG?

+CREG: 0,5

OK
AT+CMGL="REC UNREAD"

OK
AT+CREG?

+CREG: 0,5

OK

UPSDD
AT+CMGL="REC UNREAD"

OK

eiriksels commented 5 years ago

When you get the freeze, does the board recover with a reboot or removing and putting the power back on? If not, I guess it is the same issue as I see. I would be interested if you have found any solutions. My experience is that I can reproduce the issue quite fast when placing my unit in a car and driving around with varying network conditions.

alanoatwork commented 5 years ago

My setup always recovers with a reboot.

eiriksels commented 5 years ago

I am now running your sketch, @alanoatwork. The only change I made was using a timeout of 20 sec for the gsm/gprs. So far it is working well, but I have only tested it for 2 days. I hope you get some feedback soon from the developers on the issues you are facing.

One suggestion, since you mentioned that battery voltage seems to influence stability:

Use an if or while condition such as analogRead(ADC_BATTERY) * (4.3 / 1023) < 3.7, so that the sketch does not upload or communicate with the modem below a certain battery voltage.
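
The battery gate suggested above can be written as two small functions, which keeps the scaling and the threshold apart. This is only a sketch under stated assumptions: the 4.3 V full-scale factor and the 3.7 V threshold are the values quoted in this comment, a 10-bit ADC reading (0 to 1023) is assumed, and the function names are hypothetical.

```cpp
// Scale factor quoted in the thread: a full-scale reading of 1023 maps to 4.3 V.
// (Assumes a 10-bit analogRead result.)
constexpr double kFullScaleVolts = 4.3;
constexpr int kAdcMax = 1023;

// Convert a raw ADC reading into volts.
double batteryVolts(int rawAdc) {
    return rawAdc * (kFullScaleVolts / kAdcMax);
}

// Hypothetical gate: only talk to the modem at or above minVolts.
bool batteryOkForModem(int rawAdc, double minVolts = 3.7) {
    return batteryVolts(rawAdc) >= minVolts;
}
```

On the board this would be called as batteryOkForModem(analogRead(ADC_BATTERY)) before any upload. With these constants the cut-off falls between raw readings 880 and 881.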

alanoatwork commented 5 years ago

I've been running my sketch while connected to the USB port AND with a battery - just to make sure that the issues weren't hardware related. I'll try a longer timeout and see if that helps.

Nels52 commented 5 years ago

@alanoatwork For what it is worth I have a sketch which uses the GSM, GPRS, and GSMClient state machines to periodically report to the hologram cloud. Whenever I experience any error such as the failed TCP socket connection to "cloudsocket.hologram.io",9999 referenced above my sketch resets everything from the beginning (gsm.begin() and gprs.attachGPRS()). The gsm.begin() invokes a Modem class method that toggles the resetPin. This isn't very elegant but my sketch has been running unattended for more than 2 months now.

alanoatwork commented 5 years ago

@Nels52 Thanks for reaching out. I'm not very familiar with GitHub and now see that it's pretty simple to reach out to another user! I've put my synchronous application on hold and just coded up my app based on your asynchronous HomeMonitor5 code. I was hoping that the switch from the Dash board to the MKR board would be relatively easy, but here I am 6 months later with an unstable product. Your framework is nicely written and fits into my code pretty easily, so thanks very much for sharing it with us. I'll report back if anyone is interested how this works out for me. I would be interested to know if you've made any changes since HomeMonitor5 as I don't profess to understand all the possible errors and corner cases that could arise when trying to communicate with Hologram.

Nels52 commented 5 years ago

@alanoatwork HomeMonitor5 is my latest version. In fact, I am wintering away from home and currently using it to monitor my home temperature.

I operate the GSM, GPRS, and GSMClient state machines in asynchronous mode so my sketch can control each step through those state machines and thereby avoid as many hang conditions as possible. I haven't used synchronous mode but it appears that the gsm and gprs timeouts were implemented as a way to prevent hangs in those state machines. @Rocketct referenced those in this thread.
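
The asynchronous pattern described above boils down to polling a ready() call under a fixed budget instead of blocking inside the library. Here is a minimal standalone C++ sketch of that loop. `pollUntilDone`, `PollResult` and the parameters are hypothetical names. The return convention (0 = still in progress, 1 = success, anything else = error) is assumed from how the HomeMonitor-style sketch posted in this thread interprets ready().

```cpp
#include <functional>

enum class PollResult { Success, Failed, TimedOut };

// Hypothetical helper: call readyFn up to maxPolls times, waiting via waitFn
// between attempts. readyFn returns 0 while the command is in progress,
// 1 on success, and any other value on error.
PollResult pollUntilDone(const std::function<int()>& readyFn,
                         int maxPolls,
                         const std::function<void()>& waitFn) {
    for (int i = 0; i < maxPolls; ++i) {
        int status = readyFn();
        if (status == 1) return PollResult::Success;
        if (status != 0) return PollResult::Failed;
        waitFn();  // e.g. delay(100) on the board
    }
    return PollResult::TimedOut;
}
```

On the board, readyFn would wrap a call such as gprs.ready(). With maxPolls at 600 and a 100 ms wait the budget is about 60 seconds. The caller can treat both Failed and TimedOut as a reason to restart the modem, so a stalled AT command cannot hang the sketch.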

One consideration that you may have to be aware of in your sketch, and that I didn't need to consider, is a loss of data connection that prevents your device from receiving SMS messages. That may have happened in one of the previous logs in this thread that shows a UPSDD. According to the u-blox AT command documentation, that indicates the GPRS data connection has been deactivated. When this happens the MKRGSM library sets the GPRS status to IDLE. A gprs.status() call will reveal this. If this happens you may want to do the complete GSM and GPRS reset.

alanoatwork commented 5 years ago

@Nels52 So in loop(), you're suggesting that I simply add:

if(gprs.status() == IDLE) {
  while(!startWebClient());
}

Nels52 commented 5 years ago

@alanoatwork It looks like you already have gsm and gprs status checks so you could just change the logic to:

if(gsm.isAccessAlive()) {
    if(gprs.status() != GPRS_READY) {
        while(!startWebClient());
    }
} else {
    while(!startWebClient());
}

alanoatwork commented 5 years ago

@Nels52 I removed the code from my loop() after following your example, but I get the idea. Thanks for the tip, otherwise I'm sure I would have had another system freeze! It's too bad that synchronous use of the library is unreliable. Unless the library developers get serious and try to address the various shortcomings, I don't think many folks are going to be very happy with the library's reliability.

eiriksels commented 5 years ago

@alanoatwork I did observe the hang after 2 days of running the sketch. So now I'm also trying out @Nels52 HomeMonitor v5. It took me some time to figure out how to adapt to my own application, but I think it should be ok now. I really hope I will be able to have a stable board now. I'll update you on how it goes. As I usually observe the issues I get when I move the unit around (placed in a car) I think that is the "worst" challenge for this hardware and library. But as all cell phones are able to do the same it should definitely be possible :)

alanoatwork commented 5 years ago

@eiriksels I'm taking my app out for a test drive for the next few days as well so we'll see how it does moving around. My only issue is that I can't collect debug logs while on the road - I haven't figured that out yet.

eiriksels commented 5 years ago

@Nels52 Here is the sketch I have for making GET requests to Blynk, which is copied from your HomeMonitor V5. I would be very happy if you could see if I deleted any critical code for the communication. My sketch uploads temperature and voltage data to blynk, switching between temperature and voltage for each time a request is carried out.

/*
  HomeMonitor

  Circuit:

  MKR GSM 1400 board
  Antenna
  SIM card with a data plan
  Temperature sensor
  created 10-26-2018
  by Bob Nelson

*/

// libraries
#include <MKRGSM.h>
#include <OneWire.h>
#include <DallasTemperature.h>

#define ONE_WIRE_BUS 5

OneWire oneWire(ONE_WIRE_BUS);
DallasTemperature sensors(&oneWire);
const char PINNUMBER[] = "";
const char GPRS_APN[] = "netcom";
const char GPRS_LOGIN[] = "";
const char GPRS_PASSWORD[] = "";

// URL, path and port (for example: example.org)
char server[] = "blynk-cloud.com";
String path = "/AUTHKEY/update/";
int port = 8080; // port 80 is the default for HTTP

unsigned int switcher = 1;

/// Test multiple connections within one run
int connectCount = 0;

// initialize the library instances
GSMClient client(false);
GPRS gprs;
GSM gsmAccess(true); // Turn debug on so we see all AT commands and responses

bool clientConnected = false; // Connection state flag
bool restartClient = true; // Client restarted flag - initially set to ensure temperature is reported at startup

// Script defines
#define TEMPERATURE_READ_INTERNAL_10_MINUTES 60000 // 10 minute interval in milliseconds

#define BAUD_RATE_115200 115200 // Serial ports baud rate

unsigned long lastTemperatureReadTime = 0; // Last time temperature was read
int samplesSinceLastReport = 0; // Number of samples since last report to Hologram server - initialize to 0

bool reportTemperature = false; // Report Temperature flag

// Execute once at startup
void setup()
{
  int index;

  // Make sure we have a temperature to report for startup
  calculateTemperature();

  // Initialize serial communications
  Serial.begin(BAUD_RATE_115200);

  /// Remove the following while loop when testing is complete.
  /// while (!Serial)
  /// {
  /// ; // wait for serial port to connect. Needed for native USB port only
  /// }

  // Initialize web client
  do
  {
  } while (!startWebClient());
}

void loop()
{
  unsigned long runTime;
  bool connectStatus;

  // Check the client connection status. It should be 0 (not connected, i.e. socket = -1). However, there could be a residual
  // socket value if the GSM and GPRS have been restarted due to a AT+USOCO=... command hang. Issue a client.stop() to close
  // the socket and reset the _socket variable to -1 in the GSMClient. The AT+USOCL= command issued to the modem will end in
  // ERROR because the socket is no longer valid due to the modem restart.
  //
  // This check also forces a call to the MODEM poll() method which removes URCs from the UART interface
  // between the SAMD microcontroller and the ublox SARA U201 modem. It seems like a good idea to remove URCs
  // in a timely manner, although it didn't reduce hang conditions on the SARA U201 interface.
  connectStatus = client.connected();
  if (connectStatus)
  {
    Serial.print("Unexpected client connect status: ");
    Serial.println(connectStatus);
    client.stop();
  }

  // Run Time is the number of milliseconds since the MKR board began running the current program instance.
  runTime = millis();

  // Protect against run time wrap which occurs after approximately 50 days.
  if (runTime < lastTemperatureReadTime)
  {
    lastTemperatureReadTime = runTime;
  }

  // Calculate the temperature if our read interval has been reached
  if ((runTime - lastTemperatureReadTime) >= TEMPERATURE_READ_INTERNAL_10_MINUTES)
  {
    // We should not be connected.
    if (!clientConnected)
    {
      Serial.println("connecting...");

      // If able to connect to Hologram server.
      if (connectClient())
      {
        Serial.println("connected");
        clientConnected = true;
        connectCount++;

        // Send message to Hologram iot server
        sendMessageToBlynk();
        lastTemperatureReadTime = millis();

        // If the Hologram web server fails to respond and/or close restart the client.
        if (!checkForResponseFromHologram())
        {
          // Restart Web client
          Serial.println("Response/Close NOT received");
          do
          {
            Serial.println("Restarting Web client");
          } while (!startWebClient());
        }
      } else {
        // Restart Web client
        Serial.println("Connection failed");
        do
        {
          Serial.println("Restarting Web client");
        } while (!startWebClient());
      }
    } else {
      Serial.println("Client unexpectedly still connected");
    }
  }
}

// Calculate a temperature to send to Hologram Web Server.
// Return true if temperature should be reported and false if it doesn't need to be reported.
bool calculateTemperature() {
  return true;
}

// Initialize GSM and GPRS
bool startWebClient()
{
  int gsmBeginStatus;
  int gprsAttachStatus;
  int startWebClientInitializationCount = 0;
  int gsmReadyStatus = 0;

  Serial.println("startWebClient HomeMonitorV5 Build 11-12-2018 Rev 1");

  // Initialize the GSM with a modem restart and asynchronous operation mode.
  // I modified the MODEM instance in the MKRGSM 1400 library to initialize with a baud rate of 115200.
  gsmBeginStatus = gsmAccess.begin(PINNUMBER, true, false);
  if (gsmBeginStatus == 0)
  {
    // Modem has been restarted. Delay for 2 seconds and loop while GSM component initializes and registers with network
    delay(2000);

    // March thru the GSM state machine one AT command at a time using the ready() method.
    // This allows us to detect a hang condition on the SARA U201 UART interface
    do
    {
      gsmReadyStatus = gsmAccess.ready();
      startWebClientInitializationCount++;
      delay(100);
    } while ((gsmReadyStatus == 0) && (startWebClientInitializationCount < 600));

    // If the GSM registered with the network attach to the GPRS network with the APN, login and password
    if (gsmReadyStatus == 1)
    {
      Serial.print("GSM registered successfully after ");
      Serial.print(startWebClientInitializationCount * 100);
      Serial.println(" ms");

      // Perform an asynchronous attach to the GPRS network.  That way we can prevent a GPRS hang in the MKRGSM1400 library
      gprs.attachGPRS(GPRS_APN, GPRS_LOGIN, GPRS_PASSWORD, false);
      do
      {
        delay(100);
        startWebClientInitializationCount++;
        gprsAttachStatus = gprs.ready();
      } while ((gprsAttachStatus == 0) && (startWebClientInitializationCount < 600));

      if (gprsAttachStatus == 1)
      {
        gprsAttachStatus = gprs.status();
        if (gprsAttachStatus == GPRS_READY)
        {
          Serial.println("Attached to APN");
          restartClient = true;
          return true;
        } else {
          Serial.print("GPRS status: ");
          Serial.println(gprsAttachStatus);
          return false;
        }
      } else if (gprsAttachStatus == 0) {
        Serial.println();
        Serial.print("GPRS Attach timed OUT after ");
        Serial.print(startWebClientInitializationCount * 100);
        Serial.println(" ms");
        return false;
      } else {
        // Print gprsAttachStatus as ASCII encoded hex because occasionally we get garbage return characters when the network is in turmoil.
        // It appears the _ready variable in the MODEM Class instance is being overwritten with garbage.
        Serial.print("GPRS Attach status: ");
        Serial.println(gprsAttachStatus, HEX);
        return false;
      }
    } else if (gsmReadyStatus == 0) {
      Serial.print("GSM Ready status timed OUT after ");
      Serial.print(startWebClientInitializationCount * 100);
      Serial.println(" ms");
      return false;
    } else {
      // Print gsmReadyStatus as ASCII encoded hex because occasionally we get garbage return characters when the network is in turmoil.
      Serial.print("GSM Ready status: ");
      Serial.println(gsmReadyStatus, HEX);
      return false;
    }
  } else {
    Serial.print("GSM Begin status: ");
    Serial.println(gsmBeginStatus);
    return false;
  }
}

// Connect to the APN
bool connectClient()
{
  int connectStatus;
  int loopCount;

  connectStatus = client.connect(server, port);
  if (connectStatus == 0)
  {
    // The GSMClient has an AT command outstanding to the MODEM. This is an unexpected condition.
    // Return false to cause a MODEM restart and GSM and GPRS initialization.
    Serial.println("GSMClient unexpected AT command outstanding");
    client.stop(); // reset the GSMClient state and close the socket if necessary
    return false;
  }

  // Stay in a while loop and wait for connection to complete or 60 second timeout
  loopCount = 0;
  while (loopCount < 600)
  {
    connectStatus = client.ready();

    // If command in progress delay 100 ms, increment count and check again
    if (connectStatus == 0) {
      loopCount++;
      delay(100);
    } else if (connectStatus == 1) {
      // AT Command has completed. Return codes:
      //                           1 - success
      //                           2 - error
      //                           3 - no carrier
      Serial.print("Connect after ");
      Serial.print(loopCount * 100);
      Serial.println(" ms");
      return true;
    } else {
      Serial.print("Connect failed: ");
      Serial.println(connectStatus);
      client.stop();        // reset the GSMClient state and close the socket if necessary
      return false;
    }
  }

  // Connection timed out.
  client.stop(); // reset the GSMClient state and close the socket if necessary
  Serial.println();
  Serial.print("Connection Timed OUT loopCount = ");
  Serial.println(loopCount);
  return false;
}

// Send the appropriate message to Hologram based on temperature and whether or not the client has been restarted.
void sendMessageToBlynk()
{
  Serial.println("Making GET request");
  sensors.requestTemperatures();
  sensors.getTempCByIndex(0);
  float h =  sensors.getTempCByIndex(0);
  h = h * 10;
  h = round(h);
  float t = h;
  t = t / 10;

  String temppath = "V2?value=";
  String V2value = String(t);

  String input = path + temppath + V2value;

  Serial.println("Temperature is");
  Serial.println(t);

  String voltagepath = "V3?value=";

  int sensorValue = analogRead(ADC_BATTERY);
  float voltage = sensorValue * (4.3 / 1023);
  int percent = ((voltage - 3.6) / 0.6) * 100;

  String V3value = String(voltage);
  String inputvoltage = path + voltagepath + V3value;

  Serial.println(input);
  Serial.println(inputvoltage);

  if ((switcher % 2) == 0) {
    input = inputvoltage;
  }
  else {
  }

  Serial.println("connected");
  // Make a HTTP request:
  client.print("GET ");
  client.print(input);
  client.println(" HTTP/1.1");
  client.print("Host: ");
  client.println(server);
  client.println("Connection: close");
  client.println();
  client.stop();
  switcher = switcher + 1;

  // Clear restart and report temperature flags and update last temperature report time
  restartClient = false;
  reportTemperature = false;
  samplesSinceLastReport = 0;
}

//
// After returning the response code the Hologram server will close the connection
//
// Return true if response and/or close received from Hologram server.
// Otherwise, return false so the modem and client can be restarted.
bool checkForResponseFromHologram()
{
  int loopCount = 0;
  int availableBytesToRead;
  boolean bNextCharIsResponseCode = false;

  // Use a do while loop to restrict the amount of time we wait for a response and/or close from the Hologram server
  do
  {
    // The GSMClient connected() method implementation in the MKRGSM 1.3.0 library performs a AT+USORD=0,512 to fill its read buffer.
    // If the AT+USORD=0,512 returns an ERROR response the connected() method calls the stop() method to close the socket
    if (!client.connected())
    {
      Serial.println();
      Serial.println("disconnecting.");
      clientConnected = false;
      return true;
    }

    // The GSMClient available() method implementation in the MKRGSM 1.3.0 library will also invoke the AT+USORD=0,512 command if
    // the GSM socket buffer is empty, i.e. the AT+USORD=0,512 command issued by the previous invocation of the GSMClient connected() method
    // returned 0 bytes.  Therefore, delay 50 ms so we don't bombard the modem with 2 AT+USORD=0,512 commands in rapid fire.
    delay(50);
    availableBytesToRead = client.available();

    // If there are bytes from the server in the socket buffer read them and print them.
    // Stay in the loop until all bytes in the socket buffer have been processed.
    while (availableBytesToRead)
    {
      char c = client.read();
      Serial.print(c);
      if (bNextCharIsResponseCode)
      {
        if ((c >= '0') && (c <= '8'))
        {
          Serial.println("Response");
        } else
        {
          Serial.print("Unrecognized Response: ");
          Serial.println(c);
        }

        bNextCharIsResponseCode = false;
      } else if (c == '[')
      {
        bNextCharIsResponseCode = true;
      }

      availableBytesToRead--;
    }

    loopCount++;
    delay(50);        // Delay 50 ms before the next AT+USORD=0,512 command issued by the GSMClient connected() method
  } while (loopCount < 600);

  // Did not receive response and/or close from Blynk server. Return false so modem and client can be restarted.
  Serial.println();
  Serial.println("No response/close from Blynk.");
  client.stop();
  clientConnected = false;
  return false;
}

Nels52 commented 5 years ago

@eiriksels I have looked at your script and it looks fine with regard to the startWebClient() and connectClient() methods and the way the script uses them. I can't comment on the sendMessageToBlynk() and checkForResponseFromHologram() methods in your script because they are specific to the Blynk server interface. I am using the Data Engine interface on the Hologram server, which returns a response code within brackets [] and then closes the TCP socket connection.
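
For readers adapting the response check: the read loop in checkForResponseFromHologram() treats the character right after '[' as the response code. A standalone C++ sketch of the same idea follows. The function name is hypothetical, and the reply format is an assumption based only on the description above (a code inside brackets), not on Hologram documentation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the single digit that follows the first '['
// in a server reply, or -1 if there is no such digit.
int extractBracketCode(const std::string& reply) {
    std::size_t open = reply.find('[');
    if (open == std::string::npos || open + 1 >= reply.size()) return -1;
    char c = reply[open + 1];
    if (c < '0' || c > '9') return -1;
    return c - '0';
}
```

A server that never sends a bracketed code, which may be the case for a plain HTTP endpoint, would always produce -1 here. The caller then has to rely on the socket close instead.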

Several small notes: (1) The calculateTemperature() method is not needed because your script does its calculations in the sendMessageToBlynk() method. (2) The TEMPERATURE_READ_INTERNAL_10_MINUTES define is actually 1 minute (60000 milliseconds). I don't know if you want to send a message every minute or every 10 minutes.
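
On the interval point: 60000 ms is 60 seconds, so a true 10-minute interval would be 600000 ms. As a related aside, an elapsed-time check written with unsigned subtraction stays correct across the roughly 50-day millis() rollover without a separate guard. Here is a minimal standalone C++ sketch with hypothetical names, assuming a 32-bit millisecond counter.

```cpp
#include <cstdint>

constexpr std::uint32_t kOneMinuteMs = 60UL * 1000UL;         // 60000
constexpr std::uint32_t kTenMinutesMs = 10UL * kOneMinuteMs;  // 600000

// Unsigned 32-bit subtraction wraps modulo 2^32, so (now - last) is the true
// elapsed time even if the counter rolled over between the two readings.
bool intervalElapsed(std::uint32_t now, std::uint32_t last,
                     std::uint32_t interval) {
    return static_cast<std::uint32_t>(now - last) >= interval;
}
```

In a sketch the call would look like intervalElapsed(millis(), lastTemperatureReadTime, kOneMinuteMs). Naming the constant after its real duration avoids the mismatch noted above.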

eiriksels commented 5 years ago

@Nels52 Thank you. I do the 1 minute interval to increase the possibility of error if the script does not work, although I should have specified this in the definition.

I think that Blynk does not give any response code when I use a GET request, so the response code part is not really used in my sketch. I could update the sketch to use a PUT request instead and then get feedback from the Blynk server.

I have now had the script running since yesterday, but I've had issues as soon as I'm out driving. I have no debug log due to the fact that it only happens when I'm driving. On the positive side, the Board is not really frozen, and after x amount of time it could start broadcasting again. I might take it for a drive with my laptop on the side to see if I can get a debug log of it happening.

Nels52 commented 5 years ago

@eiriksels If the Blynk server does not close the TCP socket connection after the GET request is completed, checkForResponseFromHologram() will return false, resulting in a call to startWebClient() in the main loop. If that is the case, your script will restart the MKRGSM 1400 u-blox modem each time a GET request is sent. This doesn't cause any harm but it may be unnecessary. The code I am referring to in checkForResponseFromHologram() is:

    // The GSMClient connected() method implementation in the MKRGSM 1.3.0 library performs a AT+USORD=0,512 to fill its read buffer.
    // If the AT+USORD=0,512 returns an ERROR response the connected() method calls the stop() method to close the socket
    if (!client.connected())
    {
      Serial.println();
      Serial.println("disconnecting.");
      clientConnected = false;
      return true;
    }

eiriksels commented 5 years ago

@Nels52 Thanks for the info. startWebClient() does not run on every iteration. I think that is because I have a client.stop(); after I've sent the data, as shown below:

Serial.println("connected");
// Make a HTTP request:
client.print("GET ");
client.print(input);
client.println(" HTTP/1.1");
client.print("Host: ");
client.println(server);
client.println("Connection: close");
client.println();
client.stop();

I also added a check of the GSM connection on each iteration, before connectClient() is called. In my experience it is good to check whether the GSM connection is still active, with an if(gsmAccess.isAccessAlive()), before trying to connect to the server.

// Calculate the temperature if our read interval has been reached
if ((runTime - lastTemperatureReadTime) >= TEMPERATURE_READ_INTERNAL_1_MINUTES)
{
  if (gsmAccess.isAccessAlive())   // added GSM connection check
  {
    // We should not be connected.
    if (!clientConnected)
    {
      Serial.println("connecting...");

      // If able to connect to Hologram server.
      if (connectClient())

………………..

In the "else" of that check I put in a:

  do
  {
    Serial.println("Not connected to GSM/GPRS");
  } while (!startWebClient());
  delay(1000);
}

So far, so good with the code now. I've been driving around and have not seen hangs for a day now.

Nels52 commented 5 years ago

@eiriksels Thanks for pointing out the client.stop() near the end of the sendMessageToBlynk(). Because you aren't expecting a response from Blynk that makes sense.

Good Luck!

alanoatwork commented 5 years ago

@Nels52 I noticed that every hour I detect gprs.status() == IDLE and restart the web client. However, this consumes about 1000 bytes, meaning that I'd use about 720 KB per month doing nothing. Any ideas on how to avoid the cost of running startWebClient() every hour?
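
The monthly figure above follows from simple arithmetic, sketched here so the effect of a different reattach rate is easy to see. The function name is hypothetical and a 30-day month is assumed.

```cpp
#include <cstdint>

// Bytes used per month by a periodic reattach, assuming a 30-day month.
constexpr std::uint64_t monthlyOverheadBytes(std::uint64_t bytesPerEvent,
                                             std::uint64_t eventsPerDay,
                                             std::uint64_t daysPerMonth = 30) {
    return bytesPerEvent * eventsPerDay * daysPerMonth;
}
```

With 1000 bytes per restart and one restart per hour, monthlyOverheadBytes(1000, 24) gives 720000 bytes, which matches the estimate. Cutting restarts to a handful per day reduces the overhead proportionally.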

Nels52 commented 5 years ago

@alanoatwork I did not realize that 1000 bytes were consumed for each startWebClient(). My script only invokes the startWebClient() when I am unable to connect to Hologram. I typically see that happen 2 to 4 times a day.

If I understand your script you must remain connected to the gprs network all the time in order to receive an SMS message at any time. From your previous post it looks like that connection times out after about an hour. The only suggestion I have is to periodically send a short keep alive message to Hologram (every 50 minutes perhaps) to see if that maintains the gprs connection.

alanoatwork commented 5 years ago

@Nels52 I'm able to receive SMS messages even when gprs.status() == IDLE. I guess this means that a GPRS connection isn't required to receive SMS. I'll let it run without the startWebClient() on IDLE and see what happens.

Nels52 commented 5 years ago

@alanoatwork That is very interesting. I haven't tested receiving SMS messages with my MKRGSM 1400 and Hologram SIM card. Are you using the Hologram web server to send SMS messages to your MKRGSM 1400?

From what I understand of SMS one can use either a circuit switched connection or gprs packet network to receive SMS messages. If it is circuit switched connection one needs a phone number and I am guessing you don't have a phone number assigned to your SIM card.

alanoatwork commented 5 years ago

@Nels52 Basically my app is a platform for controlling an IoT device as inexpensively as possible. The MKR board directly receives SMS messages, I suppose via GSM directly. Received SMS messages do not pass through Hologram and do not incur any reception fees from Hologram. Only the SMS sender pays, and since I'm on an unlimited plan, there's no additional cost. When required, the MKR board communicates back to the sender via SMS, but in my case SMS messages are relayed through Hologram's SMS relay service. I believe the MKR board could send SMS messages directly, but the cost is $0.19 per message. However, if you use Hologram's SMS relay service, you can send one SMS for about 1000 bytes. Since for about $1.40/month you get 1 MB anyway, that's 1000 SMS messages! I do have a phone number assigned to each SIM and use it to communicate with my system only via SMS.

Nels52 commented 5 years ago

@alanoatwork Now I see what you are doing. The only time you need a gprs network attachment is when your script sends data to Hologram in response to an incoming SMS message.

If you are only sending an SMS message to the MKR board a few times a day it probably isn't a big deal if startWebClient() is invoked each time a data response is returned. However, if you are frequently sending SMS messages to your MKR board each day you may want a more persistent attachment to the gprs network. I see that Issue #30 has some discussion about how to do this.

eiriksels commented 5 years ago

@Nels52 @alanoatwork Hi. I have now been able to capture a log of the type of freeze I get when I am driving around. It seems that the AT+CREG? request for some reason times out at approx. 4 seconds when the issue occurs. Then nothing happens until the 60-second timeout of startWebClient().

I do not understand why this freeze happens. If you do, please advise me. I am thinking that the modem is in a faulty mode and that a more actual "reset" is needed to get the Board back to operation again. Pressing the reset button makes everything come back again.

I have attached the error log where you can see what happens. I have also attached a log of a normal connection, where AT+CREG? continues until the connection is achieved. In addition, I have added my sketch. I hope I can get some advice on what I could do with my code to make it come out of this non-communicating state. I am thinking of introducing something like the code below to reset the GSM module if no communication is achieved:

pinMode(GSM_DTR, OUTPUT);
digitalWrite(GSM_DTR, LOW);
delay(5);

// Turn on the GSM module by triggering GSM_RESETN pin
pinMode(GSM_RESETN, OUTPUT);
digitalWrite(GSM_RESETN, HIGH);
delay(100);
digitalWrite(GSM_RESETN, LOW);

delay(1000);

Attachments: Error log when not working.txt, Complete Sketch eiriksels 04022019.txt, Normal connection log.txt

What do you think? Might be better to trigger the actual reset pin of the board?

Nels52 commented 5 years ago

@eiriksels It is mysterious that the AT+CREG? appears to hang after about 4 seconds. The only thing you might want to try is to reduce the UART interface speed. Some people on other threads (see Issue #27) have reduced the speed and think it helps make the interface more stable. Based on the discussion in that thread I reduced the speed to 115200. I haven't had the issue you are having, so it might be worth a try. In order to do that you will have to modify the MODEM class instance in the Modem.cpp file and rebuild the library. Change:

ModemClass MODEM(SerialGSM, 921600, GSM_RESETN, GSM_DTR);

to:

ModemClass MODEM(SerialGSM, 115200, GSM_RESETN, GSM_DTR);

You can verify if you have done it correctly by looking in your trace. You should see AT+IPR=115200 instead of AT+IPR=921600.

By the way, when startWebClient() invokes the gsmAccess.begin(PINNUMBER, true, false) method MODEM.begin(true) is called which toggles the GSM_DTR and GSM_RESETN pins as you described in your previous post.

alanoatwork commented 5 years ago

@eiriksels Just wondering how you are powering the MKR board while in transit? If your power supply isn't clean perhaps you may have some noise introduced from the car? Can you change the supply you are using just to try a different setup? Do you have a LiPo battery connected?

alanoatwork commented 5 years ago

@eiriksels I took my board out for a test drive around town. I was powered by a fully charged 1400 mAh LiPo cell and drove about 15 miles. I was connected to the same network for the entire trip and didn't detect any faults. I'm still running the modem baud rate at the default 921600.

eiriksels commented 5 years ago

@alanoatwork Hello. I am using a 2000 mAh LiPo battery connected to the designated input on the board. Usually I have it connected to USB so that it is being charged every time I drive, but I've tried different setups. The battery voltage has been above 3.8 V every time it has happened. It only happens occasionally as I drive. There might also be differences in the areas that we live in. Thanks for attempting it, though.

@Nels52 Thanks for the info regarding the baud rate and that the GSM module is already being reset in the MODEM class. I will try it if my current sketch is not stable. I have now introduced a hard reset whenever the GSM timeout happens (after a 4-minute wait first, so that it does not reset constantly when there is no GSM coverage). I've done this by running a wire from a digital output to the RESET pin. Not elegant, but I hope it keeps the board from going silent, which is my number one priority. If you see downsides with this approach, please let me know.
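The hold-off logic behind that reset could be sketched as below. This is an illustration only, not the attached sketch: `ResetGuard` and `RESET_HOLDOFF_MS` are hypothetical names, and the actual pin handling is left out.

```cpp
#include <cstdint>

// Hypothetical hold-off: 4 minutes without a connection before resetting.
const uint32_t RESET_HOLDOFF_MS = 4UL * 60UL * 1000UL;

// Tracks how long the link has been down and reports when a reset is due.
struct ResetGuard {
  bool down = false;         // true while the link is considered lost
  uint32_t downSinceMs = 0;  // timestamp of the first failed check

  // Call on every connection check. Returns true once the link has been
  // down continuously for at least holdoffMs.
  bool update(bool connected, uint32_t nowMs, uint32_t holdoffMs) {
    if (connected) {
      down = false;  // any successful check restarts the hold-off
      return false;
    }
    if (!down) {
      down = true;
      downSinceMs = nowMs;
    }
    // Unsigned subtraction stays correct across counter rollover.
    return static_cast<uint32_t>(nowMs - downSinceMs) >= holdoffMs;
  }
};
```

When `update()` returns true, the sketch would drive the digital output that is wired to RESET. Passing `millis()` as `nowMs` is the intended use.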

Nels52 commented 5 years ago

@alanoatwork @eiriksels It appears that there are several mysterious hang issues out there and I am wondering if there is a problem in the UART interface. I am not a UART expert but here is the scenario I am wondering about:

  1. The MODEM send method eventually invokes the Uart::write(data) method to send an AT command. The Uart::write(data) method is called within a while loop to send the entire command over the UART interface.
  2. If Uart::write(data) is called with nothing in the txBuffer and the DRE empty flag is set, the data is immediately written to the data register, which clears the DRE empty flag (as I understand the UART interface).
  3. Let's say that Uart::write(data) is called with the next byte to be written and the DRE empty flag is still cleared. This results in the execution of the else section which checks for txBuffer full and then puts the data into the txBuffer.
  4. Let's say that while performing the processing in step 3 that the DRE empty flag is set because the character written in step 1 has been accepted by the ublox modem.
  5. After the processing in step 3 has been completed the DRE Interrupt is enabled. However, it appears to me that this is too late because the DRE empty flag has already been set in step 4.
  6. If this scenario is valid the UART interface will be hung and the remaining send characters will be put in the txBuffer and the Uart::IrqHandler() will never be called because the DRE Interrupt was enabled too late in step 5. This scenario would explain Issue #70 where the processor is hung waiting for the txBuffer to drain. It may also explain how reducing the UART interface speeds may provide some relief.

Could @sandeepmistry or @Rocketct please comment on the validity of this scenario?
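To make the question concrete, here is a toy model of steps 1 to 5. It is not the real Uart class and every name in it is made up. The `levelSensitive` switch is the assumption under test: if the DRE interrupt fires whenever the flag is set and the interrupt is enabled, enabling it late is harmless; if it only fires on the transition, the buffered byte is stranded as in step 6. Which of the two matches the real hardware is exactly what needs confirming.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model of the transmit path described above (hypothetical names).
struct ToyUart {
  bool levelSensitive;         // assumption under test, see text above
  bool dreFlag = true;         // "data register empty" flag
  bool dreIrqEnabled = false;  // DRE interrupt enable bit
  std::deque<uint8_t> txBuffer;
  int bytesOnWire = 0;

  explicit ToyUart(bool level) : levelSensitive(level) {}

  // Hardware finishes shifting out the byte in the data register.
  void hardwareByteDone() {
    ++bytesOnWire;
    dreFlag = true;
    if (dreIrqEnabled) irqHandler();  // seen only if already enabled
  }

  void enableDreIrq() {
    dreIrqEnabled = true;
    // Level model only: an already-set flag raises the interrupt at once.
    if (levelSensitive && dreFlag) irqHandler();
  }

  // Interrupt handler: move one buffered byte into the data register.
  void irqHandler() {
    if (txBuffer.empty()) { dreIrqEnabled = false; return; }
    txBuffer.pop_front();
    dreFlag = false;
    hardwareByteDone();  // toy model: transmission completes instantly
  }

  void write(uint8_t b, bool byteCompletesMidCall) {
    if (dreFlag && txBuffer.empty()) {
      dreFlag = false;  // step 2: byte goes straight to the data register
      return;
    }
    if (byteCompletesMidCall) hardwareByteDone();  // step 4
    txBuffer.push_back(b);                         // step 3
    enableDreIrq();                                // step 5
  }
};

// Runs steps 1-5 and returns how many bytes are left stranded in txBuffer.
std::size_t strandedBytes(bool levelSensitive) {
  ToyUart uart(levelSensitive);
  uart.write('A', false);  // first byte: written directly
  uart.write('T', true);   // second byte: previous byte completes mid-call
  return uart.txBuffer.size();
}
```

In this model `strandedBytes(false)` leaves one byte stuck in the buffer, while `strandedBytes(true)` drains it, so the scenario only holds under the edge-style assumption.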

eiriksels commented 5 years ago

@Nels52 Yes, I am just about giving up on the whole board. I still hope that there is a "coding" way of making this stable, but I have not found it, and have probably now tried 100 different variants of sketches.

B-Clever commented 5 years ago

@Nels52 I have been following the discussions here on GitHub for a long time. We bought four MKR GSM 1400 boards and stopped developing with these devices a few months ago. It was not possible to get a reliable permanent connection to the server. Situations with many cell tower handovers and varying signal strengths (while driving in vehicles) are especially problematic. We assume there is an underlying hardware design fault, not in the U201 modem itself but in the connection of the modem to the board. Hangs occur most of the time if there is a handover from 3G to 2G, or during a 2G cell handover. Maybe the current draw peaks when switching from 3G to 2G are too high for the board (33 dBm at 850/900 MHz with up to 1.9 A according to the U201 datasheet). Anyway, we didn't get the board stable and stopped wasting any further time. But we will still watch the ongoing discussion here in case there is a "real breakthrough" in reliability.

alanoatwork commented 5 years ago

@Nels52 @eiriksels @B-Clever I'll take my board out for a longer test ride, across the SF bay area later today and report back. I've been struggling with reliability also for months and only just found what seems like a reliable solution after incorporating @nels52's async code. Although I went mobile for a short time, I was always connected to the same cell provider, so I'll expand my trial and report back.

Nels52 commented 5 years ago

@eiriksels I have looked at your debug traces and at the library code, and I don't believe that the potential UART interface issue I described above applies to the hang captured by your trace. If you were to encounter the issue I described above, your sketch/library would be stuck in the Uart::flush() method waiting forever for the txBuffer to drain, just like Issue #70. If that happened, the sketch would hang forever and startWebClient() would not be invoked, which contradicts your error trace log, which shows the GSM timeout followed by a call to startWebClient(). I still think the UART issue I described above may be real, but it just doesn't appear to apply to your hang.

If you are still interested you could add debug messages to verify if the sketch/library is sending the next AT+CREG? command. You could put the following debug message in the GSM state machine:

    case READY_STATE_CHECK_REGISTRATION: {
      MODEM.setResponseDataStorage(&_response);
      MODEM.send("AT+CREG?");
      Serial.println("Sending AT+CREG?");
      _readyState = READY_STATE_WAIT_CHECK_REGISTRATION_RESPONSE;
      ready = 0;
      break;
    }

NOTE: I am assuming that you are still hanging on the AT+CREG? command from the GSM state machine as shown in your previous error trace log.

If you get the hang and see that the AT+CREG? command is sent but it is not echoed back with the response from the SARA U201 then the problem is either the command is not being sent cleanly (as postulated by @B-Clever) OR the SARA U201 is hanging on the command for some reason. Although I don't know what to do in either case at least you would have a little more info about what is going on.

alanoatwork commented 5 years ago

Well, I went out last night and was able to lock up my system. Unfortunately, I didn't have logging enabled so I'm planning to try again later this week with logging enabled as well as dropping the UART speed to 115200. Will report back soon.

alanoatwork commented 5 years ago

Here's an update... I dropped the baud rate down to 115200, enabled debug and logged everything. I recreated the path that I took previously and saw that I was connected to T-Mobile for most of the duration and occasionally lost reception. My signal strength varied from about 3 to 27 on a scale of 0 to 31. All told, the trip was about 30 miles and I didn't have any unrecoverable errors. At home I have very poor signal strength, generally around 3-5, so I expect that my current peaks are at or close to the maximum that can be expected with the board. I've got a short USB cable and a 1.4 Ah LiPo cell connected.
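For anyone comparing readings: assuming the 0 to 31 scale is the RSSI index reported by AT+CSQ (that is an assumption, the post does not say how the value was read), the usual mapping to dBm is -113 + 2 × index, with 0 meaning -113 dBm or less, 31 meaning -51 dBm or more, and 99 meaning unknown. A small helper, with a made-up name:

```cpp
// Converts an AT+CSQ-style RSSI index (0..31) to approximate dBm.
// Returns 0 for 99 ("not known or not detectable") or any other
// out-of-range value, as a "no valid reading" sentinel.
int csqToDbm(int rssiIndex) {
  if (rssiIndex < 0 || rssiIndex > 31) {
    return 0;
  }
  return -113 + 2 * rssiIndex;
}
```

Under that assumption, an index of 3 is about -107 dBm and 27 is about -59 dBm.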

I'm curious if @eiriksels exhibited his issues after he dropped the baudrate? I'm also curious if @B-Clever used the synchronous or asynchronous implementation of the library?

eiriksels commented 5 years ago

Hi @alanoatwork. I was away for work last week, so I haven't tested fully yet. But I do have some questions about getting this right:

@Nels52 It is about getting this correct in the ModemClass:

Change:

ModemClass MODEM(SerialGSM, 921600, GSM_RESETN, GSM_DTR);

to:

ModemClass MODEM(SerialGSM, 115200, GSM_RESETN, GSM_DTR);

You can verify if you have done it correctly by looking in your trace. You should see AT+IPR=115200 instead of AT+IPR=921600.

When I changed this, I could see no AT+IPR commands in my log, so I am wondering if something went wrong in my adaptation or compilation. Could you advise me, or attach the modified file to this thread so I can test it? I drive around a lot, so it will be a good test.

If I do not get any good result now, I am thinking of having a separate MKR unit coupled to a mosfet to actually disconnect the battery power to my MKRGSM whenever it senses a hung condition.

Just FYI, I will also be testing the MKR NB 1500 board over the next few days. I have received a custom SIM for that.

Nels52 commented 5 years ago

@eiriksels I take back what I said about seeing AT+IPR=115200. If you set the UART baud rate to 115200 in the ModemClass instance you will NOT see an AT+IPR= command in the debug log when gsmAccess.begin() is invoked by startWebClient().

Here is what happens. gsmAccess.begin() calls ModemClass::begin(). At the beginning of this method is the following line:

 _uart->begin(_baud > 115200 ? 115200 : _baud);

If the ModemClass instance specifies a baud rate of 115200 OR less this line of code will set the UART interface baud rate to the rate specified in the ModemClass instance.

Further down in the ModemClass::begin() method is the following code:

  if (_baud > 115200) {
    sendf("AT+IPR=%ld", _baud);
    if (waitForResponse() != 1) {
      return 0;
    }

Because the baud rate has been set to 115200 the code in this if statement will NOT be executed and the AT+IPR= command will not appear in the debug log. AT+IPR= will only appear in the debug log if a baud rate greater than 115200 is specified. So to make a long story short it looks like you have properly set the UART baud rate to 115200.
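The two decisions can be written as small pure functions to check the reasoning. The function names are made up; only the comparisons come from the ModemClass::begin() lines quoted above.

```cpp
// Rate the UART is first opened at: the configured rate, capped at 115200.
long initialUartBaud(long configuredBaud) {
  return configuredBaud > 115200 ? 115200 : configuredBaud;
}

// True when begin() would go on to send AT+IPR=<baud> to switch the modem
// to a faster rate. At 115200 or below the command is never sent.
bool sendsIprCommand(long configuredBaud) {
  return configuredBaud > 115200;
}
```

So with the default 921600 the UART opens at 115200 and AT+IPR=921600 shows up in the log, while with 115200 the UART opens at 115200 and no AT+IPR line appears at all.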

Unfortunately, it sounds like you are still experiencing 'hangs'. The debug log you posted showing the error indicates that your sketch and the MKRGSM library are not hung, because the GSM initialization in startWebClient() does time out and is restarted. The problem is that after about 20 AT+CREG? commands are issued by the GSM state machine, either something happens to the UART interface that garbles the AT+CREG? command or its response, or the SARA modem fails to respond to the last AT+CREG? command. Other people in this and other threads have tried workarounds by lowering the UART baud rate or making sure they had power sources that could deliver a peak of 2 amps. Since you have already lowered the UART baud rate, you may want to verify that your power source can deliver the peak 2 amps if you haven't already done so.

eiriksels commented 5 years ago

@Nels52 Thanks for your explanation. I then assume I managed to drop the baud rate. I have seen 1 hang, but I need to test more to draw a conclusion. I was able to drive all day without hangs today, but will continue tomorrow.

My power supply is a 2000 mAh LiPo that is connected to the designated battery connector. Sometimes when I drive I also have the board connected to 5 V micro-USB to charge the battery while driving. But I have tested both with and without this connected.

Nels52 commented 5 years ago

@eiriksels You have definitely dropped the baud rate to 115200, and the 2000 mAh LiPo definitely meets the specifications published for the MKR GSM 1400.

If you get another hang like the one documented in your previous error log, where the AT+CREG? commands stopped but the loop timer kept expiring after about 1 minute, it may be worthwhile to drive to a location with a known good signal to see if you can connect to the network and recover without power cycling the MKR GSM 1400.

eiriksels commented 5 years ago

@Nels52 Thanks. I have now implemented the Adafruit SleepyDog watchdog library to see if I can go completely hang free. I think that approach should be in a different thread than this one, as it is not really fixing the issues with the library/hardware. I will create a thread after doing some testing with that.

Just a diagnostics tip: to detect hangs when the serial monitor is not available, use the onboard LED (LED_BUILTIN) and have it blink periodically during the sketch. You will then see if the whole board is actually frozen, or if it is just a communication issue.
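A non-blocking way to do that blink, so the heartbeat itself never stalls the sketch, could be as simple as the helper below. `heartbeatLedOn` is a made-up name, and in a sketch `nowMs` would be `millis()`.

```cpp
#include <cstdint>

// Returns true when the LED should be lit: on for the first half of each
// period, off for the second half. A period of 0 keeps the LED off.
bool heartbeatLedOn(uint32_t nowMs, uint32_t periodMs) {
  if (periodMs == 0) {
    return false;  // avoid dividing by zero
  }
  return (nowMs % periodMs) < (periodMs / 2);
}
```

Calling `digitalWrite(LED_BUILTIN, heartbeatLedOn(millis(), 1000) ? HIGH : LOW);` once per pass through loop() gives a 1 Hz blink that stops as soon as loop() stops running.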

eiriksels commented 5 years ago

Hi. I can confirm that I still experience hangs with the reduced baud rate. I can also report that the watchdog I implemented was not enough to bring the board back to life. It is still the same hang, where AT+CREG? stops after 4-5 seconds.

To get the board back, even a single press of the reset button was not enough. I had to double-click it (does it then go to bootloader mode?), and when I hit reset once more it started working as normal. Again, this happened while I was out driving.

Nels52 commented 5 years ago

@eiriksels It sounds like you are hitting the types of issues that @B-Clever referenced earlier in this thread. I have not hit these issues probably due to the fact that my application is not mobile. If you are interested here is what I would do:

  1. Go back to the pre-Watchdog version of your sketch, which doesn't seem like a hardship since the watchdog didn't have the desired result anyway.
  2. Add the Serial.println("Sending AT+CREG?"); in the GSM.cpp state machine that I referenced earlier in this thread.
  3. Reproduce the problem, which seems like an easy thing to do, and see whether the sketch is sending the next AT+CREG? command and hanging while waiting for the response from the SARA U201, or whether the next AT+CREG? command is not being sent at all. If you see the "Sending AT+CREG?" debug message in the log and the next entry is the "GSM Ready status timeout" message, then we know that the last AT+CREG? command was sent to the SARA U201 but no response was received.
  4. Finally, rather than reset the MKRGSM 1400 when you get a hang, drive to a location with known good reception and see if your sketch recovers without doing a power reset. If you have already done this just ignore this step.
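The check in step 3 can be stated as a small log classifier. This is only a sketch: `HangKind` and `classifyHang` are made-up names, and the two message strings are the ones quoted in this thread, so adjust them if your log differs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class HangKind {
  NoTimeout,       // no "GSM Ready status timeout" line in the log
  SentNoResponse,  // timeout directly follows "Sending AT+CREG?"
  Other            // timeout preceded by some other entry
};

// Looks at the first timeout line and the entry immediately before it.
HangKind classifyHang(const std::vector<std::string>& logLines) {
  for (std::size_t i = 0; i < logLines.size(); ++i) {
    if (logLines[i].find("GSM Ready status timeout") == std::string::npos) {
      continue;  // not the timeout line, keep scanning
    }
    if (i > 0 &&
        logLines[i - 1].find("Sending AT+CREG?") != std::string::npos) {
      return HangKind::SentNoResponse;
    }
    return HangKind::Other;
  }
  return HangKind::NoTimeout;
}
```

`SentNoResponse` corresponds to the case where the command went out but nothing came back; `Other` means the log needs a closer look by hand.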

eiriksels commented 5 years ago

@Nels52 I will try this if my current attempt does not work out.

I reported the issue to the Arduino team as a technical issue. The feedback I got was to short out the PTC unit by soldering a wire across it, or to remove the part entirely and solder a wire in its place. I have had it running for 2 days while driving around with the wire in place, but it is too soon to draw any conclusions.

I attach a picture of the points that are now connected with a wire (the round dots).

I received information that this unit acts like a fuse and allows too little current to pass through. This is why it should be shorted out.

[Image attachment: unnamed]