WaiveCar / WaiveTelem

Repository for files related to the waivetelem project

Loses Cellular Connection with No Restoration #73

Open jlandau10 opened 4 years ago

jlandau10 commented 4 years ago

Saw 548EE was not connected to AWS. Hooked up the USB serial output to monitor it. It was trying to reconnect but not showing any AT traffic. Actions taken are listed below.

`AWS Shadow:  **Last update: Dec 13, 2019 4:05:15 PM -0800** 

{2019-12-17T22:56:16Z WARN src\\Cellular.cpp:120 connect , _ : Failed to connect, try later }
{2019-12-17T22:57:16Z INFO src\\Cellular.cpp:98 connect , apn : internet.swir }
{2019-12-17T22:57:16Z WARN src\\Cellular.cpp:120 connect , _ : Failed to connect, try later }
(repeating at once/minute)

**no AT logged, scary.  Trying a reboot.**

**AT Restored:** 
*
*
*
*

AT

OK
{1970-01-01T00:00:00Z WARN src\\Cellular.cpp:30 checkModemFirmwareUpdate , AT+UFOTACONF=2 :  }
AT+UFOTACONF=2

+UFOTACONF: 2, -1

OK
AT+UFOTACONF=2,-1

OK
*
*
*

successful connection`
jlandau10 commented 4 years ago

hmmm.... looks like we only got a couple shadow updates and it's again not connected.

Current time: 3:48. Per AWS: Last update: Dec 17, 2019 3:15:05 PM -0800. Going to check it out again.

jlandau10 commented 4 years ago

Same thing: it tries to connect, fails, and retries. Adding the modem response to the "try later" warning:

logWarn("Failed to connect, try later. Response: ",modemResponse);

now need to wait for it to disconnect again to see if it works.

if we can detect 'no response' we should try resetting the modem.

interesting. without the power cycle we never restored AT. First output after flash:

{1970-01-01T00:00:00Z WARN src\\Cellular.cpp:120 connect , Failed to connect, try later. Response:  : �I }
{2019-12-18T00:00:50Z INFO src\\Cellular.cpp:98 connect , apn : internet.swir }
{2019-12-18T00:00:50Z WARN src\\Cellular.cpp:120 connect , Failed to connect, try later. Response:  : �I }
{2019-12-18T00:00:51Z INFO src\\System.cpp:248 report , message :{ state :{ reported :{ heartbeat :{ datetime : 2019-12-18T00:00:51Z , lat :34.0852503, long :-118.3377335, hdop :680, speed :0, heading :0, uptime :29, temp :-1, freeMem :11079, lastVin :1248}}}}}
{2019-12-18T00:01:50Z INFO src\\Cellular.cpp:98 connect , apn : internet.swir }
{2019-12-18T00:01:50Z WARN src\\Cellular.cpp:120 connect , Failed to connect, try later. Response:  : HR }
{2019-12-18T00:01:45Z INFO src\\Cellular.cpp:98 connect , apn : internet.swir }
{2019-12-18T00:01:45Z WARN src\\Cellular.cpp:120 connect , Failed to connect, try later. Response:  : HR }
{2019-12-18T00:02:39Z INFO src\\System.cpp:248 report , message :{ state :{ reported :{ canbus :{ door_front_left :1}}}}}

Need to check our power pins. We set PWR_ON high at the beginning and then leave it there. I recall that pin really just being a push-button-style toggle. If so, we need a function to toggle the modem on/off: on no response, try a power toggle; on the second no response, try another power toggle. That should give us a redundant way to recover. Other option: the modem.reset() function is an AT command, so it presumably won't help while the modem isn't answering AT. What about the modem reset pin? Would that work?
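The retry idea above (power toggle on the first and second no-response, then something stronger) can be sketched as a small counter. This is only an illustration: `ModemRecovery`, `RecoveryAction`, and the choice to escalate to the reset pin on the third miss are assumptions, not code from the firmware.

```cpp
#include <cstdint>

// Hypothetical recovery steps, ordered from least to most disruptive.
enum class RecoveryAction { None, PowerToggle, HardwareReset };

// Counts consecutive connect attempts that got no usable modem response
// and picks the next recovery step.
class ModemRecovery {
public:
  // Call once per connect attempt; gotResponse is true when the modem answered.
  RecoveryAction onConnectAttempt(bool gotResponse) {
    if (gotResponse) {
      misses_ = 0;  // any valid answer clears the escalation
      return RecoveryAction::None;
    }
    if (misses_ < UINT8_MAX) {
      ++misses_;  // saturate instead of wrapping back to zero
    }
    // Assumption: two power toggles first, then fall back to the reset pin.
    return (misses_ <= 2) ? RecoveryAction::PowerToggle
                          : RecoveryAction::HardwareReset;
  }

private:
  uint8_t misses_ = 0;
};
```

In the firmware the returned action would map onto whatever toggles PWR_ON or pulses RESET_N; the class itself touches no pins, so it can be checked off-target.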

Consistently getting the HR response, which is what my no-response case looks like:

{2019-12-18T00:18:47Z WARN src\\Cellular.cpp:120 connect , Failed to connect, try later. Response:  : HR }
{2019-12-18T00:18:52Z INFO src\\System.cpp:248 report , message :{ state :{ reported :{ heartbeat :{ datetime : 2019-12-18T00:18:52Z , lat :34.0852643, long :-118.337726, hdop :670, speed :0, heading :0, uptime :1110, temp :-1, freeMem :10843, lastVin :1502}}}}}
{2019-12-18T00:19:31Z INFO src\\System.cpp:248 report , message :{ state :{ reported :{ heartbeat :{ datetime : 2019-12-18T00:19:31Z , lat :34.0852593, long :-118.3377343, hdop :670, speed :0, heading :0, uptime :1149, temp :-1, freeMem :10843, lastVin :1447}}}}}
{2019-12-18T00:19:47Z INFO src\\Cellular.cpp:98 connect , apn : internet.swir }
{2019-12-18T00:19:47Z WARN src\\Cellular.cpp:120 connect , Failed to connect, try later. Response:  : HR }
{2019-12-18T00:19:51Z INFO src\\System.cpp:248 report , message :{ state :{ reported :{ heartbeat :{ datetime : 2019-12-18T00:19:51Z , lat :34.0852533, long :-118.3377357, hdop :660, speed :0, heading :0, uptime :1169, temp :-1, freeMem :10843, lastVin :1493}}}}}
{2019-12-18T00:20:31Z INFO src\\System.cpp:248 report , message :{ state :{ reported :{ heartbeat :{ datetime : 2019-12-18T00:20:31Z , lat :34.0852643, long :-118.3377362, hdop :660, speed :0, heading :0, uptime :1209, temp :-1, freeMem :10843, lastVin :1500}}}}}
{2019-12-18T00:20:47Z INFO src\\Cellular.cpp:98 connect , apn : internet.swir }

going inside with internet access to research

jlandau10 commented 4 years ago

PWR_ON pin per datasheet: (image)
This says that it should be kept high, which we are doing. I remember reading somewhere that it was a pulse toggle instead. I could have been mistaken, or we could have conflicting documentation. Maybe it changed with the firmware update?

RESET_N pin: (image)
But how long does it need to be held to reset?

Power cycle and reset: (image)
Does the reset pin not reset the module? That seems weird.

jlandau10 commented 4 years ago

created two functions:

void modemHardwareReset() {
  // reset the ublox module
  digitalWrite(SARA_RESETN, HIGH);
  delay(100);
  digitalWrite(SARA_RESETN, LOW);
}

void modemPowerToggle() {
  digitalWrite(SARA_PWR_ON, LOW);
  // power-on toggle time is 0.15 s, power-off toggle time is 1.5 s.
  // trying 0.18 s first on the theory that the modem is off or asleep.
  delay(180);
  digitalWrite(SARA_PWR_ON, HIGH);
}

added conditional trigger:

    if (modemResponse.indexOf("+CEREG: 0,2") != -1) {
      needExtraConnectTime = true;
    } else if (modemResponse.indexOf("HR") > -1) {
      logWarn("got that HR response, want me to do something?");
      // modemHardwareReset();
    }

checked serial output before flashing and response is now "PS", fun: {2019-12-18T00:50:02Z WARN src\\Cellular.cpp:120 connect , Failed to connect, try later. Response: : PS }

trying the flash anyways.

and a different illegible response:

{2019-12-18T00:53:49Z WARN src\\Cellular.cpp:136 connect , Failed to connect, try later. Response: : �I }

eventually got an HR, but it's probably not literally "HR" for the purposes of string matching.
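One way to see what those "HR" / "PS" / illegible responses really contain is to log the raw bytes as hex instead of as text. A minimal sketch, assuming the response is available as a byte string; `toHex` is a hypothetical helper, and `std::string` stands in for the Arduino `String` the firmware uses.

```cpp
#include <cstdio>
#include <string>

// Render each byte of a modem response as two uppercase hex digits,
// separated by spaces, so unprintable bytes are visible in the log.
std::string toHex(const std::string& response) {
  std::string out;
  char buf[4];
  for (std::size_t i = 0; i < response.size(); ++i) {
    const unsigned value = static_cast<unsigned char>(response[i]);
    std::snprintf(buf, sizeof(buf), "%02X", value);
    if (i != 0) {
      out += ' ';
    }
    out += buf;
  }
  return out;
}
```

A genuine ASCII "HR" would come out as `48 52`; anything else means the log is rendering garbage bytes, and matching on the text "HR" will not be reliable.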

what if I trigger on not finding "AT+" instead:

    } else if (modemResponse.indexOf("AT+") == -1) {
      logWarn("didn't get an AT response, want me to do something?");
      // modemHardwareReset();
    }

{1970-01-01T00:00:00Z INFO src\\Cellular.cpp:111 connect , apn : internet.swir }
{1970-01-01T00:00:00Z WARN src\\Cellular.cpp:132 connect , _ : didn't get an AT response, want me to do something? }
{1970-01-01T00:00:00Z WARN src\\Cellular.cpp:136 connect , Failed to connect, try later. Response:  : �I }

okay detection works, how about the reset? uncommenting.

{1970-01-01T00:00:00Z WARN src\\Cellular.cpp:132 connect , _ : didn't get an AT response, trying the hardware reset }
{1970-01-01T00:00:00Z WARN src\\Cellular.cpp:136 connect , Failed to connect, try later. Response:  : �I }
{2019-12-18T01:02:44Z INFO src\\System.cpp:248 report , message :{ state :{ reported :{ canbus :{ ignition :3}}}}}
{2019-12-18T01:02:44Z INFO src\\Cellular.cpp:111 connect , apn : internet.swir }
{2019-12-18T01:02:44Z WARN src\\Cellular.cpp:132 connect , _ : didn't get an AT response, trying the hardware reset }
{2019-12-18T01:02:44Z WARN src\\Cellular.cpp:136 connect , Failed to connect, try later. Response:  : �I }

no luck, trying the power toggle.

{1970-01-01T00:00:00Z INFO src\\Cellular.cpp:111 connect , apn : internet.swir }
{1970-01-01T00:00:00Z WARN src\\Cellular.cpp:132 connect , _ : didn't get an AT response, trying the pwr toggle }
{1970-01-01T00:00:00Z WARN src\\Cellular.cpp:136 connect , Failed to connect, try later. Response:  : �I }
{1970-01-01T00:00:28Z INFO src\\System.cpp:248 report , message :{ state :{ reported :{ canbus :{ ignition :3}}}}}
{2019-12-18T01:04:56Z INFO src\\System.cpp:248 report , message :{ state :{ reported :{ heartbeat :{ datetime : 2019-12-18T01:04:56Z , lat :34.0852337, long :-118.3377333, hdop :650, speed :0, heading :0, uptime :29, temp :-1, freeMem :11079, lastVin :1495}}}}}
{2019-12-18T01:04:56Z INFO src\\Cellular.cpp:111 connect , apn : internet.swir }
{2019-12-18T01:04:56Z WARN src\\Cellular.cpp:132 connect , _ : didn't get an AT response, trying the pwr toggle }
{2019-12-18T01:04:56Z WARN src\\Cellular.cpp:136 connect , Failed to connect, try later. Response:  : HR }

still no luck. going to try bumping the PWR_ON toggle time up to 1.55 s: delay(1550)

nope

2 seconds?

still nope

let's try a different type of power cycle: hold reset low for 10 seconds, then power on.

started with just the reset low for 10.5 seconds and it worked!!!!

void modemHardwareReset() {
  // reset the ublox module
  digitalWrite(SARA_RESETN, HIGH);
  delay(10500);
  digitalWrite(SARA_RESETN, LOW);
}

    } else if (modemResponse.indexOf("AT+") == -1) {
      logWarn("didn't get an AT response, trying modem 10sec reset");
      modemHardwareReset();
    }

i highly highly doubt this will fix the consistently no-AT board (234EE) but going to try. - Narrator: "it did not"
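Side note on the 10.5 s hold: `delay(10500)` blocks everything else for that long. Whether that matters here is not established in this thread, but if it does, the same pulse can be timed from the main loop instead. A minimal sketch, assuming a millisecond clock such as `millis()` is passed in; `ResetPulse` is a hypothetical name, not firmware code.

```cpp
#include <cstdint>

// Times a reset pulse without blocking: start() marks the beginning,
// update() reports whether the reset line should still be asserted.
class ResetPulse {
public:
  explicit ResetPulse(uint32_t holdMs) : holdMs_(holdMs) {}

  // Begin a pulse at the given time (milliseconds).
  void start(uint32_t nowMs) {
    startedMs_ = nowMs;
    active_ = true;
  }

  // Call every loop iteration. Returns true while the pin should stay
  // asserted, false once holdMs has elapsed or no pulse is running.
  bool update(uint32_t nowMs) {
    // Unsigned subtraction stays correct across a clock rollover.
    if (active_ && static_cast<uint32_t>(nowMs - startedMs_) >= holdMs_) {
      active_ = false;
    }
    return active_;
  }

private:
  uint32_t holdMs_;
  uint32_t startedMs_ = 0;
  bool active_ = false;
};
```

On the board this would be driven with something like `digitalWrite(SARA_RESETN, pulse.update(millis()) ? HIGH : LOW);` in `loop()`, keeping the same HIGH-then-LOW sequence as `modemHardwareReset()`.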

jlandau10 commented 4 years ago

don't always get AT+; sometimes just +.

jlandau10 commented 4 years ago

trying this:

    } else if (modemResponse.indexOf("OK") == -1 && modemResponse.indexOf("+") == -1) {
      logWarn("didn't get an AT response, trying modem 10sec reset");
      modemHardwareReset();
    }

testing on known good now, monitoring for triggering the reset unintentionally.
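The check being tested can be pulled out into a plain function so it can be exercised against captured responses before trusting it on a device. A sketch under assumptions: `looksLikeModemReply` and `resetAllowed` are hypothetical names, `std::string` stands in for the Arduino `String`, and the minimum-interval guard is an extra idea for limiting unintended resets, not something the firmware has.

```cpp
#include <cstdint>
#include <string>

// Same heuristic as the condition above: treat the text as a real modem
// reply if it contains "OK" or a '+'.
bool looksLikeModemReply(const std::string& response) {
  return response.find("OK") != std::string::npos ||
         response.find('+') != std::string::npos;
}

// Optional guard: allow a reset only if at least minIntervalMs has passed
// since the previous one. Unsigned subtraction tolerates clock rollover.
bool resetAllowed(uint32_t nowMs, uint32_t lastResetMs, uint32_t minIntervalMs) {
  return static_cast<uint32_t>(nowMs - lastResetMs) >= minIntervalMs;
}
```

One known weakness of the heuristic: a garbage response that happens to contain a `+` byte would be accepted as valid and would not trigger a reset.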

jlandau10 commented 4 years ago

697EE had disconnected from AWS on 12/18 (I didn't notice); it did not have this code. Reflashed: it instantly triggered the reset and then connected successfully.

WDT tripped on connection to AWS:

{2019-12-20T18:41:09Z INFO src\\Mqtt.cpp:101 connect , broker : a2ink9r2yi1ntl-ats.iot.us-east-2.amazonaws.com }
shutdown() ret: 1
ce32
cf28
105d0
cf0a
1074e

which is issue #74

jlandau10 commented 4 years ago

back to this issue - monitoring serial output on 697EE for unintentional modem trigger.

made it through a full 15 min cycle and got 2 shadow updates. removing serial monitoring for the next 15 minutes to try to get the other device working better.