gasagna / A76XX

Arduino library for the A76XX family of SIMCOM cellular modules, with native MQTT(S), HTTP(S), etc, clients!
MIT License

Modem Performance degrades over time, trying to figure out why! #3

Open Moskus opened 7 months ago

Moskus commented 7 months ago

I've had two of these boards running since the beginning of November. They are supposed to upload data every other second (30 times per minute). In the beginning this worked fine: no problems at all, with upload times of 900-1100 ms. We can live with that.

But now we are experiencing lots of dropouts and high upload times. Switching the SIM-card from an "old" board into a new board (running the same code) gives stable uploads and low upload times as expected.

What could cause the boards to deteriorate like this? Is there something we can do to avoid it?

I was wondering if this could be a result of degrading flash memory. Looking at your examples, like SaveCertificates.ino, it seems that the certificate needs to be written as a file somewhere (on the ESP32? or the modem?). How often does this happen?

In the ToDo-list it seems that the overwriting should not occur anymore, but I have a hard time figuring out if this is the case. The certOverwrite function overwrites no matter what (but perhaps that is intended):

    // delete certificate if exists, then download
    int8_t certOverwrite(const char* cert, const char* certname) {
        int8_t retcode;

        if (certExists(certname)) {
            retcode = certDelete(certname);
            A76XX_RETCODE_ASSERT_RETURN(retcode);
        }

        retcode = certDownload(cert, certname);
        A76XX_RETCODE_ASSERT_RETURN(retcode);

        return retcode;
    }

Our code only connects to one server, and (hopefully) using the same certificate over and over again.
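If the worry is repeated flash writes, one option is a "write only if missing" variant of the function above. This is a minimal sketch, not the library's API: `CertStore`, `certWriteIfMissing` and `OPERATION_SUCCEEDED` are hypothetical names, and the two callbacks stand in for the `certExists` and `certDownload` calls shown earlier. It also assumes that 0 means success.

```cpp
#include <cstdint>
#include <functional>

// Assumed success code; the library's real constant may differ.
constexpr int8_t OPERATION_SUCCEEDED = 0;

// Hypothetical wrapper around the two modem operations, so the logic can be
// exercised without a modem attached.
struct CertStore {
    std::function<bool(const char*)> exists;                    // stands in for certExists
    std::function<int8_t(const char*, const char*)> download;   // stands in for certDownload
};

// Writes the certificate only when no file with that name is present,
// so an already provisioned modem sees no flash write at all.
int8_t certWriteIfMissing(CertStore& store, const char* cert, const char* certname) {
    if (store.exists(certname)) {
        return OPERATION_SUCCEEDED;  // nothing to do, flash untouched
    }
    return store.download(cert, certname);
}
```

Note that this only checks the file name, not its content, so a changed certificate under the same name would not be rewritten.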

This library is still very impressive, and it's massive. I might be overlooking something, but what?

gasagna commented 7 months ago

The certificate gets written to the flash memory of the SIMCOM module whenever you call* A76XXSecureClient::writeCaCert. This memory is non-volatile, so in principle you could run a sketch once that stores the certificate and then upload your actual sketch, which only uses A76XXSecureClient::setCaCert. In that case you would never touch the flash memory again. But honestly, I do not know what causes your problem. Have you tried turning on the debug output to see where it gets stuck?

*Note that the http and mqtt clients inherit from A76XXSecureClient.

Moskus commented 7 months ago

Thanks for getting back to me! I'm grasping at straws here, trying to figure this out.

Our code is just a tweaked version of the HTTPpost example. We're doing the http_client.post() in a task that loops every 2 seconds. Another task collects samples using ESP-NOW; the modem task reads them every 2 seconds, creates a JSON string, and then performs an HTTP POST.

Apparently, on a brand new modem this works brilliantly with the exact same code.

On an older one it gets stuck and times out. A lot.

Like here:

2023-12-04 09:18:17.797     --> Executing post request... 
2023-12-04 09:18:17.802     AT+HTTPPARA="URL","https://ourdomain.org:443/api/"
2023-12-04 09:18:17.898     __
2023-12-04 09:18:17.904     OK
2023-12-04 09:18:17.911     AT+HTTPPARA="USERDATA","User-Agent:A7608E-test/0.0.1"
2023-12-04 09:18:17.915     
2023-12-04 09:18:17.921     OK
2023-12-04 09:18:17.926     AT+HTTPDATA=1542,30
2023-12-04 09:18:17.928     
2023-12-04 09:18:17.930     ERROR
2023-12-04 09:18:17.931     G 11
2023-12-04 09:18:17.932     ERROR, code: -2 Generic

"G 11" is printed just before the last A76XX_GENERIC_ERROR in http.h -> inputData. (There are so many generic error references that we needed to number them.)
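Instead of numbering each return site by hand, the location could be logged automatically. This is a rough sketch under assumptions: `genericErrorAt` and `RETURN_GENERIC_ERROR` are hypothetical names that are not part of the library, and the value -2 is taken from the "code: -2 Generic" lines in the logs. On the ESP32 the `fprintf` would be replaced by whatever debug stream is in use.

```cpp
#include <cstdint>
#include <cstdio>

// Value taken from the "ERROR, code: -2 Generic" log lines.
constexpr int8_t GENERIC_ERROR = -2;

// Hypothetical helper: prints where the error was raised, then hands back
// the generic error code so it can be returned by the caller.
inline int8_t genericErrorAt(const char* file, int line) {
    std::fprintf(stderr, "G %s:%d\n", file, line);
    return GENERIC_ERROR;
}

// Drop-in for a bare "return A76XX_GENERIC_ERROR;" that tags the call site
// with the file name and line number of the place where it is used.
#define RETURN_GENERIC_ERROR() return genericErrorAt(__FILE__, __LINE__)
```

Each failing path then identifies itself in the log without any manual numbering.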

We can also get:

2023-12-04 09:18:19.787     --> Executing post request... 
2023-12-04 09:18:19.790     AT+HTTPPARA="URL","https://ourdomain.org:443/api/"
2023-12-04 09:18:19.791     
2023-12-04 09:18:19.792     DOWNLOAD
2023-12-04 09:18:47.747     ERROR
2023-12-04 09:18:47.748     ERROR, code: -2 Generic

... and:

2023-12-04 09:18:49.769     --> Executing post request... 
2023-12-04 09:18:49.770     AT+HTTPPARA="URL","https://ourdomain.org:443/api/"
2023-12-04 09:18:49.771     
2023-12-04 09:18:49.774     OK
2023-12-04 09:18:49.775     AT+HTTPPARA="USERDATA","User-Agent:A7608E-test/0.0.1"
2023-12-04 09:18:49.776     
2023-12-04 09:18:49.777     ERROR
2023-12-04 09:18:49.778     AT+HTTPDATA=1623,30
2023-12-04 09:18:49.784     
2023-12-04 09:18:49.922     DOWNLOAD{"mm_apikey":"our_key","data":{"station_id":123456789,"station_unixtime":1701677928,"values":[1200-1600 BYTES OF JSON]]}}
2023-12-04 09:18:49.924     
2023-12-04 09:18:49.924     OK
2023-12-04 09:18:49.926     AT+HTTPACTION=1
2023-12-04 09:18:49.928     
2023-12-04 09:18:49.929     ERROR
2023-12-04 09:18:49.930     G 3
2023-12-04 09:18:49.931     ERROR, code: -2 Generic

"G 3" is printed over the last A76XX_GENERIC_ERROR in http.h -> inputData.

Another thing to notice: the number after "Total Send Time" is generally higher on a malfunctioning device, by approximately 400-700 ms.

Here's a graph showing the problem. It's the number of values uploaded from that modem for each hour since December 1st. The value should ideally be 3600 for each hour (one every second); disregard the first and last values, as those hours are incomplete. [image]

I'm not sure why this is happening. I tried erasing the flash on the ESP32, but that didn't help (that helped on a regular ESP32 when I earlier had used ESP-NOW, and was reverting back to normal wifi for a test). I'm trying to figure out how to factory reset the modem, I think the command is AT&F.

See logfile: Logfile.txt

gasagna commented 6 months ago

Hi @Moskus, thanks for the additional details. I am a bit lost here, tbh. It seems to me that the problem might be the SIMCOM module, or how my library uses it. The "ERROR" message is returned by several different commands, so I think the modem might get stuck somewhere and then start failing repeatedly.

Some comments to help debugging:

Moskus commented 6 months ago

However: I've had some discussions with the carrier, and one problem (among many, perhaps) might be that the connection sometimes drops down to 2G, which means GPRS speeds. Is there a way to force the connection to be LTE/4G only?

I'm looking through the code and can't find it, but I might not be looking hard enough...
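For reference, SIMCom modules normally select the preferred network mode with the AT+CNMP command, where 38 usually means LTE only (2 is automatic, 13 GSM only, 14 WCDMA only). The values should be checked against the AT command manual for the exact module and firmware. The sketch below only builds the command and parses the reply to `AT+CNMP?`; `NetworkMode`, `buildCnmpCommand` and `parseCnmpResponse` are hypothetical helpers, not library functions, and sending the string to the modem is left out.

```cpp
#include <cctype>
#include <string>

// Mode values as commonly documented for SIMCom AT+CNMP (assumption: they
// apply to this module and firmware).
enum class NetworkMode : int { Automatic = 2, GsmOnly = 13, WcdmaOnly = 14, LteOnly = 38 };

// Builds the command text, e.g. "AT+CNMP=38" for LTE only.
std::string buildCnmpCommand(NetworkMode mode) {
    return "AT+CNMP=" + std::to_string(static_cast<int>(mode));
}

// Extracts the mode from a reply to "AT+CNMP?" such as "+CNMP: 38".
// Returns false if the reply does not contain a "+CNMP: <digits>" field.
bool parseCnmpResponse(const std::string& response, int& mode) {
    const std::string prefix = "+CNMP: ";
    const std::string::size_type pos = response.find(prefix);
    if (pos == std::string::npos) {
        return false;
    }
    std::string::size_type i = pos + prefix.size();
    int value = 0;
    bool sawDigit = false;
    while (i < response.size() && std::isdigit(static_cast<unsigned char>(response[i]))) {
        value = value * 10 + (response[i] - '0');
        sawDigit = true;
        ++i;
    }
    if (!sawDigit) {
        return false;
    }
    mode = value;
    return true;
}
```

Querying the mode after setting it would confirm whether the modem accepted the change.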