espressif / esp-protocols

Collection of ESP-IDF components related to networking protocols

How should modem disconnects/reconnects be handled in unstable connections? (IDFGH-11995) #498

Closed mbastida123 closed 6 months ago

mbastida123 commented 8 months ago

General issue report

We currently have ~1000 devices deployed across Spain with a SIM7080G modem.

Broadly speaking they work well, including OTA updates.

However, we cannot guarantee 100% success when updating via OTA, or even when simply connecting to the internet.

The devices connect once every 24 h. Sometimes that connection fails, so the effective interval becomes 48 or even 72 h.

Up until now we were doing the following on IP_EVENT_PPP_LOST_IP:

} else if (event_id == IP_EVENT_PPP_LOST_IP) {
        ESP_LOGW(TAG, "Modem Disconnect from PPP Server");
        ESP_ERROR_CHECK(esp_event_post(GENERAL_EVENTS, GENERAL_EVENTS_INTERNET_CONNECTION_RETRY, NULL, NULL, portMAX_DELAY));
        if (connection_retries < MAX_CONNECTION_RETRIES)
        {         
            ESP_LOGD(TAG, "Retrying internet connection");
            ESP_ERROR_CHECK(esp_event_post(GENERAL_EVENTS, GENERAL_EVENTS_KICK_WDT_SERVICE_TIMEOUT, NULL, NULL, portMAX_DELAY));
            uint8_t task_create_retries=0;
            while(xTaskCreate(modem_restart, "modem restart", 4096, NULL, 10, NULL) != pdPASS && task_create_retries < 4){
                ESP_LOGD(TAG, "Could not create task, retrying");
                task_create_retries++;
            }                   
            connection_retries++;
        } 
        else{
            ESP_LOGD(TAG, "Exceeded internet connection retries. Abandoning");
            connection_retries=0;
            ESP_ERROR_CHECK(esp_event_post(GENERAL_EVENTS, GENERAL_EVENTS_INTERNET_CONNECTION_UNSUCCEEDED, NULL, NULL, portMAX_DELAY));
        }

So every time the connection was lost we would call modem_restart(), which initializes the modem again via esp_modem_get_signal_quality() and esp_modem_set_mode(dce, ESP_MODEM_MODE_DATA).

This doesn't work. What ends up working is resetting the ESP via esp_restart() and power-cycling the modem module. With this procedure the modem has a higher chance of obtaining an IP, but it sometimes takes 10-12 min for the connection to be established. This is unacceptable, mainly from a power usage standpoint.

So the question is, what should be the correct procedure for forcing the SIMCOM modem to try to reconnect?

david-cermak commented 8 months ago

Hi,

the general suggestion would be to follow the reconnection procedure shown in this example:

https://github.com/espressif/esp-protocols/blob/5ba7cfab8eac01ccf8ae12d8b3a4de65f0be7171/examples/esp_netif/multiple_netifs/main/ppp_connect_esp_modem.c#L61-L93

As for unstable environments, it's hard to give a recommendation, other than to wait a bit with some back-off time and try again later?
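
A minimal sketch of the back-off idea, in plain C so it can be tried off-target. The helper name backoff_delay_ms() and the base/cap values are assumptions made for illustration; nothing like it is provided by esp_modem.

```c
#include <stdint.h>

/* Hypothetical helper (not part of esp_modem): delay before retry number
 * `attempt` (0-based). The delay doubles on every attempt, starting at
 * base_ms, and is clamped to max_ms. */
uint32_t backoff_delay_ms(unsigned attempt, uint32_t base_ms, uint32_t max_ms)
{
    uint32_t delay = base_ms;
    for (unsigned i = 0; i < attempt; ++i) {
        if (delay >= max_ms / 2) {
            /* Doubling again would reach or exceed the cap (and could overflow). */
            return max_ms;
        }
        delay *= 2;
    }
    return delay < max_ms ? delay : max_ms;
}
```

In firmware the result could be passed to vTaskDelay(pdMS_TO_TICKS(...)) or, converted to microseconds, to esp_sleep_enable_timer_wakeup() before entering deep sleep, so that the device is not awake while waiting.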

This doesn't work.

Could you please elaborate? Does the modem library fail to enter data mode, or is it stuck somewhere? If so, this could be a real issue, because the modem library must correctly restart all operations and work seamlessly without hard resets! We've had a few similar reports, and it always turned out that esp_modem could deinit and re-init properly. There could also be problems with the device itself: some modems just need the hang-up command (as shown in the example above) to close a half-open connection. There should be a way for any modem device to reconnect without a hard reset, too, but this is probably very much dependent on the actual device.

Another way to approach the issue is to listen to all PPP state change events and react whenever the connection breaks or starts to terminate (this should come sooner than IP_EVENT_PPP_LOST_IP). Or, if there's a need to restart the device, the ESP32 could be in deep sleep. Also, the PPP protocol might not be a great choice when performing retries and checking the state, so you can also try if the network is available using AT commands (if so, you'd just switch to PPP without issues, if not you'd simply backoff and go to sleep) similarly as it's done in this example:

https://github.com/espressif/esp-protocols/blob/5ba7cfab8eac01ccf8ae12d8b3a4de65f0be7171/components/esp_modem/examples/modem_tcp_client/main/sock_commands_sim7600.cpp#L174-L183

mbastida123 commented 8 months ago

Hi and thanks for the detailed response.

When I said that it didn't work, I meant that the modem wasn't accepting the reconnect commands. It turns out I just needed to switch back to command mode...

With your guidelines I have come to a better solution, but now I have other problems.

The connect code is as follows:

    err = esp_modem_sync(dce); 
    if (err != ESP_OK) { 
        ESP_LOGI(TAG, "Switching to command mode"); 
        esp_modem_set_mode(dce, ESP_MODEM_MODE_COMMAND); 
        ESP_LOGI(TAG, "Retry sync 3 times"); 
        for (int i = 0; i < 3; ++i) { 
            err = esp_modem_sync(dce); 
            if (err == ESP_OK) { 
                break; 
            } 
            vTaskDelay(pdMS_TO_TICKS(1000)); 
        } 
    }

    ESP_LOGI(TAG, "Manual hang-up before reconnecting");
    err = esp_modem_at(dce, "ATH", NULL, 2000);
    if (err != ESP_OK) {
        ESP_LOGW(TAG, "Manual hang-up before reconnecting fail"); 
    }

    while (true)
    {
        err = esp_modem_get_signal_quality(dce, &rssi, &ber);
        ESP_ERROR_CHECK(esp_event_post(GENERAL_EVENTS, GENERAL_EVENTS_KICK_WDT_SERVICE_TIMEOUT, NULL, NULL, portMAX_DELAY));
        if (err != ESP_OK) {
            //This probably happens because the modem hasn't had the time to boot from sleep.
            ESP_LOGW(TAG, "esp_modem_get_signal_quality failed with %d %s", err, esp_err_to_name(err));
            vTaskDelay(pdMS_TO_TICKS(500));
        }
        else
        {     
            ESP_LOGI(TAG, "Signal quality: rssi=%d, ber=%d", rssi, ber);
        }

        if (retries > 360){
            ESP_LOGE(TAG, "Cannot get valid rssi from modem, aborting");
            modem_stop(TURN_OFF_LTE_MODEM);
            ESP_ERROR_CHECK(esp_event_post(GENERAL_EVENTS, GENERAL_EVENTS_INTERNET_CONNECTION_UNSUCCEEDED, NULL, NULL, portMAX_DELAY));
            vTaskDelete(0);
        }
        retries++;

    }

    while (true)
    {
        err = esp_modem_set_mode(dce, ESP_MODEM_MODE_DATA);
        if (err != ESP_OK) {
            ESP_LOGW(TAG, "esp_modem_set_mode(ESP_MODEM_MODE_DATA) failed with %d", err);
            vTaskDelay(pdMS_TO_TICKS(500));
        }
        else{
            ESP_LOGD(TAG, "esp_modem_set_mode(ESP_MODEM_MODE_DATA) success");
            break;
        }
    }

    /* Wait for IP address */
    ESP_LOGI(TAG, "Waiting for IP address");

Basically I:

  1. command mode
  2. Sync
  3. Manual hang-up
  4. get valid signal rssi
  5. data mode
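
The five steps above can also be expressed as a sequence in which every step has an upper bound on attempts, instead of open-ended while (true) loops. The following is only a sketch in plain C: modem_step_t, run_connect_sequence() and the demo_* stubs are hypothetical names, and in firmware each callback would wrap the corresponding esp_modem call.

```c
#include <stdbool.h>
#include <stddef.h>

/* One step of the connect sequence; returns true on success.
 * Assumption: in firmware each step wraps a call such as esp_modem_sync()
 * or esp_modem_set_mode(); plain callbacks are used here so the sketch
 * compiles without ESP-IDF. */
typedef bool (*modem_step_fn)(void *ctx);

typedef struct {
    const char   *name;          /* for logging only */
    modem_step_fn run;
    unsigned      max_attempts;  /* upper bound on attempts for this step */
    bool          required;      /* false: a failure is tolerated (e.g. ATH) */
} modem_step_t;

/* Runs the steps in order. wait() is called after each failed attempt
 * (vTaskDelay() in firmware) so the calling task yields between retries.
 * Returns n_steps on success, otherwise the index of the first required
 * step that exhausted its attempts. */
size_t run_connect_sequence(const modem_step_t *steps, size_t n_steps,
                            void (*wait)(void *ctx), void *ctx)
{
    for (size_t i = 0; i < n_steps; ++i) {
        bool ok = false;
        for (unsigned a = 0; a < steps[i].max_attempts && !ok; ++a) {
            ok = steps[i].run(ctx);
            if (!ok && wait != NULL) {
                wait(ctx);
            }
        }
        if (!ok && steps[i].required) {
            return i;
        }
    }
    return n_steps;
}

/* Stand-in steps for trying the sketch off-target. */
bool demo_step_ok(void *ctx)   { (void)ctx; return true; }
bool demo_step_fail(void *ctx) { (void)ctx; return false; }

/* Fails as many times as the int pointed to by ctx says, then succeeds. */
bool demo_step_flaky(void *ctx)
{
    int *fails_left = ctx;
    if (*fails_left > 0) {
        --*fails_left;
        return false;
    }
    return true;
}
```

A returned index smaller than n_steps tells the caller which step gave up, which is a natural point to power the modem down and back off rather than keep the device awake.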

This works correctly (there's some issue where the system hangs for 30 seconds, but I will worry about that later). However, when the connection is unstable or bad (I simulated this by disconnecting the antenna), several things can happen:

  1. The modem doesn't connect, this is the log:

    I (16:36:38.211) Modem: Modem Init
    D (16:36:42.014) Modem: Initializing esp_modem for the SIM7070 module...
    I (16:36:42.016) uart: ESP_INTR_FLAG_IRAM flag not set while CONFIG_UART_ISR_IN_IRAM is enabled, flag updated
    I (16:36:42.022) uart: queue free spaces: 30
    I (16:36:42.530) Modem: Switching to command mode
    I (16:36:42.531) esp-netif_lwip-ppp: User interrupt
    I (16:36:42.532) Modem: PPP state changed event 5
    I (16:36:42.535) Modem: User interrupted event from netif:0x3fced3e0
    I (16:36:42.543) esp_modem_netif: PPP state changed event 5
    I (16:37:08.052) Modem: Retry sync 3 times
    I (16:37:08.055) Modem: Manual hang-up before reconnecting
    I (16:37:08.065) Modem: Signal quality: rssi=6, ber=99
    D (16:37:08.082) Modem: esp_modem_set_mode(ESP_MODEM_MODE_DATA) success
    I (16:37:08.083) Modem: Waiting for IP address
    I (16:37:40.670) esp-netif_lwip-ppp: User interrupt
    I (16:37:40.671) Modem: PPP state changed event 5
    I (16:37:40.672) Modem: User interrupted event from netif:0x3fced3e0
    I (16:37:40.678) esp_modem_netif: PPP state changed event 5

From there the system WDT times out after 3 minutes and resets everything.

  2. The modem connects and acquires an IP, but the connection is poor, so the system hangs while trying to communicate with the server:

    I (09:05:56.287) Modem: Modem Init
    D (09:06:00.090) Modem: Initializing esp_modem for the SIM7070 module...
    I (09:06:00.092) uart: ESP_INTR_FLAG_IRAM flag not set while CONFIG_UART_ISR_IN_IRAM is enabled, flag updated
    I (09:06:00.098) uart: queue free spaces: 30
    I (09:06:00.606) Modem: Switching to command mode
    I (09:06:00.607) esp-netif_lwip-ppp: User interrupt
    I (09:06:00.608) Modem: PPP state changed event 5
    I (09:06:00.611) Modem: User interrupted event from netif:0x3fcecf88
    I (09:06:00.619) esp_modem_netif: PPP state changed event 5
    I (09:06:26.128) Modem: Retry sync 3 times
    I (09:06:26.132) Modem: Manual hang-up before reconnecting
    I (09:06:26.141) Modem: Signal quality: rssi=6, ber=99
    D (09:06:26.158) Modem: esp_modem_set_mode(ESP_MODEM_MODE_DATA) success
    I (09:06:26.158) Modem: Waiting for IP address
    I (09:06:26.204) esp-netif_lwip-ppp: Connected
    I (09:06:26.205) esp-netif_lwip-ppp: Name Server1: 80.58.61.250
    I (09:06:26.206) esp-netif_lwip-ppp: Name Server2: 80.58.61.254
    D (09:06:26.213) Modem: IP event! 6
    I (09:06:26.216) Modem: Modem Connect to PPP Server
    D (09:06:26.221) Modem: ~~~~~~
    D (09:06:26.225) Modem: IP : 10.162.127.128
    D (09:06:26.230) Modem: Netmask : 255.255.255.255
    D (09:06:26.235) Modem: Gateway : 10.64.64.64
    D (09:06:26.240) Modem: Name Server1: 80.58.61.250
    D (09:06:26.245) Modem: Name Server2: 80.58.61.254
    D (09:06:26.250) Modem: ~~~~~~
    D (09:06:26.254) Modem: GOT ip event!!!
    I (09:06:26.260) ServiceTask: Connected to internet!!!
    D (09:06:26.288) citisend_https_ota_update: Running partition type 0 subtype 0 (offset 0x00190000)
    I (09:06:29.296) RTC_ML: Notification of a time synchronization event
    I (09:06:35.870) citisend_https_ota_update: HTTP Status Code OTA: 200
    D (09:06:35.871) citisend_https_ota_update: Writing to partition subtype 16 at offset 0x390000
    E (09:06:46.639) TRANSPORT_BASE: esp_tls_conn_read error, errno=No more processes
    W (09:06:46.640) HTTP_CLIENT: esp_transport_read returned:-26880 and errno:11

In both cases the issue seems clear to me: the system doesn't detect something is wrong with the connection. And I think this is what you were trying to explain here:

Also, the PPP protocol might not be a great choice when performing retries and checking the state, so you can also try if the network is available using AT commands (if so, you'd just switch to PPP without issues, if not you'd simply backoff and go to sleep) similarly as it's done in this example:

So do you think that checking for an IP should give me information on the connection state? Because from case @2 the IP is acquired. Does this mean that I will have to continuously switch between command mode and data mode to get the IP address via AT command?

david-cermak commented 8 months ago

So do you think that checking for an IP should give me information on the connection state?

Yes, this could be an indication whether we'd be able to connect before we start connecting, so I think it's useful as a quick check.

Because from case @2 the IP is acquired.

Of course, this won't help when we're already connected and we lose the IP.

Does this mean that I will have to continuously switch between command mode and data mode

Note that this library also supports CMUX mode, so you can run AT commands while connected.

But in general, I think what could be helpful would be some kind of try-connect-backoff mechanism with early feedback. Here the "try-connect" part could be checking the rssi, maybe trying to acquire an IP, and checking PPP phase events. The "backoff" part should involve a few retries, but going to sleep whenever possible between retries (see this example https://github.com/espressif/esp-protocols/tree/master/components/esp_modem/examples/modem_psm to see how the sleep modes of the modem and the ESP32 could be combined). For checking the PPP phase events, just enable CONFIG_PPP_NOTIFY_PHASE_SUPPORT and ppp_phase_event_enabled, then register for the NETIF_PPP_STATUS events. You should get NETIF_PPP_PHASE_TERMINATE much sooner than the lost-IP event. But it's also possible that the PPP connection could be up and running while the internet connection could be really bad.
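
As an illustration of the "checking the rssi" part, the value reported by +CSQ can be turned into an early go/no-go decision. This sketch assumes the 3GPP TS 27.007 mapping (0..31 maps to -113 + 2*rssi dBm, 99 means not known or not detectable); individual module manuals may differ by a dB or two. The function names and the threshold parameter are hypothetical.

```c
#include <stdbool.h>

/* Converts the <rssi> field of +CSQ to dBm, assuming the 3GPP TS 27.007
 * mapping. Returns false for 99 ("not known or not detectable") and for
 * any other out-of-range value; *dbm_out is left untouched in that case. */
bool csq_to_dbm(int rssi, int *dbm_out)
{
    if (rssi < 0 || rssi > 31) {
        return false;
    }
    *dbm_out = -113 + 2 * rssi;
    return true;
}

/* Hypothetical policy: only attempt a PPP connection when the signal is
 * known and at least min_dbm; otherwise the caller should back off. */
bool signal_ok_for_connect(int rssi, int min_dbm)
{
    int dbm;
    return csq_to_dbm(rssi, &dbm) && dbm >= min_dbm;
}
```

With this mapping the rssi=6 seen with the antenna disconnected corresponds to about -101 dBm and rssi=20 to about -73 dBm. The right threshold depends on the network and antenna and would have to be found experimentally.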

mbastida123 commented 8 months ago

But it's also possible that the PPP connection could be up and running while the internet connection could be really bad.

That is exactly what happens: The HTTP connections fail (esp_http_client is unable to connect and esp_tls returns an error) but there are no events from PPP even after enabling CONFIG_PPP_NOTIFY_PHASE_SUPPORT and ppp_phase_event_enabled.

I'm considering using CMUX mode so that at least I can check the IP address continuously. However, when I change to that mode the modem doesn't connect:

    I (16:15:50.285) Modem: Modem Init
    D (16:15:54.087) esp-netif_lwip-ppp: esp_netif_new_ppp: PPP connection created: 0x3fcd70fc
    D (16:15:54.088) esp-netif_lwip-ppp: Phase Dead
    D (16:15:54.090) Modem: Initializing esp_modem for the SIM7070 module...
    I (16:15:54.097) uart: ESP_INTR_FLAG_IRAM flag not set while CONFIG_UART_ISR_IN_IRAM is enabled, flag updated
    I (16:15:54.108) uart: queue free spaces: 30
    V (16:15:54.114) command_lib: sync
    V (16:15:54.116) command_lib: generic_command_common
    V (16:15:54.121) command_lib: generic_command
    D (16:15:54.125) command_lib: generic_command command AT
    D (16:15:54.497) command_lib: Response: +CFUN: 1
    I (16:15:54.630) Modem: Switching to command mode
    V (16:15:54.631) command_lib: set_cmux
    V (16:15:54.631) command_lib: generic_command_common
    V (16:15:54.634) command_lib: generic_command
    D (16:15:54.638) command_lib: generic_command command AT+CMUX=0
    D (16:15:54.649) command_lib: Response: +CPIN: READY AT+CMUX=0 OK
    V (16:15:54.781) command_lib: set_echo
    V (16:15:54.782) command_lib: generic_command_common
    V (16:15:54.782) command_lib: generic_command
    D (16:15:54.784) command_lib: generic_command command ATE0
    D (16:15:54.795) command_lib: Response: OK
    V (16:15:54.796) command_lib: set_pdp_context
    V (16:15:54.799) command_lib: generic_command_common
    V (16:15:54.804) command_lib: generic_command
    D (16:15:54.809) command_lib: generic_command command AT+CGDCONT=1,"IP",""
    D (16:15:54.821) command_lib: Response: OK
    V (16:15:54.822) command_lib: set_data_mode_alt
    V (16:15:54.825) command_lib: generic_command
    D (16:15:54.830) command_lib: generic_command command ATD*99##
    D (16:15:54.844) command_lib: Response: CONNECT 150000000
    D (16:15:54.845) esp-netif_lwip-ppp: esp_netif_start_ppp: Starting PPP connection: 0x3fcd70fc
    D (16:15:54.851) esp-netif_lwip-ppp: Phase Start
    D (16:15:54.855) esp-netif_lwip-ppp: Phase Establish
    I (16:15:54.855) Modem: PPP state changed event 259
    I (16:15:54.865) Modem: Retry sync 3 times
    D (16:15:54.866) Modem: Unprocessed PPP event: 259
    I (16:15:54.876) Modem: PPP state changed event 262
    V (16:15:54.880) command_lib: sync
    V (16:15:54.884) command_lib: generic_command_common
    D (16:15:54.889) Modem: Unprocessed PPP event: 262
    D (16:15:54.891) esp-netif_lwip-ppp: Phase Authenticate
    V (16:15:54.897) command_lib: generic_command
    I (16:15:54.900) Modem: PPP state changed event 263
    D (16:15:54.904) esp-netif_lwip-ppp: Phase Network
    D (16:15:54.910) Modem: Unprocessed PPP event: 263
    D (16:15:54.917) command_lib: generic_command command AT
    I (16:15:54.924) Modem: PPP state changed event 265
    D (16:15:54.929) command_lib: Response: OK
    D (16:15:54.931) Modem: Unprocessed PPP event: 265
    D (16:15:54.936) esp-netif_lwip-ppp: Phase Establish
    I (16:15:54.940) Modem: Manual hang-up before reconnecting
    I (16:15:54.947) Modem: PPP state changed event 262
    V (16:15:54.955) command_lib: at
    D (16:15:54.957) Modem: Unprocessed PPP event: 262
    V (16:15:54.960) command_lib: generic_get_string
    } (16:15:54.977) command_lib: Token: {
    V (16:15:54.978) command_lib: Token: {OK}
    V (16:15:54.979) command_lib: get_signal_quality
    V (16:15:54.983) command_lib: generic_get_string
    } (16:15:54.994) command_lib: Token: {
    V (16:15:54.995) command_lib: Token: {+CSQ: 99,99}
    } (16:15:54.997) command_lib: Token: {
    V (16:15:55.001) command_lib: Token: {OK}
    I (16:15:55.005) Modem: Signal quality: rssi=99, ber=99
    V (16:15:56.011) command_lib: get_signal_quality
    V (16:15:56.012) command_lib: generic_get_string
    } (16:15:56.018) command_lib: Token: {
    V (16:15:56.019) command_lib: Token: {+CSQ: 99,99}
    } (16:15:56.019) command_lib: Token: {
    V (16:15:56.023) command_lib: Token: {OK}
    I (16:15:56.028) Modem: Signal quality: rssi=99, ber=99
    V (16:15:57.033) command_lib: get_signal_quality
    V (16:15:57.034) command_lib: generic_get_string
    } (16:15:57.040) command_lib: Token: {
    V (16:15:57.041) command_lib: Token: {+CSQ: 99,99}
    } (16:15:57.042) command_lib: Token: {
    V (16:15:57.045) command_lib: Token: {OK}
    I (16:15:57.050) Modem: Signal quality: rssi=99, ber=99
    V (16:15:58.055) command_lib: get_signal_quality
    V (16:15:58.056) command_lib: generic_get_string
    } (16:15:58.062) command_lib: Token: {
    V (16:15:58.063) command_lib: Token: {+CSQ: 99,99}
    } (16:15:58.063) command_lib: Token: {
    V (16:15:58.067) command_lib: Token: {OK}
    I (16:15:58.072) Modem: Signal quality: rssi=99, ber=99
    V (16:15:59.077) command_lib: get_signal_quality
    V (16:15:59.078) command_lib: generic_get_string
    } (16:15:59.084) command_lib: Token: {
    V (16:15:59.085) command_lib: Token: {+CSQ: 20,99}
    } (16:15:59.086) command_lib: Token: {
    V (16:15:59.089) command_lib: Token: {OK}
    I (16:15:59.094) Modem: Signal quality: rssi=20, ber=99
    I (16:16:00.188) APP_Identification_HL: APP ID timeout 2
    I (16:16:00.188) APP_Identification_HL: timeout_app_id: saveCurrentIdentification
    D (16:16:00.945) esp-netif_lwip-ppp: Phase Disconnect
    D (16:16:00.946) esp-netif_lwip-ppp: Phase Dead
    I (16:16:00.947) esp-netif_lwip-ppp: Connection lost
    I (16:16:01.002) Modem: PPP state changed event 268
    D (16:16:01.003) Modem: Unprocessed PPP event: 268
    I (16:16:01.004) Modem: PPP state changed event 256
    W (16:16:01.008) Modem: PPP connection dead
    D (16:16:01.012) Modem: Retrying internet connection
    V (16:16:01.018) command_lib: sync
    D (16:16:01.020) Modem: IP event! 7
    V (16:16:01.021) command_lib: generic_command_common
    W (16:16:01.024) Modem: Modem Disconnect from PPP Server
    V (16:16:01.030) command_lib: generic_command
    D (16:16:01.036) Modem: Retrying internet connection
    D (16:16:01.040) command_lib: generic_command command AT
    V (16:16:01.051) command_lib: sync
    D (16:16:01.054) command_lib: Response: OK
    I (16:16:01.059) Modem: Manual hang-up before reconnecting
    V (16:16:01.065) command_lib: generic_command_common
    V (16:16:01.070) command_lib: at
    V (16:16:01.074) command_lib: generic_get_string
    V (16:16:01.078) command_lib: generic_command
    } (16:16:01.083) command_lib: Token: {
    V (16:16:01.087) command_lib: Token: {OK}
    D (16:16:01.092) command_lib: generic_command command AT
    V (16:16:01.097) command_lib: get_signal_quality
    D (16:16:01.101) command_lib: Response: OK
    I (16:16:01.107) Modem: Manual hang-up before reconnecting
    V (16:16:01.113) command_lib: at
    V (16:16:01.116) command_lib: generic_get_string
    V (16:16:01.122) command_lib: generic_get_string
    } (16:16:01.125) command_lib: Token: {
    V (16:16:01.129) command_lib: Token: {OK}
    V (16:16:01.134) command_lib: get_signal_quality
    } (16:16:01.140) command_lib: Token: {
    V (16:16:01.142) command_lib: Token: {+CSQ: 20,99}
    } (16:16:01.147) command_lib: Token: {
    V (16:16:01.151) command_lib: Token: {OK}
    V (16:16:01.156) command_lib: generic_get_string
    I (16:16:01.161) Modem: Signal quality: rssi=20, ber=99
    } (16:16:01.167) command_lib: Token: {
    V (16:16:01.170) command_lib: Token: {+CSQ: 20,99}
    } (16:16:01.175) command_lib: Token: {
    V (16:16:01.180) command_lib: Token: {OK}
    I (16:16:01.186) Modem: Signal quality: rssi=20, ber=99
    E (49323) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
    E (49323) task_wdt: - main (CPU 0)
    E (49323) task_wdt: Tasks currently running:
    E (49323) task_wdt: CPU 0: sys_evt
    E (49323) task_wdt: CPU 1: IDLE
    E (49323) task_wdt: Aborting.

The modem apparently enters CMUX mode successfully (ignore the mentions of MODE_DATA in the log, I just haven't changed that), but then I can see from the PPP events that the connection fails. On that failure the system retries the connection process (my own code), but at this point the watchdog ends up being triggered. Even if I don't retry the connection on that event, the WDT ends up being triggered anyway.

Taking a look at the example it doesn't seem like I'm doing anything differently.

In some cases the WDT triggers immediately on the first connection attempt:

    I (09:01:29.989) Modem: Modem Init
    D (09:01:33.792) esp-netif_lwip-ppp: esp_netif_new_ppp: PPP connection created: 0x3fcd7044
    D (09:01:33.793) esp-netif_lwip-ppp: Phase Dead
    D (09:01:33.794) Modem: Initializing esp_modem for the SIM7070 module...
    I (09:01:33.801) uart: ESP_INTR_FLAG_IRAM flag not set while CONFIG_UART_ISR_IN_IRAM is enabled, flag updated
    I (09:01:33.812) uart: queue free spaces: 30
    V (09:01:33.819) command_lib: sync
    V (09:01:33.820) command_lib: generic_command_common
    V (09:01:33.825) command_lib: generic_command
    D (09:01:33.829) command_lib: generic_command command AT
    I (09:01:34.334) Modem: Switching to command mode
    V (09:01:34.335) command_lib: set_cmux
    V (09:01:34.336) command_lib: generic_command_common
    V (09:01:34.338) command_lib: generic_command
    D (09:01:34.343) command_lib: generic_command command AT+CMUX=0
    I (09:01:35.968) Modem: Retry sync 3 times
    V (09:01:35.969) command_lib: sync
    V (09:01:35.970) command_lib: generic_command_common
    V (09:01:35.971) command_lib: generic_command
    D (09:01:35.976) command_lib: generic_command command AT
    D (09:01:35.983) command_lib: Response: AT OK
    I (09:01:35.986) Modem: Manual hang-up before reconnecting
    V (09:01:35.993) command_lib: at
    V (09:01:35.996) command_lib: generic_get_string
    V (09:01:36.003) command_lib: Token: {ATH}
    V (09:01:36.005) command_lib: Token: {OK}
    V (09:01:36.009) command_lib: get_signal_quality
    V (09:01:36.014) command_lib: generic_get_string
    V (09:01:36.022) command_lib: Token: {AT+CSQ}
    V (09:01:36.023) command_lib: Token: {+CSQ: 99,99}
    } (09:01:36.028) command_lib: Token: {
    V (09:01:36.032) command_lib: Token: {OK}
    I (09:01:36.037) Modem: Signal quality: rssi=99, ber=99
    V (09:01:37.042) command_lib: get_signal_quality
    V (09:01:37.043) command_lib: generic_get_string
    V (09:01:37.048) command_lib: Token: {AT+CSQ}
    V (09:01:37.048) command_lib: Token: {+CSQ: 99,99}
    } (09:01:37.051) command_lib: Token: {
    V (09:01:37.055) command_lib: Token: {OK}
    I (09:01:37.059) Modem: Signal quality: rssi=99, ber=99
    V (09:01:38.064) command_lib: get_signal_quality
    V (09:01:38.065) command_lib: generic_get_string
    V (09:01:38.070) command_lib: Token: {AT+CSQ}
    V (09:01:38.070) command_lib: Token: {+CSQ: 21,99}
    } (09:01:38.073) command_lib: Token: {
    V (09:01:38.077) command_lib: Token: {OK}
    I (09:01:38.081) Modem: Signal quality: rssi=21, ber=99
    E (31631) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
    E (31631) task_wdt: - main (CPU 0)
    E (31631) task_wdt: Tasks currently running:
    E (31631) task_wdt: CPU 0: sys_evt
    E (31631) task_wdt: CPU 1: IDLE
    E (31631) task_wdt: Aborting.

david-cermak commented 7 months ago

That is exactly what happens: The HTTP connections fail but there are no events from PPP even after enabling CONFIG_PPP_NOTIFY_PHASE_SUPPORT and ppp_phase_event_enabled.

Yes, this possibly means the connection got unstable somewhere between the device and the cellular network, but the PPP connection and lower layers are stable. There's not much that could be done on the modem layer, the only suggestion would be to periodically check connection on the application layers, like pinging some addresses, using TCP keepalive, MQTT pings, etc.
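
One way to make such an application-layer check concrete is a small stall detector that is refreshed whenever a ping, keepalive or data chunk succeeds, and that declares the link dead once nothing has succeeded for a configured time. This is only a sketch: link_monitor_t and its functions are hypothetical names, and the millisecond timestamps are assumed to come from a monotonic 32-bit tick source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical application-layer link monitor. */
typedef struct {
    uint32_t last_progress_ms;  /* time of the last successful probe or data */
    uint32_t timeout_ms;        /* silence longer than this means "stalled" */
} link_monitor_t;

void link_monitor_init(link_monitor_t *m, uint32_t now_ms, uint32_t timeout_ms)
{
    m->last_progress_ms = now_ms;
    m->timeout_ms = timeout_ms;
}

/* Call after every successful ping reply, MQTT PINGRESP or received chunk. */
void link_monitor_progress(link_monitor_t *m, uint32_t now_ms)
{
    m->last_progress_ms = now_ms;
}

/* True once no progress has been seen for timeout_ms. The unsigned
 * subtraction keeps the comparison correct across counter wrap-around. */
bool link_monitor_stalled(const link_monitor_t *m, uint32_t now_ms)
{
    return (uint32_t)(now_ms - m->last_progress_ms) >= m->timeout_ms;
}
```

When link_monitor_stalled() becomes true during an OTA download, the application could abort the transfer and go back to the back-off path instead of waiting for the system watchdog.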

I'm considering using CMUX mode so that at least I can check the IP address continuously.

I think that's probably not needed after all (in your application), as checking the IP address makes only sense before you start connecting.

but at this point the watchdog ends up being triggered.

The log says that the main task didn't reset the task WDT; adding a vTaskDelay() of ~20 ms before checking the signal should help.

Taking a look at the example it doesn't seem like I'm doing anything differently.

A quick and easy check is to run the (unchanged) example with your device. There may still be some issues with CMUX mode on your device, the SIM7080G (I've never tested CMUX mode with it, but I already have one and am going to check it soon).

theluc6234 commented 7 months ago

I have seen the issue of a device losing its IP after running for a day. After some research I think this is due to DHCP lease renewal: if your IoT device obtains its IP address dynamically using DHCP (Dynamic Host Configuration Protocol), the lease provided by the ISP might be expiring after a day. This would require the device to renew its IP address periodically. Ensure that your device is configured to renew its DHCP lease appropriately. I am looking for a way to recover the IP address without hard-resetting the device.

mbastida123 commented 7 months ago

I don't think this is an issue for us. Due to power usage concerns we only connect to the internet once or twice a day, and this connection (if there isn't an OTA update pending) lasts for 30-40 seconds.