espressif / esp-protocols

Collection of ESP-IDF components related to networking protocols
198 stars 135 forks source link

Add the ability to resume CMUX (or other modem modes) when waking up from deep sleep (IDFGH-14047) #692

Open jlittlewood opened 3 weeks ago

jlittlewood commented 3 weeks ago

Is your feature request related to a problem?

I am using the esp_modem library and my device needs to support deep sleep. I need access to commands and data through UART, so I need to use CMUX mode. I am not aware of any way to set the DCE to CMUX mode on wakeup, when the modem is already in CMUX mode. When the DCE is initialized after wakeup it is in an undefined mode, which allows switching to any mode, but with the modem already set up and communicating in CMUX mode, simply running the set_mode to CMUX will not work, since it is dependent on various responses from the modem which may be communicating on different channels due to the CMUX setting being active on the modem.

Describe the solution you'd like.

Ideally, there would be a way to resume a mode upon initialization. The resume will assume the modem is already in the perspective mode and set up the DCE, netif, and UART terminals based on the mode that is being resumed.

Describe alternatives you've considered.

I have attempted to make my own CMUX resume function in the DCE that would set up the primary and secondary terminals and start the netif, but would not fail based on responses from the modem. This method has not been successful. I have not been able to get data to work, and commands usually don't work either.

Another option is to reset the modem each time, but this is undesirable due to the longer time to boot up the modem and connect. I need the modem to be ready as fast as possible from deep sleep wakeup.

Similarly, I have tried changing the modes prior to entering sleep. This has been problematic because I need the modem to be in PPP mode (or CMUX) while the MCU is asleep, so I can get the GPIO wake-ups for incoming data. This method also slows down the sleep process.

Additional context.

I am using an ESP32-S3 and the esp_modem library for UART modem connectivity. The CMUX works fine after the modem is rebooted. The main issue is how to get the DCE mode to match the modem mode that was set prior to the deep sleep.

david-cermak commented 3 weeks ago

Have you tried the new mode-detection feature? -- released yesterday as v1.2.0

Add support for guessing mode (52598e5f)

The idea is that you issue dce->set_mode(AUTODETECT) after wakeup and the modem lib sends a few probes to match the actual mode. In case of CMUX, it needs to go over (all) virtual terminals, so is uses manual CMUX modes.

jlittlewood commented 3 weeks ago

That release does seem to have a lot of features I have been needing. Unfortunately it does not seem to be working for me.

In my code, on startup the modem boots up and I set the mode to CMUX mode. This works great and I am able to get data and commands working without any problem. Then when the MCU enters deep sleep the netif and DCE are deinited. When the MCU wakes from deep sleep I set the mode to AUTODETECT, but I get the response that it failed. No log lines about the primary or secondary terminal are printed. I added a print in the set_unsafe to see where it was failing. It is getting inside this if statement and returning false (inside the DCE_Mode::set_unsafe function in esp_modem_dce.cpp):

case modem_mode::AUTODETECT: {
        auto guessed = guess_unsafe(dte, true);
        if (guessed == modem_mode::UNDEF) { // AUTODETECT failing because of this response
            printf("Guessed mode is undefined\n"); // Added print to see where it was failing
            return false;
        }

Also even if the AUTODETECT did work, it looks like the only CMUX resumes are the manual CMUX modes. I dont see a possible transition from manual CMUX to the regular CMUX. Is that something that is possible? I still need to maintain support for commands and data, without regularly swapping between the two modes.

Here are the results from trying to resume using the different resume modes that are provided:

I also tried resuming modes, and being in CMUX manual prior to deep sleep, and it still gave me the same results as above, with no modem commands or data working.

Is there something else I am missing or anything special needed prior to entering deep sleep or after wakeup?

I am using the C APIs. A brief overview of my modem setup and teardown is shown below:

C API function to allow resume modes from C

extern "C" esp_err_t esp_modem_resume_mode(esp_modem_dce_t *dce_wrap, esp_modem_dce_resume_mode_t mode)
{
    if (dce_wrap == nullptr || dce_wrap->dce == nullptr) {
        return ESP_ERR_INVALID_ARG;
    }
    switch (mode) {
    case ESP_MODEM_RESUME_DATA_MODE:
        return dce_wrap->dce->set_mode(modem_mode::RESUME_DATA_MODE) ? ESP_OK : ESP_FAIL;
    case ESP_MODEM_RESUME_COMMAND_MODE:
        return dce_wrap->dce->set_mode(modem_mode::RESUME_COMMAND_MODE) ? ESP_OK : ESP_FAIL;
    case ESP_MODEM_RESUME_CMUX_MANUAL_MODE:
        return dce_wrap->dce->set_mode(modem_mode::RESUME_CMUX_MANUAL_MODE) ? ESP_OK : ESP_FAIL;
    case ESP_MODEM_RESUME_CMUX_MANUAL_DATA:
        return dce_wrap->dce->set_mode(modem_mode::RESUME_CMUX_MANUAL_DATA) ? ESP_OK : ESP_FAIL;
    case ESP_MODEM_AUTODETECT:
        return dce_wrap->dce->set_mode(modem_mode::AUTODETECT) ? ESP_OK : ESP_FAIL;
    return ESP_ERR_NOT_SUPPORTED;
    }   
}

Inside wakeup/startup

esp_netif_config_t netif_ppp_config = ESP_NETIF_DEFAULT_PPP();
esp_netif_t * net_if = esp_netif_new(&netif_ppp_config);
esp_modem_dce_config_t dce_config = ESP_MODEM_DCE_DEFAULT_CONFIG(apn);
esp_modem_dte_config_t dte_config = ESP_MODEM_DTE_DEFAULT_CONFIG();
esp_modem_dce_t* dce = esp_modem_new_dev(ESP_MODEM_DCE_CUSTOM, &dte_config, &dce_config, net_if);

esp_err_t ret;
if (deep_sleep_wakeup) { 
    // Wake from deep sleep, modem is not rebooted and should resume previous mode
    ret = esp_modem_resume_mode(dce, ESP_MODEM_AUTODETECT); 
    // I have tried all of the available resume modes, but none make the commands and data work again, 
    // despite 3 of the 5 modes returning ESP_OK (see list above for results for each mode).
} else { 
    // Regular startup. Modem was rebooted
   ret = esp_modem_set_mode(dce,ESP_MODEM_MODE_CMUX);
}
printf("Modem mode setup: %s\n",esp_err_to_name(ret));

Inside prepare for deep sleep

esp_modem_destroy(dce);
esp_netif_destroy(net_if);

As a note, I am using the Qualcomm QCX216 modem.

david-cermak commented 2 weeks ago

I haven't tested it with many devices, but I'm surprised it won't work even with basic command and data modes?

Could you please share more logs after applying these changes:

--- a/components/esp_modem/src/esp_modem_uart.cpp
+++ b/components/esp_modem/src/esp_modem_uart.cpp
@@ -175,7 +175,9 @@ int UartTerminal::read(uint8_t *data, size_t len)
     uart_get_buffered_data_len(uart.port, &length);
     length = std::min(len, length);
     if (length > 0) {
-        return uart_read_bytes(uart.port, data, length, portMAX_DELAY);
+        int read_len = uart_read_bytes(uart.port, data, length, portMAX_DELAY);
+        ESP_LOG_BUFFER_HEXDUMP("read", data, read_len, ESP_LOG_INFO);
+        return read_len;
     }
     return 0;
@@ -182,6 +182,7 @@ int UartTerminal::read(uint8_t *data, size_t len)

 int UartTerminal::write(uint8_t *data, size_t len)
 {
+    ESP_LOG_BUFFER_HEXDUMP("write", data, len, ESP_LOG_INFO);
     return uart_write_bytes_compat(uart.port, data, len);
 }

Let's start with these simple scenarios (no CMUX for now):

1) Resume command mode

2) Resume data mode

And if you can share the logs from these two procedures above, it would be really helpful!

jlittlewood commented 2 weeks ago

Thank you so much for your help with this issue, it has been blocking me for quite some time now.

I performed the tests you requested. In the resume CMD mode, there was no issues and I didnt expect there to one any problems with that mode. When testing the resume PPP mode, it failed to resume, but I noticed that the sync that was called on startup was successful. So I repeated the test and tried entering PPP (not the resume data, just the regular data) and that was successful. I have provided logs for those 3 tests.

For the sake of efficiency, I also provided the logs for the same test using the CMUX and AUTO/CMUX resume modes. This is not working and is the use case I need.

In my project we need to be able to use the PPP data and send commands to the modem. Switching out of CMUX mode before sleeping is not preferred since the modem is reset when exiting CMUX, and I would need to wait for startup and connect to PPP. This causes a 10-30 second delay in the deep sleep setup, which will not work.

The other option would be to restart the modem every time the MCU wakes from deep sleep, but that also causes the delay in the startup process and I need the device to be able to make a quick connection.

I have tried the USB DCE to use the 2 interfaces for data and commands, which works, but this is not an option due to the higher power usage of USB compared to UART.

From an analysis of the modem log, it appears that in the case of the resume CMUX/AUTO/PPP modes, the modem did not receive any data from the MCU after wakeup from deep sleep. The init and startup process is not different based on the modes, as I am manually entering and resuming modes for these tests.

test-Resume CMUX mode.txt test-Resume data mode.txt test-Normal data mode.txt test-Resume command mode.txt

david-cermak commented 2 weeks ago

So you're also setting your device to a low power mode, correct? And is seems like it wakes up in the default command mode (your modem_sync command succeeds) -- then this command

dce->set_mode(ESP_MODEM_AUTODETECT);

should succeed and resume the command mode (i.e. do nothing).

jlittlewood commented 1 week ago

I am not intentionally setting the modem to low power mode or changing anything on the modem. I am sending no commands to the modem as part of the enter sleep. I am not sure if the modem is timing out or reverting to a different state when the MCU is asleep/not responding. I am working with Qualcomm to better understand that, but from the logs I have reviewed and my understanding, that is not happening in the CMUX case even though something like that does appear to be happening in the PPP mode, but perhaps not when CMUX is enabled.

I did the repeat the test by setting data mode to enter PPP sleep the MCU wakeup the MCU resume mode AUTO detect

This test was successfully and resumed to CMD mode, as we would expect since in earlier test it entered CMD mode on wakeup. So in the case of PPP and CMD, the resume options are working.

The real issue I am facing is with the CMUX. CMUX resume is not working and neither is AUTO, when the mode was in CMUX prior to sleeping.

The closest method I have to a successful case with CMUX is as follow:

In this test I did not use any of the resume functionality, just the standard set modes. As part of startup I enter CMUX manual then transition to CMUX manual data. This is the same behavior whether the device is waking up from deep sleep or initial startup. Then to prepare for deep sleep the mode is set to CMUX manual command then to CMUX exit. If the CMUX is properly exited prior to sleep, I am able to use the modem and enter CMUX manual again after wakeup.

The reason I need to go to CMUX manual command prior to the CMUX exit is due to the fact that the modem reboots if the CMUX is exited from an active CMUX data state, but not from a CMUX command state. This speeds up the prepare for sleep process.

There does seem to be an issue when transitioning from CMUX data to CMUX command. It sends the +++ and immediately gets the NO CARRIER response, but instead of returning success, it tries a few more times, then reports that the command failed. It does successfully transition out of data mode, but it slows it down since it keeps retying, which will fail because the data is already suspended. Please see the attached log that shows this issue. test-cmux_manual_data_to_cmux_manaual_cmd_fails.txt

The successful CMUX test case mentioned above could be a viable solution if we can figure out why the +++ thinks it failed, when it didn't, and if I can work with Qualcomm to ensure we can get a GPIO wakeup while the data mode is suspended.

david-cermak commented 1 week ago

@jlittlewood Could you please test the CMUX manual data -> cmd transition with this update?

--- a/components/esp_modem/src/esp_modem_dce.cpp
+++ b/components/esp_modem/src/esp_modem_dce.cpp
@@ -18,7 +18,6 @@ namespace transitions {

 static bool exit_data(DTE &dte, ModuleIf &device, Netif &netif)
 {
-    netif.stop();
     auto signal = std::make_shared<SignalGroup>();
     std::weak_ptr<SignalGroup> weak_signal = signal;
     dte.set_read_cb([&netif, weak_signal](uint8_t *data, size_t len) -> bool {
@@ -44,6 +43,7 @@ static bool exit_data(DTE &dte, ModuleIf &device, Netif &netif)
         }
         return false;
     });
+    netif.stop();
     netif.wait_until_ppp_exits();
     if (!signal->wait(1, 2000)) {

(moving the netif.stop() after that lambda)

jlittlewood commented 1 week ago

@david-cermak I applied the suggested changes, but it did not make a difference. I looked closer in the logs from today and the logs I provided on my previous post and noticed that the NO CARRIER is not in the command_lib response.

V (101571) command_lib: set_command_mode
D (101571) command_lib: generic_command command +++

I (101572) write: 0x3fce06d4   f9 09 ef 07                                       |....|
I (101584) write: 0x3fce081c   2b 2b 2b                                          |+++|
I (101595) write: 0x3fce06d8   35 f9                                             |5.|
I (104531) read: 0x3fcd0370   f9 07 ef 1d 0d 0a 4e 4f  20 43 41 52 52 49 45 52  |......NO CARRIER| <- NO CARRIER response is recieved, but 
I (104532) read: 0x3fcd0380   0d 0a 22 f9                                       |..".|
D (104544) CMUX: Payload frame: dlci:01 type:ef payload:14 available:16
I (104555) read: 0x3fcd0370   f9 07 ef 1d 0d 0a 2b 53  52 56 5f 49 4e 44 3a 31  |......+SRV_IND:1|
I (104557) read: 0x3fcd0380   0d 0a 22 f9 f9 07 ef 1d  0d 0a 2b 53 52 56 5f 49  |..".......+SRV_I|
I (104568) read: 0x3fcd0390   4e 44 3a 31 0d 0a 22 f9                           |ND:1..".|
D (104580) CMUX: Payload frame: dlci:01 type:ef payload:14 available:36
D (104581) CMUX: Payload frame: dlci:01 type:ef payload:14 available:16
V (107596) command_lib: sync
V (107596) command_lib: generic_command_common
V (107597) command_lib: generic_command
D (107597) command_lib: generic_command command AT

I (107608) write: 0x3fce0634   f9 09 ef 07                                       |....|
I (107620) write: 0x3fce086c   41 54 0d                                          |AT.|
I (107621) write: 0x3fce0638   35 f9                                             |5.|
I (107636) read: 0x3fcd0370   f9 0b ef 0b 2b 2b 2b 41  54 5d f9                 |....+++AT].|
D (107637) CMUX: Payload frame: dlci:02 type:ef payload:5 available:7
I (107649) read: 0x3fcd0370   f9 0b ef 13 0d 0a 45 52  52 4f 52 0d 0a 4f f9     |......ERROR..O.|
D (107660) CMUX: Payload frame: dlci:02 type:ef payload:9 available:11
D (107662) command_lib: Response: +++AT
ERROR

Here it is obvious that the modem sent the NO CARRIER response, but I believe it was sent on the data terminal, rather than the command terminal, so the command_lib didn't see that text. This seems to be an issue when CMUX is used. Then since the data is already suspended, sending +++ again causes the error, which is communicated on the same terminal as the commands, so the command_lib correctly perceives it as a failure, but missed the previous success.

david-cermak commented 1 week ago

Yes, you're right, the answer is coming from the other terminal. I'm sorry, haven't noticed this from your previous post:

Test:
    CMUX Manual exit before sleep
        set device to CMUX manual
        set device to CMUX manual data
        set device to CMUX manual command

Could you please try to swap the terminal before going to CMUX-manual-data ? and log the transition from manual-data to manual-command? -- and the whole transition, the last post contained only the retries, i.e. after this wait for closing PPPD timeouted with 2 seconds?

https://github.com/espressif/esp-protocols/blob/849fe7b6cb20ca6312a28e1c5ce4ed8bf6ae5450/components/esp_modem/src/esp_modem_dce.cpp#L48

so the procedure would be (saw that you're using modem console): 1) set_mode CMUX1 <- enter manual cmux 2) set_mode CMUX3 <- swap terminals 3) set_mode CMUX4 <- manual data 4) start logging 5) set_mode CMUX5 <- manual command 6) stop logging

PS: I'm sorry, the manual cmux modes are not documented well; switching the terminals is necessary since we switch the current terminal to data and need to reserve the former one to commands (this actually mimics the standard CMUX transition).

jlittlewood commented 1 week ago

@david-cermak I ran the prosed test and captured the full logs. Unfortunately this test gives the same results. Since the terminal is swapped prior to entering data, the data terminal still gets the response from the +++ so the exit data is still perceived as failed. Lets refer to this as Test 1 test-swap terminal prior to enter data.txt

I ran a similar test but swapped the terminals just prior to exiting data, and that was successful. This makes sense because the NO CARRIER response is received on what is now the command terminal. I will refer to this as Test 2 test-swap terminal prior to exit data.txt

From the 2 tests mentioned above, neither one really fits my requirement well. According to Qualcomm the modem must be in "online command mode" with the data suspended by +++. In Test 1, the device does properly exit data mode and suspend the data with +++ and the CMUX is properly exited without rebooting the modem. In Test 2, the modem successfully transitions states and does not require any retries, but the modem is rebooted to exit CMUX. I believe this reboot may interfere with the suspended data state that Qualcomm requires. So Test 1 is slowed down by the retries and delays, and Test 2 causes a modem reboot.

Is there a way to prevent the modem from reseting when exiting the CMUX?

The requirement I have for my device is that I must be able to boot up and enter CMUX (manual or auto, requires data and commands), then when ready the MCU can deep sleep (with the modem in online command mode), then on wakeup from deep sleep be able to as quickly as possible connect with CMUX (manual or auto, for data and commands) and repeat the sleep wakeup cycle as needed.

The modem resume modes seemed like the solution to meet my requirements, but it does not seem to be working as I would like. Using the CMUX manual modes we seem to be able to get closer to the behavior needed to meet my requirements, but it is not as fast as I would like, causing multiple mode transitions and retries/reboots. I am still open to either solution, or a different approach, if I can optimize the time for init, prepare for sleep, and wake from sleep while ensuring that the modem can be in online command mode when the MCU enters deep sleep.

david-cermak commented 1 week ago

Thanks for testing and sharing the logs!

About the Test1: Was the netif.stop() moved down, below the lambda? (as suggested above) I still don't understand why the lambda wasn't called, seems like the PPP session got closed gracefully both by the netif and the device, but I'd expect these debug prints in your logs:

https://github.com/espressif/esp-protocols/blob/849fe7b6cb20ca6312a28e1c5ce4ed8bf6ae5450/components/esp_modem/src/esp_modem_dce.cpp#L28

This is something we will fix, or find a workaround. And I think this transition (cmux manual data -> command) should go even faster, as it's also blocked on the signal timeouts in your case. The theoretical fix should be something like this:

    dte.set_read_cb([&netif, weak_signal](uint8_t *data, size_t len) -> bool {
        // lambda
    });
    netif.stop();
    netif.wait_until_ppp_exits();
    dte.write((uint8_t*)"+++", 3);  // post `+++` to the data terminal, while the recv lambda active
    if (!signal->wait(1, 2000)) {
        // ...

Also, need to fix the retry sequence, sending a newline before the AT (as the sync command is interpreted as +++AT\r\n Maybe I'll try to get hold of the QCX216 device and check that.

About the Test2: I think this is quite device specific and I haven't really found a way to customize this CMUX exit transition. But even though some devices might just restart.

jlittlewood commented 1 week ago

@david-cermak apologies, I did not turn on logging for those function inside the lambda in my previous test. The netif.stop() was moved to after in that test, and in the test I did today, but the write +++ was not there. I added the write +++ in todays test. I repeated the test with the login enabled. Please see the attached log. Test 1 retry.txt

Also for your reference here is the modified function, with some additional prints.

static bool exit_data(DTE &dte, ModuleIf &device, Netif &netif)
{
    auto signal = std::make_shared<SignalGroup>();
    std::weak_ptr<SignalGroup> weak_signal = signal;
    dte.set_read_cb([&netif, weak_signal](uint8_t *data, size_t len) -> bool {
        // post the transitioning data to the network layers if it contains PPP SOF marker
        if (memchr(data, 0x7E, len))
        {
            ESP_LOG_BUFFER_HEXDUMP("esp-modem: debug_data (PPP)", data, len, ESP_LOG_DEBUG);
            netif.receive(data, len);
        }
        // treat the transitioning data as a textual message if it contains a newline char
        if (memchr(data, '\n', len))
        {
            ESP_LOG_BUFFER_HEXDUMP("esp-modem: debug_data (CMD)", data, len, ESP_LOG_DEBUG);
            const auto pass = std::list<std::string_view>({"NO CARRIER", "DISCONNECTED"});
            std::string_view response((char *) data, len);
            for (auto &it : pass)
                if (response.find(it) != std::string::npos) {
                    if (auto signal = weak_signal.lock()) {
                        signal->set(1);
                    }
                    return true;
                }
        }
        return false;
    });
    printf("netif stop\n");
    netif.stop();
    printf("netif wait for exit\n");
    netif.wait_until_ppp_exits();
    printf("Sending +++\n");
    dte.write((uint8_t*)"+++", 3);  // post `+++` to the data terminal, while the recv lambda active
    printf("signal check\n");
    if (!signal->wait(1, 2000)) {
        dte.set_read_cb(nullptr);
        printf("set cmd mode\n");
        if (!device.set_mode(modem_mode::COMMAND_MODE)) {
            printf("Returning here: 1\n");
            return false;
        }
    }
    dte.set_read_cb(nullptr);
    if (!dte.set_mode(modem_mode::COMMAND_MODE)) {
        printf("Returning here: 2\n");
        return false;
    }
    printf("Returning here: 3\n");
    return true;
}
david-cermak commented 1 week ago

Oh, I see, thanks again for the detailed logging! Now we only need to accept the "OK" reply in the lambda.

            ESP_LOG_BUFFER_HEXDUMP("esp-modem: debug_data (CMD)", data, len, ESP_LOG_DEBUG);
-            const auto pass = std::list<std::string_view>({"NO CARRIER", "DISCONNECTED"});
+            const auto pass = std::list<std::string_view>({"NO CARRIER", "DISCONNECTED", "OK"});
            std::string_view response((char *) data, len);

That should fix the problem and complete the transition before going to the retry loop.

Next step is to fix the retry loop -- need to send a delimiter before the sync(), a quick workaround is to just send another sync:

    bool set_mode(modem_mode mode) override
    {
        ...
        } else if (mode == modem_mode::COMMAND_MODE) {
            Task::Delay(1000); // Mandatory 1s pause before
            int retry = 0;
            while (retry++ < 3) {
                if (set_command_mode() == command_result::OK) {
                    return true;
                }
                Task::Delay(1000); // Mandatory 1s pause after escape
                if (sync() == command_result::OK) {
                    return true;
                }
+                Task::Delay(100);
+                if (sync() == command_result::OK) {
+                    return true;
+                }
                Task::Delay(1000); // Mandatory 1s pause before escape
            }
            return false;
            ...
    }

(and the last thing is to cleanup the changes and re-test with my devices)

jlittlewood commented 6 days ago

@david-cermak that worked!

With those changes it does not get stuck retrying anymore. It also seems that the CMUX swap terminals is not needed. I had the same results with or without the CMUX swap. I was also pleased to see that the CMUX exit did not cause a modem reboot.

I believe the proposed method of CMUX1 -> CMUX4 -> CMUX5 -> CMUX2 (CMUX manual -> CMUX data -> CMUX command -> CMUX exit) will meet my requirements. I also confirmed that I can deep sleep and repeat this process successfully.

I understand the challenges of supporting multiple modems and the differences for each one. I really appreciate your quick resolution on this issue and the debugging assistance you provided!

Here are my logs for your reference: modem_retest_pass.txt modem_retest_pass_no_swap.txt

david-cermak commented 1 day ago

Thanks for your patience with testing and sharing the results! and thanks for the good news that it finally works.

I've pushed the changes to this PR: https://github.com/espressif/esp-protocols/pull/702 but also tested with some other devices and had to make it (partly) configurable (since the "+++" sequence causes strange issues with some simcom devices during PPP session close)