espressif / esp-protocols

Collection of ESP-IDF components related to networking protocols
166 stars 115 forks source link

Response Issue in CMUX Mode (IDFGH-12431) #536

Closed brhmrsln closed 2 months ago

brhmrsln commented 3 months ago

Answers checklist.

General issue report

Environment:

Problem Description: We are utilizing the Quectel MC60 module in conjunction with the ESP32-S3, employing the ESP-MODEM component for communication. Our application involves connecting to the internet via the MC60's all-in-one solution and periodically requesting NMEA sentences.

Issue: In CMUX mode, after successfully establishing an internet connection and receiving an IP address, we attempt to fetch GLL NMEA sentences using the command "AT+QGNSSRD" periodically. The MC60 appears to respond correctly with the requested sentence followed by an "OK" acknowledgment. However, In the initial response, we receive only "nmeasentence" without the preceding "OK". Subsequent requests then yield an "OK" response that seems to belong to the previous request, leading to a cycle where each NMEA sentence request is followed by an "OK" from the prior request, effectively causing empty responses and misaligned confirmations.

Observations: The entire response (including "OK") is present in the UART buffer, suggesting that the issue might be with how the response is processed or retrieved. We employ esp_modem::dce_commands::generic_get_string to capture the response. The issue consistently occurs when the ESP system clock is set to either 240 MHz or 160 MHz. Interestingly, adding a delay at the beginning of the bool DTE::command_cb::process_line function mitigates the issue. At an 80 MHz ESP system clock frequency, the problem does not occur, and the system behaves as expected, receiving both the NMEA sentence and the "OK" acknowledgment without issues. Upon inspecting the communication with a logic analyzer on the MC60 side, it is observed that the module transmits the complete response (sentence + "OK") as expected, indicating that the UART transmission from the MC60 is functioning correctly.

Request: Could you provide insight or a solution to this problem? It appears that system clock frequencies above 80 MHz introduce timing issues with UART communication or processing within the ESP-MODEM component, specifically in the context of receiving complete responses from the MC60 module. The observation with a logic analyzer suggests that the issue lies not in the MC60's transmission but possibly in the reception or processing of the data by the ESP32-S3 or the ESP-MODEM component.

Thank you for your assistance.

david-cermak commented 3 months ago

we receive only "nmeasentence" without the preceding "OK".

just for me to understand, you're expecting a response pre-pended by "OK" and trying to parse it using dce_commands::generic_get_string()? This command looks for the OK as a delimiter at the end of the message, for the first occurrence, so it probably just returns the buffer as is and stops looking for any other OK...

Maybe an alternative approach could be to use dce_commands::at_raw() which checks for an arbitrary pattern and returns the entire buffer (doesn't strip the OK)

brhmrsln commented 3 months ago

Hi David Thanks for the quick response. I am expecting the ‘OK’ at the end of the response. And when I check the logic analyzer it is coming from MC60 as expected. (Sentence+OK). And I also log the uart buffer and the answer is as we expected. (Sentence + OK). But when I check the data inside the dce_commands::generic_get_string()(I log the data) I can only see the sentence not the OK at the end.(it fills the parameter and returns timeout, because the OK is not there)When I reguest second nmea sentence, now the ‘previous OK + Sentence + OK ‘ is found and the response is empty.(As you say when it finds the OK, it returns). Same setup, somehow when we put delay at the beggining of the dte::command_cb::process_line it works as expected. And also if i configure the esp system clock frequency to 80Mhz, it also works. I tried 240 Mhz and 160Mhz and in both configuration I get the issue. I get the issue in CMUX mode, in command mode it works. I will try the dce_commands::at_raw(). Thank you, Best regards.

brhmrsln commented 3 months ago

I try the dce_commands::at_raw(). Issue remains. Here are some outputs:

*80Mhz working log: 80Mhz_working_example

brhmrsln commented 3 months ago

And also 1 sec delay at the beggining of the DTE::command_cb::process_line, working log: 1secDelay

david-cermak commented 3 months ago

Thanks for sharing the log. Is it possible that some command before that failed or timeouted? (and it passed while running on 80MHz?)

I think we can workaround it simply by flushing the Rx buffer before each command. (perhaps this would sound like a solution, because we do not need the data from previous commands, but I'd still like to understand where this OK comes from...)

brhmrsln commented 3 months ago

The shared log is showing the first nmea request we sent and its response. When I checked the logs, there is no fail or timeout situation:) Actually OK is in the response. I mean when we request NMEA sentence, module is sending the response as expected (nmea sentence + 'OK'). And we see this in the uart buffer. but when we read DTE buffer, somehow we can only read the sentence but not the OK. It is still in the uart buffer however. Next time when we request, the response is like starting with 'OK'.

With the solution 'flushing the rx buffer', my concern is that we can also flush the internet related data because we are also using the gsm module at the same time and its data is also coming from same uart. (One uart connection to the module and CMUX configuration) A workaround solution I am implementing for now is sending sync command('AT') after the nmea request just to read the buffer and get rid of the 'OK'. It seems it is working now but not sure what would be the side effects. Thank you for the help, hope we can find a better solution :)

david-cermak commented 3 months ago

A workaround solution I am implementing for now is sending sync command('AT')

This should work, I don't see any downside other than the effective response time. Would it wok if you send the sync command just once, upon entering the CMUX mode?

One theory for the extra OK would be that it's a leftover from the initial AT+CMUX=0\r command. Could you please check if applying this solves the problem?

diff --git a/components/esp_modem/src/esp_modem_command_library.cpp b/components/esp_modem/src/esp_modem_command_library.cpp
index 6f06bb7..e491b3f 100644
--- a/components/esp_modem/src/esp_modem_command_library.cpp
+++ b/components/esp_modem/src/esp_modem_command_library.cpp
@@ -373,7 +373,7 @@ command_result send_sms(CommandableIf *t, const std::string &number, const std::
 command_result set_cmux(CommandableIf *t)
 {
     ESP_LOGV(TAG, "%s", __func__ );
-    return generic_command_common(t, "AT+CMUX=0\r");
+    return generic_command_common(t, "AT+CMUX=0\r", 5000);
 }

 command_result read_pin(CommandableIf *t, bool &pin_ok)
diff --git a/components/esp_modem/src/esp_modem_dce.cpp b/components/esp_modem/src/esp_modem_dce.cpp
index 55c51f0..cce4684 100644
--- a/components/esp_modem/src/esp_modem_dce.cpp
+++ b/components/esp_modem/src/esp_modem_dce.cpp
@@ -135,7 +135,10 @@ bool DCE_Mode::set_unsafe(DTE *dte, ModuleIf *device, Netif &netif, modem_mode m
         if (mode == modem_mode::DATA_MODE || mode == modem_mode::CMUX_MODE || mode >= modem_mode::CMUX_MANUAL_MODE) {
             return false;
         }
-        device->set_mode(modem_mode::CMUX_MODE);    // switch the device into CMUX mode
+        if (!device->set_mode(modem_mode::CMUX_MODE)) {
+            return false;
+        }
+                                                    // switch the device into CMUX mode
         usleep(100'000);                            // some devices need a few ms to switch

         if (!dte->set_mode(modem_mode::CMUX_MODE)) {
brhmrsln commented 3 months ago

"Would it wok if you send the sync command just once, upon entering the CMUX mode?" -> Does not work. I have to send after every NMEA response we received.

I apply the patch but still have the issue.

I think the problem lies in filling the dte buffer. I mean after we send NMEA request, we are waiting for the response (sentence + OK). This answer is coming from the MC60 as expected. And when I check the uart buffer (uart_read_bytes) I see the response is as we expected. But when I get the response with dce_commands::generic_get_string() or dce_commands::at_raw() I can only read the 'sentence'. The remaining response('OK') is in the uart buffer and will be read after a request.

We are expecting dte buffer to be this (with OK at the end, nothing left in the uart buffer and response string is sentence + OK) 1secDelay

But, dte buffer has only sentence and the expected OK is still in the uart buffer waiting to be read(it should be read and should be in the dte buffer. Below logs are sequential):

in_at_raw_buffer_log in_at_raw_remaining_uart_buffer_log

The below log i get with uart_read_bytes function of esp: in_at_raw_remaining_uart_buffer_log

I think the problem is not having an extra 'OK' but not be able to get the whole response from uart buffer to dte buffer. If we get the whole response, both dce_commands::generic_get_string() and dce_commands::at_raw() is going to work as we expected.

Thank you

david-cermak commented 3 months ago

I see, thanks for testing. the issue is apparently somewhere else.

Could you please share your sdkconfig (I'm only interested in esp_modem options that are set differently than the defaults). From what I can see, you had ESP_MODEM_CMUX_DEFRAGMENT_PAYLOAD set to n, correct? Otherwise you wouldn't get the line processing lambda called with only the partial response (as I saw that the response itself and the OK message come within one CMUX payload).

If it's the case, could you please try to enable ESP_MODEM_USE_INFLATABLE_BUFFER_IF_NEEDED ? (still doens't explain why it works on 80MHz, when the response apparently comes in two pieces...)

brhmrsln commented 3 months ago

"From what I can see, you had ESP_MODEM_CMUX_DEFRAGMENT_PAYLOAD set to n, correct?", it is set to 'y'

Here is the related part.

#
# esp-modem
#
CONFIG_ESP_MODEM_CMUX_DEFRAGMENT_PAYLOAD=y
CONFIG_ESP_MODEM_USE_INFLATABLE_BUFFER_IF_NEEDED=y
CONFIG_ESP_MODEM_CMUX_DELAY_AFTER_DLCI_SETUP=1000
CONFIG_ESP_MODEM_CMUX_USE_SHORT_PAYLOADS_ONLY=y
# CONFIG_ESP_MODEM_ADD_CUSTOM_MODULE is not set
# end of esp-modem

And i used various settings :)

When I saw the issue , I thought that the response is coming in two pieces too. But I think it is not the case because in logic analyzer there is no delay between 'Sentence' and 'OK' and uart buffer has the "OK" when we read the 'sentence' from dte buffer.

Here is how I got the log:

command_result at_raw(CommandableIf *t, const std::string &cmd, std::string &out, const std::string &pass, const std::string &fail, int timeout= 500)
  {
    ESP_LOGV(TAG, "%s", __func__);
    return t->command(
        cmd,
        [&](uint8_t *data, size_t len)
        {
          out.assign(reinterpret_cast<char *>(data), len);

          // LOG *************************************************************
          //  Only Sentece is read, I also read more than length still no 'OK'
          ESP_LOG_BUFFER_HEXDUMP(TAG, data, len, ESP_LOG_INFO);

          // When we read uart buffer here, OK is there
          uint8_t uart_remaining_data[128];
          size_t uart_data_len= uart_read_bytes(1, uart_remaining_data, 128, 20);
          if (uart_data_len > 0)
          {
            uart_remaining_data[uart_data_len]= 0;
            ESP_LOGI(TAG, "Remaining data: %s", data);
          }
          // LOG *************************************************************

          if (out.find(pass) != std::string::npos)
          {
            return command_result::OK;
          }
          else if (out.find(fail) != std::string::npos)
          {
            return command_result::FAIL;
          }

          return command_result::TIMEOUT;
        },
        timeout);
  }

Thank you

david-cermak commented 3 months ago

Okay, seems I was wrong again. The response may or may not come in two pieces, but it comes in two distinct cmux messages, now I see it from your logic analyzer picture (was confused by the logs seeing only ascii characters). What I think is happening on slower CPU rate is that it actually comes in one piece, but the CMUX layers see it as two payloads, so the lambda is called twice, this also explains your initial log on faster clock rate, you might get a callback immediately after the first part (without the OK), which would also explain why you're seeing some UART data buffered. But wouldn't explain why you won't get the second part of the response. I'll try to simulate this condition on my device.

I think you've already tried, but still:

brhmrsln commented 3 months ago

1)"what happens if you remove reading of the UART buffer from the code above? if we read the buffer there then the data wouldn't be available in the second callback" -> If I read the uart buffer there, then I can get the 'sentence' because I take the 'OK' from the uart buffer manually by reading it and then when I make a new NMEA request, I again get the 'sentence'(response str is filled with 'sentence') and manually get the 'OK'(with uart_read_bytes) and so on. We are manually removing 'OK' by reading the uart buffer. -> If I remove it, then I get the first response (just sentence, 'OK' is still in uart buffer). After that we request new NMEA and we only get the 'OK'(from the response of the previous request). 2) increase the timeout of the command to maximum, try 10 seconds or so, just to be sure. Does the command return command_result::TIMEOUT ? -> I try it with 5 seconds and it returns command_result::TIMEOUT.

"I'll try to simulate this condition on my device." Remainder: I am using ESP32-S3 (with 240Mhz system clock config) and Quectel MC60 (All in one solution with CMUX config). I connected them via one UART (In command mode no issue). After I make the CMUX configuration and get the ip, I am periodically requesting NMEA sentence(periods can be various.) Also this can be also very helpful. 1 sec delay at the beggining of the DTE::command_cb::process_line is exactly what we expected(dte buffer contains 'sentence+OK'): image

I appriciate your time and the help, Thank you David.

david-cermak commented 3 months ago

It seems like an issue in the modem code (which doesn't exhibit very often, as I've never seen two such a small fragments posted in two cmux frames, which is a legitimate usecase). The question is why we won't get the second callback. would you mind trying this again and logging the data event directly in the uart task?

--- a/components/esp_modem/src/esp_modem_uart.cpp
+++ b/components/esp_modem/src/esp_modem_uart.cpp
@@ -120,6 +120,7 @@ void UartTerminal::task()
             switch (event.type) {
             case UART_DATA:
                 uart_get_buffered_data_len(uart.port, &len);
+                ESP_LOGW(TAG, "UART_DATA %d", len);
                 if (len && on_read) {
                     on_read(nullptr, len);
                 }

1 sec delay at the beggining of the DTE::command_cb::process_line is exactly what we expected(dte buffer contains 'sentence+OK'):

This actually explains that we get the correct data in two cmux callbacks, but only one uart callback (but not via two uart callbacks).

If this is the case, then this should fix the problem:

--- a/components/esp_modem/src/esp_modem_uart.cpp
+++ b/components/esp_modem/src/esp_modem_uart.cpp
@@ -160,6 +160,11 @@ void UartTerminal::task()
                 ESP_LOGW(TAG, "unknown uart event type: %d", event.type);
                 break;
             }
+        } else {
+            uart_get_buffered_data_len(uart.port, &len);
+            if (len && on_read) {
+                on_read(nullptr, len);
+            }
         }
     }
 }

(which basically just periodically checks for any potential data buffered in case we missed an event)

brhmrsln commented 3 months ago

Sorry for the late response, I was running some functional tests on the solution. it is working, thank you. The functional tests I carried out also include sending data over the Internet to server while periodically requesting the NMEA sentence and no problem is observed.

The log you requested (regarding the size of the uart buffer) UartBufferLength

About the solution, could you please clarify if there's a planned timeline for these changes to be merged into the master branch? Because I want to use this component from Github (as managed_components) to be always up to date with the repository. I only manage the component manually for test and logging purposes.

Again I appriciate the help, Thank you.

david-cermak commented 3 months ago

Thanks for testing it and for the good news. I've just put up a PR with the fix. It usually takes few days for people to review it before it gets merged. I'll also add a bump commit, so we'll publish a new version to component manager.