Closed zvasicek56 closed 3 years ago
Hi @zvasicek56
How often do you see this happening, does it only happen in the COMMAND_MODE
?
I've quickly checked the commit you referenced and it works as expected. Getting only UART_PATTERN_DET
events in the command mode, since the rx events (UART_DATA
events) should be disabled:
It might be possible, however, that we could get these two events in the transition mode, but again in the transition DATA_MODE
-> COMMAND_MODE
, the pattern detection is disabled, so we should get only data events. Perhaps if some older event was triggered before the transition and processed after it, but doesn't look like this from your log, as you're getting multiple different events.
Hi @david-cermak
thank you for your response. The issue is observed in the COMMAND_MODE
. According to your description I traced the root cause of that behavior. It is caused by the function uart_flush
which is called on line 242. This part of the code is activated if the modem is powering up and emits RDY lines. But this is not the only place where uart_flush
or uart_flush_input
is called in the esp_modem.c
code so there may be different situations when the observed issue happens.
According to the source codes, the function uart_flush
is an alias for uart_flush_input
. The implementation of uart_flush_input
is probably buggy as its execution causes side effects that are not mentioned in the documentation. The function internally calls uart_enable_rx_intr
independently on the previous state of the interrupt mask.
It looks that also uart_read_bytes
called within esp_handle_uart_pattern
can activate rx interrupt as a side effect of uart_check_buf_full
call.
@zvasicek56 Thanks for the investigation and sharing more details. I still wasn't able to reproduce the issue, but I think I understand the potential cause of the trouble.
It might be possible, that the pattern detection in the uart driver generates more events before the modem thread gets to receive them and thus reading data behind the currently processing event, causing the potential corruption you described above. If he assumption is correct, then this WIP fix below should help: PPPoS-Client-Fix-race-pattern-event-race-condition.patch.txt
Could you please apply the patch and see if this fixes the issue?
@david-cermak, the patch actually does not address the root of the problem. Data loss still persists and moreover, the patch leads to new errors: a) partial line is given to handle_line
, b) "uart read bytes failed" error occurs. This is because the value of event.size
is invalid at the time of the execution of the corresponding event handler.
The issue is that both UART and PATTERN interrupts are enabled at the same time as a consequence of the uart_flush
call. This causes duplicate events that interfere together.
Please have look at the latest log after patch. The first issue is here: Event I (18244) UART: [UART_PATTERN_DET DATA] ...
should be ignored, there is no 0a
character in the buffer. This character was already processed in the previous handler, i.e. I (18114) UART: [UART DATA] ...
. The previous implementation (before patching) could handle that because the handlers ignored the value of event.size
.
Could you please comment the first solution proposed in the initial submission. Is it safe to just ignore all the UART DATA events if the modem driver is in command mode? Anyway, UART driver needs to be patched to fix uart_flush
which enables interrupt even if it was disabled before.
E (17004) bg96: bg96_init(185): sync failed D (17594) esp-modem: Ignore UART event 7 for DTE with no DCE attached D (17594) esp-modem: Ignore UART event 7 for DTE with no DCE attached D (17594) esp-modem: Ignore UART event 7 for DTE with no DCE attached D (17604) esp-modem: Ignore UART event 7 for DTE with no DCE attached D (17614) esp-modem: Ignore UART event 7 for DTE with no DCE attached D (17614) esp-modem: Ignore UART event 7 for DTE with no DCE attached D (17624) esp-modem: Ignore UART event 7 for DTE with no DCE attached D (17634) esp-modem: Ignore UART event 7 for DTE with no DCE attached I (18004) pppos_example: Init modem I (18004) UART: [UART_PATTERN_DET DATA]: event.size: 2 ; modem mode: 0 D (18014) esp-modem: handle_line : 0x3ffba458 0d 0a |..|
I (18024) UART: [UART_PATTERN_DET DATA]: event.size: 4 ; modem mode: 0 D (18044) esp-modem: handle_line : 0x3ffba458 4f 4b 0d 0a |OK..| D (18054) dce_service: sync ok
I (18054) UART: [UART_PATTERN_DET DATA]: event.size: 2 ; modem mode: 0 D (18074) esp-modem: handle_line : 0x3ffba458 0d 0a |..|
I (18084) UART: [UART_PATTERN_DET DATA]: event.size: 4 ; modem mode: 0 D (18104) esp-modem: handle_line : 0x3ffba458 4f 4b 0d 0a |OK..| D (18114) dce_service: disable echo ok
I (18114) UART: [UART DATA]: event.size: 1 ; modem mode: 0 V (18124) esp-modem:: esp_handle_uart_data, length :1 V (18134) esp-modem: debug_data: uart_pattern_get_pos :-1 V (18224) esp-modem: debug_data: Continuous read in non-data mode : length: 2 char: 0xa D (18234) esp-modem: handle_line : 0x3ffba458 0d 0a |..|
I (18244) UART: [UART_PATTERN_DET DATA]: event.size: 1 ; modem mode: 0 D (18244) esp-modem esp_handle_uart_pattern: debug_data: 0x3ffba458 30 |0| D (18254) esp-modem: handle_line : 0x3ffba458 30 |0|
I (18264) UART: [UART DATA]: event.size: 15 ; modem mode: 0 V (18274) esp-modem:: esp_handle_uart_data, length :14 V (18294) esp-modem: debug_data: uart_pattern_get_pos :-1 V (18334) esp-modem: debug_data: Continuous read in non-data mode : length: 15 char: 0xa D (18344) esp-modem: handle_line : 0x3ffba458 47 20 44 75 6d 6d 79 20 4d 6f 64 65 6c 0d 0a |G Dummy Model..|
I (18354) UART: [UART_PATTERN_DET DATA]: event.size: 1 ; modem mode: 0 D (18374) esp-modem: handle_line : 0x3ffba458 4f |O|
I (18384) UART: [UART DATA]: event.size: 2 ; modem mode: 0 V (18384) esp-modem:: esp_handle_uart_data, length :1 V (18404) esp-modem: debug_data: uart_pattern_get_pos :-1 V (18444) esp-modem: debug_data: Continuous read in non-data mode : length: 2 char: 0xd V (18444) esp-modem: debug_data: Continuous read in non-data mode : length: 3 char: 0xa D (18454) esp-modem: handle_line : 0x3ffba458 4b 0d 0a |K..|
I (18464) UART: [UART_PATTERN_DET DATA]: event.size: 2 ; modem mode: 0 E (18574) esp-modem: uart read bytes failed E (18614) esp-modem: esp_modem_dte_send_cmd(322): process command timeout E (18614) dce_service: esp_modem_dce_get_module_name(311): send command failed E (18614) bg96: bg96_init(189): get module name failed I (19624) pppos_example: Init modem
@zvasicek56 Thanks for testing the patch. It looks like my assumption was wrong. In this case, I'm afraid, I'm running out of ideas. I've been testing this exact scenario quite extensively, simulating both events at the same time, but no luck in understanding what was happening in your case. Even if I made both interrupts enabled and both events fired, the data got processed correctly. Are you running only one client or multiple? I this happening after startup, or have you been switching between modes?
Could you please comment the first solution proposed in the initial submission.
I don't like the idea of skipping the data processing based on the operational mode. Both handlers must work in both modes, as they seem to work in my case. Another reason is support for the transition mode, where we could receive both data and commands at once. Have seen issues with processing commands (already in the command mode), just after switching from PPP mode. therefore the data event has to work correctly in a plain command mode.
Have you tried disabling the pattern detection interrupts altogether? Could you please check once? This should work the same way.
Hi @zvasicek56
I wasn't able to reproduce the issue, but I believe there is a potential problem of these two events competing and causing data races.
I was thinking of processing the pattern detection manually for some time already (I've done so in the esp_modem
component to easily switch the underlying DTE: uart -> usb...), but haven't really gotten to use only the UART_DATA
event, for both PPP data and AT commands.
Here a WIP patch removing the uart pattern detection. Could you please check
Examples-pppos_client-Remove-uart-pattern-detection.patch.txt
if this solves the problems you report? Thanks.
Thanks for reporting, would you please help share if any updates for the issue? Thanks.
Thank you very much @david-cermak. I can confirm that your patch, which only uses the uart_data
event, works correctly, at least for SIM7600. The issue with two competing events has been partly solved by fixing uart_flush_input
, but your solution is safer.
@zvasicek56 Thanks for the feedback! I also think this is a good change. I will clean the patch up and get it merged. This issue is to be closed once merged and synced.
@david-cermak
Just curious why do you use a union keep the "line_buffer_size" since it's no longer used? https://github.com/espressif/esp-idf/blob/master/examples/protocols/pppos_client/components/modem/include/esp_modem.h#L62-L65
@AxelLin
why do you use a union keep the "line_buffer_size" since it's no longer used?
Not used, but still a kind of public API, so wanted to keep it compatible so that both the new option and the old one is accepted.
@david-cermak
I think I hit a bug with the change in a90817c.
In esp_handle_uart_data() esp_dte_handle_line()
I check the uart data (It's actually multiple lines): I got below, the "+COPS: 0" is unexpected data so esp_dte_handle_line() return error. +COPS: 0
OK
However, it then stop working.
Remove the MODEM_CHECK from this line, then it works. https://github.com/espressif/esp-idf/blob/master/examples/protocols/pppos_client/components/modem/src/esp_modem.c#L124
Looks like the MODEM_CHECK causes the second line "OK" never received.
Thanks for the early feedback, I'll look into it and fix it (I've tested this multiline scenario, expecting that the line handlers return ESP_OK
to indicate also read more, but ESP_FAIL
to stop parsing immediately. Probably missed the fact that this unexpected error
is reported but reading/parsing might still continue)
Hey! I'm sorry to rise up this issue, but can you tell me in which version of esp-idf this issue has been resolved?
I suppose it would be enough to use
platform = https://github.com/platformio/platform-espressif32.git
right?
Thank you!
Environment
git describe --tags
to find it): // v4.4-dev-1594-g1d7068e4bProblem Description
There is a bug in the recent version of
examples\protocols\pppos_client\components\modem\src\esp_modem.c
introduced probably in 55d41. The data from the UART's RX buffer are consumed in two events: UART_DATA and UART_PATTERN_DET. This causes duplicate events which manifests in data corruption. Data received in UART_DATA event handler can be overwritten in UART_PATTERN_DET handler under some circumstances. The issue is that both handlers wrongly supposes the esp_dte->buffer is empty despite of the fact that it can contain data written in a previous event.Debug Log
This issue is hard to debug, so please see the isolated sequence of events (bold), states and actions causing the data corruption:
Data (two characters 'OK' added in the fourth event) are overwritten in the last event.
Solution
It is not clear why the data are processed in UART_DATA handler in command mode.