espressif / esp-protocols

Collection of ESP-IDF components related to networking protocols
165 stars 115 forks source link

creating DCE fails with null pointer(error 259) (IDFGH-7764) #74

Closed VasilNikolovRilabs closed 1 year ago

VasilNikolovRilabs commented 1 year ago

Hello,

we faced the following issue when creating DCE for SIM7070

E (11360) modem_api: create_modem_dce: Exception caught with ESP err_code=259
E (11360) modem_api:
I (11360) ESPmodem_Cmux: setting flow_control
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4201fde3  PS      : 0x00060f30  A0      : 0x8201f1fd  A1      : 0x3fc9d980
0x4201fde3: std::__shared_ptr<esp_modem::GenericModule, (__gnu_cxx::_Lock_policy)2>::get() const at c:\espressif\tools\xtensa-esp32s3-elf\esp-2022r1-rc1-11.2.0\xtensa-esp32s3-elf\xtensa-esp32s3-elf\include\c++\11.2.0\bits/shared_ptr_base.h:1296 (discriminator 14)
 (inlined by) std::__shared_ptr_access<esp_modem::GenericModule, (__gnu_cxx::_Lock_policy)2, false, false>::_M_get() const at c:\espressif\tools\xtensa-esp32s3-elf\esp-2022r1-rc1-11.2.0\xtensa-esp32s3-elf\xtensa-esp32s3-elf\include\c++\11.2.0\bits/shared_ptr_base.h:993 (discriminator 14)
 (inlined by) std::__shared_ptr_access<esp_modem::GenericModule, (__gnu_cxx::_Lock_policy)2, false, false>::operator->() const at c:\espressif\tools\xtensa-esp32s3-elf\esp-2022r1-rc1-11.2.0\xtensa-esp32s3-elf\xtensa-esp32s3-elf\include\c++\11.2.0\bits/shared_ptr_base.h:987 (discriminator 14)
 (inlined by) esp_modem::command_result esp_modem::DCE::set_flow_control<int, int>(int&&, int&&) at C:/Rilabs/Krone/Sources/KSC2N/GitLab/KSC2N-APPL/ksc-2n/src/SERVICES/EspModem_Driver/include/cxx_include/esp_modem_dce.hpp:117 (discriminator 14)
 (inlined by) ModemManager_Configuration at C:/Rilabs/Krone/Sources/KSC2N/GitLab/KSC2N-APPL/ksc-2n/src/SERVICES/EspModem_Manager/ModemManager_Wrapper.cpp:326 (discriminator 14)

A2      : 0x00000000  A3      : 0x00000001  A4      : 0x00000001  A5      : 0x00000000
A6      : 0x0000019a  A7      : 0x00000000  A8      : 0x8201fdd4  A9      : 0x3fc9d930
A10     : 0x00000003  A11     : 0x3c11064c  A12     : 0x3c110ba8  A13     : 0x00002c60
A14     : 0x3c11064c  A15     : 0x00000047  SAR     : 0x00000004  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000008  LBEG    : 0x400556d5  LEND    : 0x400556e5  LCOUNT  : 0xfffffffc

Backtrace:0x4201fde0:0x3fc9d9800x4201f1fa:0x3fc9daf0 0x420ec933:0x3fc9db10 0x4200a97d:0x3fc9db30 0x40380a59:0x3fc9db50
0x4201fde0: std::__uniq_ptr_impl<esp_modem::DCE, std::default_delete<esp_modem::DCE> >::_M_ptr() const at c:\espressif\tools\xtensa-esp32s3-elf\esp-2022r1-rc1-11.2.0\xtensa-esp32s3-elf\xtensa-esp32s3-elf\include\c++\11.2.0\bits/unique_ptr.h:173 (discriminator 14)
 (inlined by) std::unique_ptr<esp_modem::DCE, std::default_delete<esp_modem::DCE> >::get() const at c:\espressif\tools\xtensa-esp32s3-elf\esp-2022r1-rc1-11.2.0\xtensa-esp32s3-elf\xtensa-esp32s3-elf\include\c++\11.2.0\bits/unique_ptr.h:422 (discriminator 14)
 (inlined by) std::unique_ptr<esp_modem::DCE, std::default_delete<esp_modem::DCE> >::operator->() const at c:\espressif\tools\xtensa-esp32s3-elf\esp-2022r1-rc1-11.2.0\xtensa-esp32s3-elf\xtensa-esp32s3-elf\include\c++\11.2.0\bits/unique_ptr.h:416 (discriminator 14)
 (inlined by) ModemManager_Configuration at C:/Rilabs/Krone/Sources/KSC2N/GitLab/KSC2N-APPL/ksc-2n/src/SERVICES/EspModem_Manager/ModemManager_Wrapper.cpp:326 (discriminator 14)

0x4201f1fa: ModemManager_MainFunction at C:/Rilabs/Krone/Sources/KSC2N/GitLab/KSC2N-APPL/ksc-2n/src/SERVICES/EspModem_Manager/ModemManager.c:96

0x420ec933: osw::Event::run() const at C:/Rilabs/Krone/Sources/KSC2N/GitLab/KSC2N-APPL/ksc-2n/src/OS/os_wrapper/src/Event.cpp:27

0x4200a97d: osw::Task<0u>::run(void*) at C:/Rilabs/Krone/Sources/KSC2N/GitLab/KSC2N-APPL/ksc-2n/src/OS/os_wrapper/include/Task.hpp:173

0x40380a59: vPortTaskWrapper at C:/Espressif/frameworks/esp-idf-master_5_0/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:141
franz-ms-muc commented 1 year ago

it seems the "LoadProhibited" occuring at "dce->set_flow_control" is a follow-up-error.

the main Error is that dce is nullptr.

so this is the core Problem: E (11360) modem_api: create_modem_dce: Exception caught with ESP err_code=259 E (11360) modem_api:

we have debugged all stuff, (maybe you can participate in a Debug session ?)

and found out that the Error might be after this Line: https://github.com/espressif/esp-protocols/blob/master/components/esp_modem/include/cxx_include/esp_modem_dce_factory.hpp#L104

because the 2 Lines with "return nullptr" were not be executed.

diplfranzhoepfinger commented 1 year ago

err_code=259 seems to be a invalid state Error

diplfranzhoepfinger commented 1 year ago

ESP_ERR_INVALID_STATE

VasilNikolovRilabs commented 1 year ago

Hi @david-cermak

Here are the verbose outputs:

I (12568) ESPmodem_Cmux: config MODEM
D (12578) esp_netif_lwip: LwIP stack has been initialized
D (12578) esp_netif_lwip: esp-netif has been successfully initialized
I (12588) uart: queue free spaces: 30
V (12588) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): checking args
V (12598) intr_alloc: esp_intr_alloc_intrstatus (cpu 0): Args okay. Resulting flags 0xE
D (12608) intr_alloc: Connected src 28 to int 18 (cpu 0)
I (12608) ESPmodem_Cmux: config NETWORK
D (12618) esp_netif_objects: esp_netif_add_to_list 0x3fce8680
D (12628) esp_netif_objects: esp_netif_add_to_list netif added successfully (total netifs: 1)
D (12638) esp-netif_lwip-ppp: esp_netif_new_ppp: PPP connection created: 0x3fce898c
D (12638) esp-netif_lwip-ppp: Phase Dead
E (12648) modem_api: create_modem_dce: Exception caught with ESP err_code=259 ESP_ERR_INVALID_STATE
E (12648) modem_api:

It sems that D (12638) esp-netif_lwip-ppp: Phase Dead is causing a problem

david-cermak commented 1 year ago

Hi @VasilNikolovRilabs

It seems like that the suspect is this line:

https://github.com/espressif/esp-protocols/blob/ef1bae5cdd65f10a6739c1b9b0aa3f0283f13ffa/components/esp_modem/src/esp_modem_dte.cpp#L57-L60

Which really indicates an invalid state, as the signal was triggered and at the same time the command result wasn't updated (but we set the signal only if the result is updated:

https://github.com/espressif/esp-protocols/blob/ef1bae5cdd65f10a6739c1b9b0aa3f0283f13ffa/components/esp_modem/src/esp_modem_dte.cpp#L48-L49

)

Moreover, no command should be issued during DCE creation, as far as I know.

This issue might have something to do with the ESP_MODEM_FLOW_CONTROL_HW configuration that you use, and possibly a command for the modem to use the HW flow control (e.g. AT+IFC=2,2 or similar). Could you please share more details about setting the mode?

VasilNikolovRilabs commented 1 year ago

Hello @david-cermak

thank you for the information.

I have 3 further questions:

  1. What does the term "signal" means - and why getting the signal times out?
  2. How can we know that the creation of the DCE is finished, does it mean that?
  3. We start the flow control just after the DCE and the DTE were created in our code. The order is as following: 3.1 We create DTE instance

          dte_configuration.uart_config.tx_io_num = 17;
          dte_configuration.uart_config.rx_io_num = 18;
          dte_configuration.uart_config.rts_io_num = 14;
          dte_configuration.uart_config.cts_io_num = 13;
          dte_configuration.uart_config.flow_control = ESP_MODEM_FLOW_CONTROL_HW;
    
         dte = create_uart_dte(&dte_configuration);

    3.2 We create NetIf instance

    /* Configure the PPP netif */
     esp_netif_config_t netif_ppp_config = ESP_NETIF_DEFAULT_PPP();
    
    /* Create the PPP and DCE objects */
    esp_netif = esp_netif_new(&netif_ppp_config);

    3.3 We create DCE using the DTE and NetIf instances and a standart DCE config

     /* Configure the DCE */
     esp_modem_dce_config_t dce_config = ESP_MODEM_DCE_DEFAULT_CONFIG("nbiot.global-m2m.net");
     dce = create_SIM7070_dce(&dce_config, dte, esp_netif);

    3.4 At the end we start the flow control

      if (command_result::OK != dce->set_flow_control(2, 2)) {
            ESP_LOGE(TAG, "Failed to set the set_flow_control mode");
            return; //TODO: error reaction must be defined

It seems that we fail on 3.3 to create the DCE successfully and as a result there is a address violation when setting the flow control using the DCE in step 3.4.

If you want we can make one session online and I can share our code base with you. Think that 30 minutes can be enough.

david-cermak commented 1 year ago

What does the term "signal" means -

The signal from the code snippet above is just a single event group indicating reception of AT reply.

and why getting the signal times out?

That's the weird part. It hasn't timed out, it was set, yet the value was not updated accordingly. The only reason I could think of is a memory corruption.

How can we know that the creation of the DCE is finished, does it mean that?

The creation is completed once create_XXX_dce() returns, and the result is a valid non-nullptr object.

We start the flow control just after the DCE and the DTE were created in our code. The order is as following:

Could you please make sure that before 3.4, you have a valid pointer. I somehow understood from the above that the creation itself failed.

VasilNikolovRilabs commented 1 year ago

Hello @david-cermak

I did not fully understand this sentence: "That's the weird part. It hasn't timed out, it was set, yet the value was not updated accordingly. The only reason I could think of is a memory corruption."

Yes, in step 3.3 at the time when we create the DCE we dont receive a valid pointer for DCE which leads to the problem in step 3.4.

My questions is: What could really lead to this problem, could this happen if the modem is not physically switched on(with the power key)? Which AT command is issued exactly to the SIMCOM modem while creating the DCE and could this be the reason for the timeout if the modem is not in the proper state while switching it on?

david-cermak commented 1 year ago

Sorry, just wanted to say, that this error should never happen, so the only reason I could think about is that the member variables get corrupted, i.e. rewritten by another (unrelated) code or program.

What could really lead to this problem, could this happen if the modem is not physically switched on(with the power key)?

No, the objects should be easily created even if there's no modem.

Which AT command is issued exactly to the SIMCOM modem while creating the DCE

No command is sent during the creation of any DCE.

diplfranzhoepfinger commented 1 year ago

Am Mo., 11. Juli 2022 um 14:00 Uhr schrieb david-cermak < @.***>:

What does the term "signal" means -

The signal from the code snippet above is just a single event group indicating reception of AT reply.

and why getting the signal times out?

That's the weird part. It hasn't timed out, it was set, yet the value was not updated accordingly. The only reason I could think of is a memory corruption.

How can we know that the creation of the DCE is finished, does it mean that?

The creation is completed once create_XXX_dce() returns, and the result is a valid non-nullptr object.

but we have a null pointer. that is sure.

We start the flow control just after the DCE and the DTE were created in our code. The order is as following:

Could you please make sure that before 3.4, you have a valid pointer. I somehow understood from the above that the creation itself failed.

no need, we get a nullpointer.

— Reply to this email directly, view it on GitHub https://github.com/espressif/esp-protocols/issues/74#issuecomment-1180322675, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJAHMXKXRZFCRDAR4WWQZSTVTQEGTANCNFSM522QKBSA . You are receiving this because you commented.Message ID: @.***>

-- Diplom-Informatiker Franz Höpfinger e.K. Reibersdorf 16a 84419 Schwindegg

david-cermak commented 1 year ago

Added https://github.com/espressif/esp-protocols/pull/88 to print more info if exceptions are enabled and caught internally, for example:

D (1533) esp_netif_objects: esp_netif_add_to_list 0x3ffbe634
D (1533) esp_netif_objects: esp_netif_add_to_list netif added successfully (total netifs: 1)
D (1543) esp-netif_lwip-ppp: esp_netif_new_ppp: PPP connection created: 0x3ffbe940
D (1553) esp-netif_lwip-ppp: Phase Dead
E (1573) modem_api: create_modem_dce: Exception caught with ESP err_code=259
E (1573) modem_api: esp_modem_netif.cpp:85 ERROR

assert failed: void app_main() simple_cmux_client_main.cpp:78 (dce)
diplfranzhoepfinger commented 1 year ago

our code is very much the "simple_cmux_client",

and the Exception is raised at this line:

dce = create_SIM7070_dce(&dce_config, dte, esp_netif);

later the call to dce->set_flow_control(2, 2) fails, because dce is a Nullpointer.

but this a a follow-up Failure.

so, the Question is what is the "invalid state" on create dce ?

Thanks, Franz

Am Mo., 11. Juli 2022 um 09:34 Uhr schrieb david-cermak < @.***>:

Hi @VasilNikolovRilabs https://github.com/VasilNikolovRilabs

It seems like that the suspect is this line:

https://github.com/espressif/esp-protocols/blob/ef1bae5cdd65f10a6739c1b9b0aa3f0283f13ffa/components/esp_modem/src/esp_modem_dte.cpp#L57-L60

Which really indicates an invalid state, as the signal was triggered and at the same time the command result wasn't updated (but we set the signal only if the result is updated:

https://github.com/espressif/esp-protocols/blob/ef1bae5cdd65f10a6739c1b9b0aa3f0283f13ffa/components/esp_modem/src/esp_modem_dte.cpp#L48-L49

)

Moreover, no command should be issued during DCE creation, as far as I know.

This issue might have something to do with the ESP_MODEM_FLOW_CONTROL_HW configuration that you use, and possibly a command for the modem to use the HW flow control (e.g. AT+IFC=2,2 or similar). Could you please share more details about setting the mode?

— Reply to this email directly, view it on GitHub https://github.com/espressif/esp-protocols/issues/74#issuecomment-1180064497, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJAHMXKUGJ5UZUC6G4VNO7DVTPFBVANCNFSM522QKBSA . You are receiving this because you commented.Message ID: @.***>

-- Diplom-Informatiker Franz Höpfinger e.K. Reibersdorf 16a 84419 Schwindegg