ARMmbed / mbed-os

Arm Mbed OS is a platform operating system designed for the internet of things
https://mbed.com
Other
4.67k stars 2.98k forks source link

C030 SARA-R412M: URC already added with prefix. Cellular state machine failure. #15143

Open phlegx opened 3 years ago

phlegx commented 3 years ago

Description of defect

Normally the cellular connection works without problems. We noticed that a reboot with external power supply of the device after the battery (without external power) went dead, the Ublox SIM chip is in a bad state and it shows this warnings:

[00226631ms][WARN][CELL]: URC already added with prefix: +CGEV: NW DEACT
[00226648ms][WARN][CELL]: URC already added with prefix: +CGEV: ME DEACT
[00226665ms][WARN][CELL]: URC already added with prefix: +CGEV: NW PDN D
[00226693ms][WARN][CELL]: URC already added with prefix: +CGEV: ME PDN D

After many retries of AT TX ( 9): AT+CFUN=4 we get the errors:

[00367436ms][ERR ][CELL]: CellularStateMachine failure: Device ready
[00367453ms][ERR ][CELL]: CellularStateMachine target state Attaching network, current state Device ready
[00367494ms][INFO][CELL]: CellularContext connect
[00367509ms][INFO][CELL]: CellularContext connect

and the connection procedure don't continue. Our code is simple:

...
iface = CellularContext::get_default_instance();
iface->attach(callback(this, &Cellular::cbCellular));
iface->set_default_parameters();

nsapi_error_t retcode = NSAPI_ERROR_OK;
retcode = iface->connect();
...
Questions
  1. How can we capture the CellularStateMachine error?
  2. What should we set/call/config to avoid this problem?
  3. Any idea about a quick workaround? It's urgend.
  4. How can we send AT commands from iface (CellularContext) to try some commands?

Target(s) affected by this defect ?

C030-R412M

Toolchain(s) (name and version) displaying this defect ?

Mbed Studio 1.4.3

What version of Mbed-os are you using (tag or sha) ?

mbed-os-5.15.7

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

Mbed Studio 1.4.3

How is this defect reproduced ?

[00221485ms][INFO][CELL]: New CellularContext  (2000dc10)
[00221517ms][INFO][CELL]: CellularContext connect
[00221543ms][INFO][CELL]: Start connecting (timeout 1000 ms)
[00221559ms][INFO][CELL]: AT TX ( 2): AT
[00221574ms][INFO][CELL]: AT TX ( 1): <cr>
[00222568ms][INFO][CELL]: AT TX ( 2): AT
[00222572ms][INFO][CELL]: AT TX ( 1): <cr>
[00223577ms][INFO][CELL]: Modem power ON (timeout 1000 ms)
[00223593ms][INFO][CELL]: AT TX ( 2): AT
[00223608ms][INFO][CELL]: AT TX ( 1): <cr>
[00224601ms][INFO][CELL]: AT TX ( 9): AT+CFUN=4
[00224616ms][INFO][CELL]: AT TX ( 1): <cr>
[00225610ms][INFO][CELL]: Continue after 1 seconds
[00226631ms][WARN][CELL]: URC already added with prefix: +CGEV: NW DEACT
[00226648ms][WARN][CELL]: URC already added with prefix: +CGEV: ME DEACT
[00226665ms][WARN][CELL]: URC already added with prefix: +CGEV: NW PDN D
[00226693ms][WARN][CELL]: URC already added with prefix: +CGEV: ME PDN D
[00226721ms][INFO][CELL]: AT TX ( 2): AT
[00226725ms][INFO][CELL]: AT TX ( 1): <cr>
[00227719ms][INFO][CELL]: AT TX ( 9): AT+CFUN=4
[00227734ms][INFO][CELL]: AT TX ( 1): <cr>
[00228728ms][INFO][CELL]: Continue after 2 seconds
[00230749ms][WARN][CELL]: URC already added with prefix: +CGEV: NW DEACT
[00230766ms][WARN][CELL]: URC already added with prefix: +CGEV: ME DEACT
[00230783ms][WARN][CELL]: URC already added with prefix: +CGEV: NW PDN D
[00230811ms][WARN][CELL]: URC already added with prefix: +CGEV: ME PDN D
[00230839ms][INFO][CELL]: AT TX ( 2): AT
[00230843ms][INFO][CELL]: AT TX ( 1): <cr>
[00231837ms][INFO][CELL]: AT TX ( 9): AT+CFUN=4
[00231852ms][INFO][CELL]: AT TX ( 1): <cr>
[00232846ms][INFO][CELL]: Continue after 4 seconds
[00236867ms][WARN][CELL]: URC already added with prefix: +CGEV: NW DEACT
[00236884ms][WARN][CELL]: URC already added with prefix: +CGEV: ME DEACT
[00236901ms][WARN][CELL]: URC already added with prefix: +CGEV: NW PDN D
[00236929ms][WARN][CELL]: URC already added with prefix: +CGEV: ME PDN D
[00236957ms][INFO][CELL]: AT TX ( 2): AT
[00236961ms][INFO][CELL]: AT TX ( 1): <cr>
[00237955ms][INFO][CELL]: AT TX ( 9): AT+CFUN=4
[00237970ms][INFO][CELL]: AT TX ( 1): <cr>
[00238964ms][INFO][CELL]: Continue after 8 seconds
[00246985ms][WARN][CELL]: URC already added with prefix: +CGEV: NW DEACT
[00247002ms][WARN][CELL]: URC already added with prefix: +CGEV: ME DEACT
[00247019ms][WARN][CELL]: URC already added with prefix: +CGEV: NW PDN D
[00247047ms][WARN][CELL]: URC already added with prefix: +CGEV: ME PDN D
[00247075ms][INFO][CELL]: AT TX ( 2): AT
[00247079ms][INFO][CELL]: AT TX ( 1): <cr>
[00248073ms][INFO][CELL]: AT TX ( 9): AT+CFUN=4
[00248088ms][INFO][CELL]: AT TX ( 1): <cr>
[00249082ms][INFO][CELL]: Continue after 16 seconds
[00265103ms][WARN][CELL]: URC already added with prefix: +CGEV: NW DEACT
[00265120ms][WARN][CELL]: URC already added with prefix: +CGEV: ME DEACT
[00265137ms][WARN][CELL]: URC already added with prefix: +CGEV: NW PDN D
[00265165ms][WARN][CELL]: URC already added with prefix: +CGEV: ME PDN D
[00265193ms][INFO][CELL]: AT TX ( 2): AT
[00265197ms][INFO][CELL]: AT TX ( 1): <cr>
[00266191ms][INFO][CELL]: AT TX ( 9): AT+CFUN=4
[00266206ms][INFO][CELL]: AT TX ( 1): <cr>
[00267200ms][INFO][CELL]: Continue after 32 seconds
[00299221ms][WARN][CELL]: URC already added with prefix: +CGEV: NW DEACT
[00299238ms][WARN][CELL]: URC already added with prefix: +CGEV: ME DEACT
[00299255ms][WARN][CELL]: URC already added with prefix: +CGEV: NW PDN D
[00299283ms][WARN][CELL]: URC already added with prefix: +CGEV: ME PDN D
[00299311ms][INFO][CELL]: AT TX ( 2): AT
[00299315ms][INFO][CELL]: AT TX ( 1): <cr>
[00300309ms][INFO][CELL]: AT TX ( 9): AT+CFUN=4
[00300324ms][INFO][CELL]: AT TX ( 1): <cr>
[00301318ms][INFO][CELL]: Continue after 64 seconds
[00365339ms][WARN][CELL]: URC already added with prefix: +CGEV: NW DEACT
[00365356ms][WARN][CELL]: URC already added with prefix: +CGEV: ME DEACT
[00365373ms][WARN][CELL]: URC already added with prefix: +CGEV: NW PDN D
[00365401ms][WARN][CELL]: URC already added with prefix: +CGEV: ME PDN D
[00365429ms][INFO][CELL]: AT TX ( 2): AT
[00365433ms][INFO][CELL]: AT TX ( 1): <cr>
[00366427ms][INFO][CELL]: AT TX ( 9): AT+CFUN=4
[00366442ms][INFO][CELL]: AT TX ( 1): <cr>
[00367436ms][ERR ][CELL]: CellularStateMachine failure: Device ready
[00367453ms][ERR ][CELL]: CellularStateMachine target state Attaching network, current state Device ready
[00367494ms][INFO][CELL]: CellularContext connect
[00367509ms][INFO][CELL]: CellularContext connect
[02166756ms][WARN][CELL]: No cellular connection
[02166771ms][INFO][CELL]: CellularContext connect

# No more output!
0xc0170 commented 3 years ago

@ARMmbed/team-ublox please review

RobMeades commented 3 years ago

Hi there, sorry to see you're having problems. From the AT log you've pasted in above it seems that you are getting nothing at all back from the cellular module; no response whatsoever, either before or after Modem power ON (timeout 1000 ms). Logically speaking, either (a) the TXD or RXD lines to the module are not functioning or (b) assuming you are connected via UART, the RTS line to the module is not functioning (or is not tied low if unused) or (c) power is not being supplied to the module or (d) the PWR_ON pin of the module has not been toggled correctly to power it on (needs to be held low for 150ish ms and then released) or (e) the module is broken.

Assuming this was a working lump of HW/SW that is now a not working lump of HW/SW without anything obvious having been physically done to it (aside from possibly the power loss mentioned), I'd guess that your problem is unlikely to be (a), (b) or (d). You can check that (c) or (d) are not the problem by probing the VINT output pin of the module: if it is at 1V8 then the module both has power and is powered-up (i.e. the toggling of the PWR_ON pin has occurred).

If all of the above finds nothing, you could probe the TXD and RXD lines as near the module as you can with something like a Salaea probe to confirm that the TXD line does show the "AT\r" that you are sending and the RXD line [doesn't] show any "\r\rOK\r\n" response (maybe prefixed with "AT\r" if echo is still on at that point).

I suggest you poke around at these things and then update this issue with the outcome of your investigations, hoping that we don't end up at (e).

phlegx commented 3 years ago

On mbed-os/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F437xG/TARGET_UBLOX_C030/ONBOARD_UBLOX_AT.cpp I have added some trace:

#include "CellularLog.h"

nsapi_error_t ONBOARD_UBLOX_AT::hard_power_on()
{
    tr_error("HARD POWER ON!");
    ::onboard_modem_init();
    return NSAPI_ERROR_OK;
}

nsapi_error_t ONBOARD_UBLOX_AT::soft_power_on()
{
    tr_error("SOFT POWER ON!");
    ::onboard_modem_power_up();
    return NSAPI_ERROR_OK;
}

But this trace errors are never shown in the serial console. I have set build profile as Debug and mbed_app.json with trace-level set to TRACE_LEVEL_DEBUG and:

"mbed-trace.enable": true,
"cellular.debug-at": true,

So, e.g. hard_power_on() are never called. Why? Do I need to set something more in mbed_app.json?

RobMeades commented 3 years ago

@0xc0170: hi Martin, I've not built mbed-os in nearly three years now so it would take me a little while to work out which way was up in order to help with the specific debug issue above; do you have any immediate suggestions?

phlegx commented 3 years ago

@RobMeades removing "UBLOX_AT.provide-default": true, from mbed_app.json helps, so that hard_power_on() etc. from ONBOARD_UBLOX_AT.cpp gets called. But the main problem persists.

"target_overrides": {
    "*": {
        "platform.stdio-convert-newlines": true,
        "platform.stdio-baud-rate": 115200,
        "platform.default-serial-baud-rate": 115200,
        "mbed-trace.enable": true,
        "cellular.debug-at": true,
        "cellular.clear-on-connect": true,
        "cellular.use-apn-lookup": true,
        "nsapi.default-cellular-apn": "\"some.provider.com\"",
        "nsapi.default-cellular-plmn": null,
        "nsapi.default-cellular-sim-pin": null,
        "nsapi.default-cellular-username": null,
        "nsapi.default-cellular-password": null,
        "lwip.ipv4-enabled": true,
        "lwip.ipv6-enabled": true,
        "lwip.tcp-enabled": true,
        "lwip.ethernet-enabled": false,
        "lwip.ppp-enabled": false,
        "ppp.ipv4-enabled": false,
        "ppp.ipv6-enabled": false
    },
    "UBLOX_C030_R412M": {
        "target.features_add": ["STORAGE"],
        "target.components_add": ["SD"],
        "target.network-default-interface-type": "CELLULAR",
        "storage.storage_type": "FILESYSTEM",
        "UBLOX_AT.tx": "MDMTXD",
        "UBLOX_AT.rx": "MDMRXD",
        "UBLOX_AT.cts": "MDMCTS",
        "UBLOX_AT.rts": "MDMRTS"
    }
}
RobMeades commented 3 years ago

Could you probe the VINT pin of the module after the point where you believe that PWR_ON has been toggled low to power the module on? If the VINT pin is at 1V8 then we know that (c) and (d) are not your problem. If it is at less than 1V8 then either the PWR_ON toggling has not worked for some reason (maybe the duration is too short or too long (should be > 150 ms and < 1500 ms as seen by the module (noting that >= 1500 ms is the toggle duration that switches the module off) I normally use 300 ms)) or somehow power is not being supplied to the module itself.

If VINT is at 1V8, can you confirm that you used to get responses from it over the AT interface, i.e. you are pretty sure that (a) and (b) are not the problem?

If the answer to the above is "yes" then we're at (e).

phlegx commented 3 years ago

@RobMeades VINT etc. works fine.

But is this code correct: ublox_low_level_api.c#L41

void ublox_board_init(void) {
    gpio_t gpio;

    // Enable power to 3V3
    gpio_init_inout(&gpio, PWR3V3, PIN_OUTPUT, OpenDrain, 1);

#if defined(TARGET_UBLOX_C030_R41XM)
    /* In case of SARA-R4, MDMRST needs to be asserted for 10 seconds before modem actually powers down.
     * This means that modem is initially responsive to AT commands but powers down
     * after 10 seconds unless MDMRST is de-asserted (onboard_modem_init()).
     *
     * This will cause confusion for application as CellularDevice::is_ready()
     * will return TRUE initially and later modem will power off without any indication to application.
     */
    gpio_init_out_ex(&gpio, MDMRST,    1); // <----------------- see here!
    gpio_init_inout(&gpio, MDMPWRON, PIN_OUTPUT, OpenDrain, 1);
#else
    gpio_init_out_ex(&gpio, MDMRST,    0); // <----------------- see here!
    gpio_init_out_ex(&gpio, MDMPWRON,  0);
#endif
    gpio_init_out_ex(&gpio, MDMRTS,    0); // <----------------- see here! In my opinion, this line should be removed.
    gpio_init_in_ex(&gpio,  MDMCURRENTSENSE, PullNone);
 ...

In comparison to: onboard_modem_api.c#L48

void onboard_modem_init()
{
    gpio_t gpio;

#if defined(TARGET_UBLOX_C030_R41XM)
    // Set the pin to high so on powerup we can set low
    gpio_init_inout(&gpio, MDMPWRON, PIN_OUTPUT, OpenDrain, 1);
#endif
    // Take us out of reset
    gpio_init_out_ex(&gpio, MDMRST,    1); // <----------------- see here! 
}
RobMeades commented 3 years ago

I think you may be confusing RTS and RST [somewhat unfortunate choice of acronym on our part!]?

If you have an MCU pin connected to the SARA-R412M module's reset (i.e. RST) pin then that pin must NOT be asserted, i.e. must be left to float high. If it is asserted (i.e. pulled low) for 10 seconds or more then the cellular module will reset. I'm not entirely sure what happens if it is pulled low for less than 10 seconds, hopefully nothing, but it is safer to leave it high or let it float high, if there is no intention to actually reset the module.

The RTS (i.e. request to send) pin, on the other hand, must be pulled low if the UART HW on your MCU is not using HW flow control, otherwise the cellular module will never send any AT responses or URCs back to you.

Does that make sense?

phlegx commented 3 years ago

@RobMeades they look so much alike. My mistake!

phlegx commented 3 years ago

What is the best way to catch cellular errors like the following, to be able to debug better?

[00148294ms][DBG ][CELL]: AT error -3012
[00148298ms][ERR ][CELL]: CellularStateMachine failure: Device ready
[00148325ms][DBG ][CELL]: callback: 4096, err: -3012, data: 4096
[00148368ms][INFO][CELL]: CellularContext connect
[00148394ms][ERR ][CELL]: CellularStateMachine target state Attaching network, current state Device ready

Thank you so much for the help and time @RobMeades!!!

RobMeades commented 3 years ago

I'm not sure exactly what this snippet of log really means, I'm afraid; the last time I used mbed-os it didn't have a state machine in the cellular driver.

Can you widen the debug you post to include the bit above this? On the face of it AT error -3012 means there's been an error of some form, while the state machine says it has reached the state Device ready, so it can't be a very serious error. Yet, from what you have indicated above, the cellular module is not even responding to AT commands.

Can you clarify?

phlegx commented 3 years ago

INVALID SOLUTION

I use NVIC_SystemReset() in my code for special situations. I found out, that after call NVIC_SystemReset() the device restarts and the Ublox run in this problem. After I add 10.5 seconds sleep after line 60 on onboard_modem_api.c#L60 the problem disappears.

What I do:

deinit()
{
    onboard_modem_power_down();
    onboard_modem_deinit();
    NVIC_SystemReset();
}

Adding 10.5 seconds to onboard_modem_api.c on function onboard_modem_deinit() solves the issue:

void onboard_modem_deinit()
{
#ifndef TARGET_UBLOX_C030_R41XM
    gpio_t gpio;

    // Back into reset
    gpio_init_out_ex(&gpio, MDMRST, 0);

    thread_sleep_for(10500);
#endif
}

Why? I don't know and I need to investigate more.

phlegx commented 3 years ago

What does mbed_app.json config UBLOX_AT.provide-default": true? If I re-add this config, the program can connect and send data. If I remove this config, the program hangs on:

00025745ms][INFO][CELL]: AT RX (32): <cr><ln><cr><ln>+USORD: 0,1024,"<0B><00><0F><C5><00><0F><C2><00><05>;0<82>
...
[00085692ms][WARN][CELL]: AT timeout
[00085696ms][DBG ][CELL]: AT error -3012
[00085711ms][DBG ][CELL]: AT error -3012
[00085725ms][WARN][CELL]: AT timeout
[00085740ms][DBG ][CELL]: AT error -3012
[00085755ms][DBG ][CELL]: consume_to_tag not found
[00085770ms][DBG ][CELL]: AT stop tag not found
[00085786ms][DBG ][CELL]: AT error -3012
phlegx commented 3 years ago

After more investigation I have found a bug!

Whit this mbed_app.json config, the ONBOARD_UBLOX target is used and the serial connection to the Ublox module can work properly.

"target_overrides": {
    "*": {
        "platform.stdio-convert-newlines": true,
        "platform.stdio-baud-rate": 115200,
        "platform.default-serial-baud-rate": 115200,
        "mbed-trace.enable": true,
        "cellular.debug-at": true,
        "cellular.use-apn-lookup": true,
        "nsapi.default-cellular-apn": "\"some.provider.com\"",
        "nsapi.default-cellular-plmn": null,
        "nsapi.default-cellular-sim-pin": null,
        "nsapi.default-cellular-username": null,
        "nsapi.default-cellular-password": null,
        "lwip.ipv4-enabled": true,
        "lwip.ipv6-enabled": true,
        "lwip.tcp-enabled": true,
        "lwip.ethernet-enabled": false,
        "lwip.ppp-enabled": false,
        "ppp.ipv4-enabled": false,
        "ppp.ipv6-enabled": false
    },
    "UBLOX_C030_R412M": {
        "target.features_add": ["STORAGE"],
        "target.components_add": ["SD"],
        "target.network-default-interface-type": "CELLULAR",
        "storage.storage_type": "FILESYSTEM",
        "sd.SPI_MOSI": "SPI_MOSI",
        "sd.SPI_MISO": "SPI_MISO",
        "sd.SPI_CLK": "SPI_CLK",
        "sd.SPI_CS": "SPI_NSS"
    }
}

Bugfix on ONBOARD_UBLOX.cpp#L30 add line serial.set_flow_control(SerialBase::RTSCTS, MDMRTS, MDMCTS);

CellularDevice *CellularDevice::get_target_default_instance()
{
#if defined(TARGET_UBLOX_C030_R41XM)
    static UARTSerial serial(MDMTXD, MDMRXD, MBED_CONF_PLATFORM_DEFAULT_SERIAL_BAUD_RATE);
    serial.set_flow_control(SerialBase::RTSCTS, MDMRTS, MDMCTS);
    static ONBOARD_UBLOX_AT device(&serial);
#elif defined(TARGET_UBLOX_C030_N211)
...
RobMeades commented 3 years ago

Interesting: how did it ever work before? The RTS and CTS lines are both definitely connected between the STM32F4 UART and the SARA-R412M module on the C030-R412M board, all of the C030 board flavours in fact.

phlegx commented 3 years ago

@RobMeades with UBLOX_AT.provide-default": true in mbed_app.json it takes another context default instance (see here: UBLOX_AT.cpp#L99) instead of the context default instance defined for the Ublox C030 (see here: ONBOARD_UBLOX.cpp#L30).

And you can see, that UBLOX_AT.cpp#L99 uses serial.set_flow_control(SerialBase::RTSCTS, MBED_CONF_UBLOX_AT_RTS, MBED_CONF_UBLOX_AT_CTS); and the Ublox C030 default instance not. So, it should be added!

Using this target definition for C030 in TARGET_UBLOX_C030 folder is the correct way, because with this definition it uses the correct `onboard_modem_api.c``etc.

RobMeades commented 3 years ago

@0xc0170 hi again Martin. What approach should we take to this? Do you maintain any form of mbed-os 5 maintenance branch that such a fix should be made to, or does this become some form of "known issue" so that others don't have to repeat the investigation that phlegx has done?

0xc0170 commented 3 years ago

5.15 branch is still being maintained. Please create a fix for mbed-os-5.15 branch for review

RobMeades commented 3 years ago

@phlegx: will you do this or would you like me to? I have no way to build/test mbed-os at the moment but if you'd prefer that I did it please give me the changed file that you have tested and I will do the "paperwork".

phlegx commented 3 years ago

I will make a pull request.