golioth / golioth-zephyr-sdk

Golioth SDK For Zephyr
https://www.golioth.io
Apache License 2.0

Problem with APN setup on NRF9160 using CONFIG_PDN #83

Closed ramb0t closed 2 years ago

ramb0t commented 2 years ago

I'm having an issue configuring a custom APN when using Golioth; however, the same method works fine with the vanilla Nordic examples.

Background: my NB-IoT capable SIM card requires a custom APN to be set before a data connection can be opened. If you do not set a custom APN, the SIM registers on the network and receives the operator's default APN, which appears to work but blocks all data from being sent. I believe the suggested way to add a custom APN with Zephyr on the latest nRF toolchain is to add the following to your prj.conf file:

#APN setup 
CONFIG_PDN=y
CONFIG_PDN_SYS_INIT=y
CONFIG_PDN_DEFAULTS_OVERRIDE=y
CONFIG_PDN_DEFAULT_APN="flickswitch"

# I also set NBIoT mode 
CONFIG_LTE_NETWORK_MODE_NBIOT=y

If I add the above to the Nordic samples such as at_client or asset_tracker, they compile and connect without issues.

However, when I add the same to a Golioth sample prj.conf, such as the hello or dfu samples, the APN is not set correctly. One of the early log messages shows: [00:00:12.480,377] <err> pdn: Failed to configure CID 0, err, -8

Full output of my modified hello sample as follows:

uart:~$ *** Booting Zephyr OS build v2.6.99-ncs1-1  ***
[00:00:00.215,728] <inf> golioth_system: Initializing
uart:~$ +CEREG: 2,"5209","099D410D",9
+CSCON: 1
+CEREG: 1,"5209","099D410D",9,,,"11100000","00000110"
[00:00:13.809,906] <err> pdn: Failed to configure CID 0, err, -8
[00:00:13.818,756] <err> pdn: Failed to configure default CID, err -8
[00:00:13.828,277] <dbg> golioth_hello.main: Start Hello sample
[00:00:13.llo.main: Hi Discord :D
[00:00:13.844,848] <wrn> at_notif: Already initialized. Nothing to do
uart:~$ The Golioth Hello AT host sample started
[00:00:13.858,093] <dbg> golioth_hello.main: Calling Golioth Client Start
[00:00:13.867,492] <dbg> golioth_heo.main: Starting while loop
[00:00:13.876,098] <inf> golioth_hello: Sending hello! 0
uart:~$ 
        [00:00:13.893,157] <dbg> golioth_hello.main: Sleeping 5s
[00:00:13.901[00:00:18.901,123] <inf> golioth_hello: Sending hello! 1
[00:00:18.909,576] <wrn> golioth_hello: Failed
[00:00:18.918,182] <dbg> golioth_hello.main: Sleeping 5s
[00:00:23.926,147] <inf> golioth_hello: Sending hello! 2
[00:00:23.934,600] <wrn> golioth_hello: Failed[00:00:23.943,206] <dbg> golioth_hello.main: Sleeping 5s
[00:00:28.951,171] <inf> golioth_hello: Sending hello! 3
[00:00:28.959,625] <wrn> golioth_hello: Faileeeping 5s
[00:00:33.976,196] <inf> golioth_hello: Sending hello! 4
[00:00:33.984,649] <wrn> golioth_hello: Failed to send hello!
[00:00:33.993,255] <dbg> golioth_hello.main: Sleeping 5s
[00:00:36.953,826] <inf> golioth_system: Client connected!
[00:00:39.001,220] <inf> golioth_hello: Sending hello!

I modified the sample to allow the use of the LTE Link Monitor utility, which shows the device registering with the default operator network APN (lte.vodacom): [image]

I then modified the sample to manually set the APN using the AT+CGDCONT command (first turning the modem off with AT+CFUN=0, then back on after) using the following code:

        /* Turn the modem off before changing the PDP context */
        if (at_cmd_write("AT+CFUN=0", NULL, 0, NULL) != 0) {
                LOG_ERR("modem issue!");
        } else {
                LOG_INF("Modem off");
        }

        k_sleep(K_SECONDS(1));

        /* Define the default PDP context (CID 0) with the custom APN */
        err = at_cmd_write("AT+CGDCONT=0,\"IP\",\"flickswitch\"", NULL, 0, NULL);
        if (err) {
                printk("Could not define PDP context +CGDCONT, error: %d\n", err);
                return err;
        } else {
                LOG_INF("APN Set");
        }

        k_sleep(K_SECONDS(1));

        /* Turn the modem back on so it attaches with the new APN */
        if (at_cmd_write("AT+CFUN=1", NULL, 0, NULL) != 0) {
                LOG_ERR("modem issue!");
        } else {
                LOG_INF("Modem on");
        }

This has the desired effect of setting the APN correctly, and my device is assigned the correct IP address allocated to this SIM card: [image]
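For readers who want to avoid hard-coding the APN inside the command string, the AT+CGDCONT command above could be assembled from its parts. The following is only a minimal sketch: build_cgdcont_cmd is a hypothetical helper, not part of the Golioth SDK or the nRF Connect SDK, and it only formats the string; sending it to the modem is left to whatever AT interface the application already uses.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not an SDK API): formats an AT+CGDCONT command
 * for the given context ID, PDP type, and APN.
 * Returns 0 on success, -1 on invalid arguments or if buf is too small.
 */
int build_cgdcont_cmd(char *buf, size_t len, int cid,
                      const char *pdp_type, const char *apn)
{
        int n;

        if (buf == NULL || len == 0 || pdp_type == NULL ||
            apn == NULL || cid < 0) {
                return -1;
        }

        n = snprintf(buf, len, "AT+CGDCONT=%d,\"%s\",\"%s\"",
                     cid, pdp_type, apn);
        if (n < 0 || (size_t)n >= len) {
                return -1; /* formatting error or truncated command */
        }

        return 0;
}
```

Called with cid 0, PDP type "IP", and APN "flickswitch", this produces the same command string that is passed to at_cmd_write in the snippet above.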

However, Golioth was still unable to connect! No logs are sent to the cloud:

ramb0t@ubuntu:~/zephyr-nrf/modules/lib/golioth$ goliothctl logs --interval=10m
no logs found

Note also the somewhat sketchy UART output, not sure why that corruption is happening.

This may well be a problem with my toolchain setup; I'm not experienced enough to know. Once again, however, if I compile the Nordic asset tracker example with just the PDN config settings in prj.conf, the device communicates with the Nordic cloud as it should.

My (sanitised) .config output from build/zephyr/.config attached: config.txt

0Grit commented 2 years ago

First thing that comes to mind is that I believe the Nordic examples offload networking to the modem side?

https://github.com/nrfconnect/sdk-nrf/blob/main/samples/nrf9160/at_client/prj.conf

0Grit commented 2 years ago

We do not offload TLS in our samples. https://github.com/golioth/zephyr-sdk/blob/main/samples/hello/boards/nrf9160dk_nrf9160_ns.conf

ramb0t commented 2 years ago

OK, I wasn't aware of that. What does it mean from a user's point of view? Is my only option to manage the modem using AT commands from user space?

0Grit commented 2 years ago

@mniestroj will know the Zephyr specifics of this, and have better context for NRF91. I will dig into this as well and determine how we can tweak our client architecture to improve the overall cellular experience. I know for a fact that this will not be the last modem management issue our users face.

mniestroj commented 2 years ago

@ramb0t could you try adding CONFIG_PDN_INIT_PRIORITY=89 to the project configuration? I've tested the following options:

CONFIG_PDN=y
CONFIG_PDN_SYS_INIT=y
CONFIG_PDN_DEFAULTS_OVERRIDE=y
CONFIG_PDN_DEFAULT_APN="iot.1nce.net"
CONFIG_PDN_INIT_PRIORITY=89

with samples/hello/ + nrf9160dk + 1nce sim card and it seems to work.
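The thread does not say why the priority value matters, so the following is only one possible reading. Zephyr runs init functions within the same init level in ascending priority order, so a lower number runs earlier. Assuming (this is not confirmed here) that the code bringing the LTE link up runs at priority 90 and that the PDN library would otherwise initialize later, for example at 99, a value of 89 would let the APN be configured before the modem attaches. The toy model below only illustrates that ordering rule; the entry names and the numbers 90 and 99 are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Toy model of an init entry: a name and a priority (lower runs first). */
struct init_entry {
        const char *name;
        int priority;
};

static int by_priority(const void *a, const void *b)
{
        const struct init_entry *ea = a;
        const struct init_entry *eb = b;

        return (ea->priority > eb->priority) - (ea->priority < eb->priority);
}

/* Sorts entries into the order they would run (ascending priority). */
void sort_init_order(struct init_entry *entries, size_t count)
{
        qsort(entries, count, sizeof(entries[0]), by_priority);
}

/* Returns the position of the named entry in the array, or -1 if absent. */
int init_position(const struct init_entry *entries, size_t count,
                  const char *name)
{
        size_t i;

        for (i = 0; i < count; i++) {
                if (strcmp(entries[i].name, name) == 0) {
                        return (int)i;
                }
        }

        return -1;
}
```

In this model, an entry named pdn_defaults at priority 89 sorts ahead of an lte_connect entry at 90, while the same entry at 99 sorts after it.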

ramb0t commented 2 years ago

@mniestroj
Ok it worked!

uart:~$ *** Booting Zephyr OS build v2.6.99-ncs1-1 ***
[00:00:00.221,740] <inf> golioth_system: Initializing
[00:00:06.662,078] <dbg> golioth_hello.main: Start Hello sample
[00:00:06.662,109] <inf> golioth_hello: Sending hello! 0
[00:00:06.662,628] <wrn> golioth_hello: Failed to send hello!
[00:00:06.662,689] <inf> golioth_system: Starting connect
[00:00:07.331,451] <inf> golioth_system: Client connected!
[00:00:11.662,689] <inf> golioth_hello: Sending hello! 1
[00:00:16.664,031] <inf> golioth_hello: Sending hello! 2
[00:00:21.665,374] <inf> golioth_hello: Sending hello! 3
[00:00:26.666,687] <inf> golioth_hello: Sending hello! 4
[00:00:31.668,060] <inf> golioth_hello: Sending hello! 5

But I had to do a fresh toolchain install. When I tried at first I was still having the corrupted serial output that I showed, then I tried to pull the latest repo and I had build errors. I decided to delete zephyr-nrf/ and reinstall the toolchain, this time it worked perfectly and gave the output above.

I am very sorry about this, because now we do not know whether my initial problem was due to this configuration or to a faulty toolchain. (I am still very confused about how it was possible to have the corrupted serial output; perhaps it was another config flag I added to try to get the AT commands working.)

My one suggestion would be to provide a sample that retrieves basic network information from the modem, such as the APN and IP address (AT+CGDCONT), at the user level, so that this is not too difficult if you are not familiar with all the Zephyr configs.
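As a rough idea of what such a sample might do with the modem's answer: the read form of the command, AT+CGDCONT?, returns one line per defined context in the form +CGDCONT: <cid>,"<PDP_type>","<APN>","<PDP_addr>",... The sketch below parses one such line. The struct and function names are hypothetical (not from the Golioth SDK or the nRF Connect SDK), and it assumes the line has already been read from the modem.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the fields of one +CGDCONT response line. */
struct pdp_context {
        int cid;
        char pdp_type[16];
        char apn[64];
        char addr[64];
};

/* Copies the double-quoted field at *p into out, then advances *p past
 * the closing quote and one following comma, if present. */
static int read_quoted(const char **p, char *out, size_t out_len)
{
        const char *s = *p;
        const char *end;
        size_t n;

        if (*s != '"') {
                return -1;
        }
        s++;
        end = strchr(s, '"');
        if (end == NULL) {
                return -1;
        }
        n = (size_t)(end - s);
        if (n >= out_len) {
                return -1; /* field does not fit */
        }
        memcpy(out, s, n);
        out[n] = '\0';
        end++;
        if (*end == ',') {
                end++;
        }
        *p = end;
        return 0;
}

/* Parses a single "+CGDCONT: ..." line. Returns 0 on success, -1 otherwise. */
int parse_cgdcont_line(const char *line, struct pdp_context *ctx)
{
        static const char prefix[] = "+CGDCONT: ";
        const char *p;
        char *end;
        long cid;

        if (line == NULL || ctx == NULL ||
            strncmp(line, prefix, sizeof(prefix) - 1) != 0) {
                return -1;
        }
        p = line + sizeof(prefix) - 1;
        cid = strtol(p, &end, 10);
        if (end == p || *end != ',' || cid < 0) {
                return -1;
        }
        ctx->cid = (int)cid;
        p = end + 1;
        if (read_quoted(&p, ctx->pdp_type, sizeof(ctx->pdp_type)) != 0 ||
            read_quoted(&p, ctx->apn, sizeof(ctx->apn)) != 0 ||
            read_quoted(&p, ctx->addr, sizeof(ctx->addr)) != 0) {
                return -1;
        }
        return 0;
}
```

The parsed APN and address could then be written to the log on boot, along the lines of the suggestion above.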

beriberikix commented 2 years ago

That's a good sample suggestion! Maybe to do it on boot and send to Logs?

0Grit commented 2 years ago

Note that I created #121 to discuss the proposed sample.

0Grit commented 2 years ago

@mniestroj can you think of any code changes to the SDK that need, should, or could, be made to resolve this issue?

mniestroj commented 2 years ago

This issue is an application configuration issue. There are many more options to be configured besides the APN, and it is not possible to document or verify all such combinations. Instead, if there are problems with reconfiguring an application, it should be the user's responsibility to debug why the new configuration is not valid and what other options need to be set or changed. Additionally, we cannot take ownership of NCS configuration challenges as part of this project, so I am closing this issue.