bifravst / cat-tracker-fw

Cat Tracker Firmware
https://bifravst.github.io/

nrf9160 SICA chip - cloud_connect failed: -95 (EOPNOTSUPP) #53

Closed TjazVracko closed 4 years ago

TjazVracko commented 4 years ago

Hello, I have a working build environment and have the whole stack (FW + AWS) working on nrf9160DK and Thingy91. We have manufactured our own prototype board with the nrf9160 SICA B0 1924 AH chip and I am now trying to get it to connect to AWS.

I connect our board to the nrf9160DK via the JLink cable and can build and flash the same way as for the nrf9160DK.

First, I flashed the AT client and uploaded generated certificates:

AT+CFUN=4
OK
AT%CMNG=0,42,0,"-----BEGIN CERTIFICATE-----<cert>-----END CERTIFICATE-----"
OK
AT%CMNG=0,42,1,"-----BEGIN CERTIFICATE-----<cert1>-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----<cert2>-----END CERTIFICATE-----"
OK
AT%CMNG=0,42,2,"-----BEGIN RSA PRIVATE KEY-----<rsa_key>-----END RSA PRIVATE KEY-----"
OK

The above commands are the same ones that LTE Link Monitor uses when uploading certificates through the GUI.

Listing the certificates gets me this output:

AT%CMNG=1,42
%CMNG: 42,0,"0000000000000000000000000000000000000000000000000000000000000000"
%CMNG: 42,1,"0101010101010101010101010101010101010101010101010101010101010101"
%CMNG: 42,2,"0202020202020202020202020202020202020202020202020202020202020202"
OK
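For anyone checking a listing like this by script: a minimal host-side sketch in C that parses the `%CMNG:` rows and confirms that all three credential types exist for a security tag. It assumes the row format shown above (security tag, type, 64 hex digits) and that types 0, 1 and 2 are the root CA, client certificate and private key, matching the write commands earlier in this post. The names `cmng_entry`, `cmng_parse_line` and `cmng_has_full_set` are made up for illustration and are not part of any Nordic library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of an AT%CMNG=1,<sec_tag> listing (hypothetical helper type). */
struct cmng_entry {
    int sec_tag;   /* security tag, 42 in the listing above */
    int type;      /* assumed: 0 = root CA, 1 = client cert, 2 = private key */
    char sha[65];  /* 64 hex digits plus terminating NUL */
};

/* Parse a row such as: %CMNG: 42,0,"<64 hex digits>"
 * Returns 0 on success, -1 if the line does not match that shape. */
static int cmng_parse_line(const char *line, struct cmng_entry *out)
{
    assert(line != NULL && out != NULL);

    int n = sscanf(line,
                   "%%CMNG: %d,%d,\"%64[0123456789abcdefABCDEF]\"",
                   &out->sec_tag, &out->type, out->sha);

    /* All three fields must convert and the digest must be full length. */
    if (n != 3 || strlen(out->sha) != 64) {
        return -1;
    }
    return 0;
}

/* Returns 1 if rows of type 0, 1 and 2 are all present for sec_tag. */
static int cmng_has_full_set(const char *const *lines, size_t count,
                             int sec_tag)
{
    unsigned seen = 0;

    for (size_t i = 0; i < count; i++) {
        struct cmng_entry e;

        if (cmng_parse_line(lines[i], &e) == 0 &&
            e.sec_tag == sec_tag && e.type >= 0 && e.type <= 2) {
            seen |= 1u << e.type;
        }
    }
    return seen == 0x7u;
}
```

Note that this only confirms that something is stored in each slot. It cannot tell whether the stored credentials are the ones the broker expects.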

Then I flash the cat-tracker-fw:

west build -p -b nrf9160_pca10090ns -d build10090ns
nrfjprog -f nrf91 --program ./build10090ns/zephyr/merged.hex --sectoranduicrerase -r

Output over UART is more or less as expected, but the device is unable to connect to the MQTT broker, giving error -95, which is defined as EOPNOTSUPP 95 /* Operation not supported on transport endpoint */. It does however connect to the network and is issued an IP.

See output below. I have bolded the relevant parts:

SPM: NS image at 0x18200
SPM: NS MSP at 0x2002fba0
SPM: NS reset vector at 0x1ee2d
SPM: prepare to jump to Non-Secure image.
[00:00:00.362,518] <err> ADXL362: Failed: 0

***** Booting Zephyr OS build v2.0.99-ncs1-rc1-761-g7e20180c004a *****
[00:00:00.388,916] <dbg> nrf9160_gps.init: MAGPIO set: AT%XMAGPIO=1,0,0,1,1,1574,1577
[00:00:00.403,869] <dbg> nrf9160_gps.init: COEX0 set: AT%XCOEX0=1,1,1570,1580
The cat tracker has started
Version: 0.0.0-development
UPDATED YAYConnecting to LTE network. This may take several minutes.
[00:00:00.444,335] <inf> lte_lc: Using legacy LTE PCO mode...
rsrp value is 39
LTE connected!
Fetching modem time...
17/11/19,9:32:32
Device get binding device
GPS initialized
[00:00:09.379,089] <dbg> bifravst_cloud.broker_init: IPv4 Address found 18.185.194.7
[00:00:09.576,477] <err> bifravst_cloud: mqtt_connect, error: -95
cloud_connect failed: -95
Enabling PSM
PSM enabled
[00:00:16.862,457] <dbg> nrf9160_gps.enable_gps: GPS mode is enabled
[00:00:16.869,812] <dbg> nrf9160_gps.enable_gps: Functional mode: 1
[00:00:16.876,617] <dbg> nrf9160_gps.start: GPS socket created
[00:00:16.892,883] <dbg> nrf9160_gps.start: GPS operational
GPS started successfully.
Searching for satellites
to get position fix. This may take several minutes

<GPS trying to get a fix here>

GPS operation was stopped
Checking LTE connection...
REGISTERED TO ROAMING NETWORK
Encoded message: {
        "state":        {
                "reported":     {
                        "bat":  {
                                "v":    3340,
                                "ts":   1576575223032
                        }
                }
        }
}
[00:01:16.891,174] <dbg> bifravst_cloud.data_publish: Publishing to topic: $aws/things/352656101007410/shadow/update
Cloud send failed, err: -128
...

I have looked through the code and can see that the error print happens in the cloud_poll(void) function in main.c, which is run in a separate thread. That function checks whether the cloud backend is null, but I was unable to determine which part of the struct is null and how this cloud_backend is even created/initialized.

Looking at the beginning of main:

void main(void)
{
    int err;

    printk("The cat tracker has started\n");
    printk("Version: %s\n", DEVICE_APP_VERSION);
    printk("UPDATED YAY");

    cloud_backend = cloud_get_binding("BIFRAVST_CLOUD");
    __ASSERT(cloud_backend != NULL, "Bifravst Cloud backend not found");

    err = cloud_init(cloud_backend, cloud_event_handler);
    if (err) {
        printk("Cloud backend could not be initialized, error: %d\n ",
               err);
        cloud_error_handler(err);
    }
...

Neither cloud_get_binding nor cloud_init returns an error.

At this point I am a little lost. Can you give me some pointers on where I should look next to determine why this error happens? Is there a way for me to check whether the program reads the certificates correctly? Or is there some other cause for this? Thank you for the continued help.

coderbyheart commented 4 years ago

Output over UART is more or less as expected, but the device is unable to connect to the MQTT broker, giving error -95, which is defined as EOPNOTSUPP 95 /* Operation not supported on transport endpoint */. It does however connect to the network and is issued an IP.

I am not sure that this is the right error name for the code, at least looking at the source code I don't see it used in conjunction with the cloud backend.

I think you mean ENOTSUP, which is used here?

It looks like the backend is not properly included in your custom firmware.

I have a working build environment and have the whole stack (FW + AWS) working on nrf9160DK and Thingy91.

Are you building the firmware for those (which seems to be working) the same way? If yes, what is the diff then between those projects and the project for the custom PCB?

TjazVracko commented 4 years ago

Yes, I meant ENOTSUP, but following the definition it is defined as EOPNOTSUPP.
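Whether ENOTSUP and EOPNOTSUPP are the same number depends on the C library the code is built against, so the same return value can be reported under either name. A small sketch in C of how one might map a negative return value to a name and check whether the two names are aliases in a given toolchain; `errno_name` and `notsup_names_are_aliases` are hypothetical helper names, and the result on a desktop compiler may differ from the result in the firmware toolchain.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Map an errno-style return value (negative or positive) to a symbolic
 * name. Only the two codes discussed in this thread are covered. */
static const char *errno_name(int err)
{
    int code = err < 0 ? -err : err;

    /* An if-chain is used instead of a switch: on C libraries where
     * ENOTSUP and EOPNOTSUPP share one value, duplicate case labels
     * would not compile. */
    if (code == EOPNOTSUPP) {
        return "EOPNOTSUPP";
    }
    if (code == ENOTSUP) {
        return "ENOTSUP";
    }
    return "unknown";
}

/* Returns 1 if this C library defines both names with the same number. */
static int notsup_names_are_aliases(void)
{
    return ENOTSUP == EOPNOTSUPP;
}
```

Building this with the firmware's own toolchain and printing `errno_name(-95)` would show which name, if any, that library assigns to the value seen in the log.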

I am building the firmware like this, both for the dev-kit and for our custom board:

west build -p -b nrf9160_pca10090ns -d build10090ns
nrfjprog -f nrf91 --program ./build10090ns/zephyr/merged.hex --sectoranduicrerase -r

The code/project is exactly the same (unmodified cat-tracker-fw). I am doing the building and flashing from the same directory. Do I need to pass different build flags for the SICA chip?

coderbyheart commented 4 years ago

So, just to confirm, that firmware build works on the 91DK but not on your custom PCB. You should be able to use exactly the same FW for both.

Have you updated the modem firmware on your custom PCB's SiP?

coderbyheart commented 4 years ago

This documentation entry hints that it might be the certificates, although the way you provision the certificates looks good.

Have you tried to use the LinkMonitor to write the certificates?

simensrostad commented 4 years ago

I managed to recreate this erroneous behavior by intentionally using the wrong certificates for my TLS connection. Please make sure that your certificates are flashed correctly, and that the correct certificates are used.

coderbyheart commented 4 years ago

Just for reference, the error code 95 is defined neither in NCS nor in Zephyr, and we believe that it is passed through from the modem.

TjazVracko commented 4 years ago

Thanks. How can I make sure the certificates are flashed correctly? How can I access the certificate strings from FW code to be able to print them?

coderbyheart commented 4 years ago

How can I make sure the certificates are flashed correctly?

Use the nRF Connect for Desktop LinkMonitor to flash them, as described in the handbook.

How can I access the certificate strings from FW code to be able to print them?

This is not possible, because the certificate is stored in a secure location which is not readable from the application (otherwise an attacker could read out the private key and impersonate a device).

TjazVracko commented 4 years ago

Greetings,

my colleague managed to flash the certificates correctly using LTE Link Monitor on his Windows machine. I still can't flash them correctly on Ubuntu - don't know why, but this might need investigating. We can close the issue.