eclipse-zenoh / zenoh-pico

Eclipse zenoh for pico devices

[Bug] 0.11.0 example crashes Zephyr #546

Closed ypearson-bdai closed 2 months ago

ypearson-bdai commented 3 months ago

Describe the bug

Running the zenoh-pico example z_pub.c crashes with the following output:

[00:00:02.012,000] <inf> net_config: IPv4 address: 192.168.11.2
[00:00:02.112,000] <inf> net_config: IPv6 address: 2001:db8::2
[00:00:02.112,000] <inf> net_config: IPv6 address: 2001:db8::2
uart:~$ Opening Zenoh Session...Unable to open session!
[00:00:10.111,000] <err> pthread_mutex: Mutex is uninitialized (0)
[00:00:10.111,000] <err> pthread_mutex: Mutex is uninitialized (0)
[00:00:10.111,000] <err> pthread_mutex: Mutex is uninitialized (0)
[00:00:10.111,000] <err> pthread_mutex: Mutex is uninitialized (0)
[00:00:10.111,000] <err> pthread_mutex: Mutex is uninitialized (0)
[00:00:10.111,000] <err> pthread_mutex: Mutex is uninitialized (0)
[00:00:10.111,000] <err> pthread_mutex: Mutex is uninitialized (0)
[00:00:10.111,000] <err> pthread_mutex: Mutex is uninitialized (0)
[00:00:10.111,000] <err> pthread_mutex: Mutex is uninitialized (0)
[00:00:10.111,000] <err> pthread_mutex: Mutex is uninitialized (0)
[00:00:10.111,000] <err> pthread_mutex: Mutex is uninitialized (0)
[00:00:10.116,000] <err> os: ***** USAGE FAULT *****
[00:00:10.116,000] <err> os:   Illegal use of the EPSR
[00:00:10.116,000] <err> os: r0/a1:  0x00000000  r1/a2:  0x00000000  r2/a3:  0x2000798c
[00:00:10.116,000] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000030 r14/lr:  0x0800636d
[00:00:10.116,000] <err> os:  xpsr:  0x000f0000
[00:00:10.116,000] <err> os: Faulting instruction address (r15/pc): 0x00000000
[00:00:10.116,000] <err> os: >>> ZEPHYR FATAL ERROR 35: Unknown error on CPU 0
[00:00:10.116,000] <err> os: Current thread: 0x20003208 (main)
[00:00:10.201,000] <err> os: Halting system
uart:~$ 

To reproduce

$HOME/.platformio/penv/bin/platformio init -b nucleo_f429zi --project-option framework=zephyr

$HOME/.platformio/penv/bin/platformio run --target clean
$HOME/.platformio/penv/bin/platformio run --target upload
picocom -b 115200 /dev/ttyACM0

System info

Ubuntu 22 LTS

jean-roland commented 3 months ago

Seems similar to #477

Thanks for notifying us, will look into it.

ypearson-bdai commented 3 months ago

Update: I moved to the recommended board, nucleo-f767zi, with default settings and got the same result.

One configuration that does not crash is the following; note the updated #define. However, z_sub running on Ubuntu does not consume the pub message from the nucleo. The PC's IP is 192.168.11.20, the nucleo's is 192.168.11.21, and they are connected via a Netgear hub. The pub message can be observed via Wireshark and nc -lu 7447.


enxac1a3d0e545b: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.11.20  netmask 255.255.255.0  broadcast 192.168.11.255
        inet6 fe80::ae1a:3dff:fe0e:545b  prefixlen 64  scopeid 0x20<link>
        ether ac:1a:3d:0e:54:5b  txqueuelen 1000  (Ethernet)
        RX packets 299696  bytes 27442051 (27.4 MB)
        RX errors 0  dropped 82544  overruns 0  frame 0
        TX packets 178622  bytes 18828733 (18.8 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <zenoh-pico.h>

#include "flash.h"
#define CLIENT_OR_PEER 1  // 0: Client mode; 1: Peer mode
#if CLIENT_OR_PEER == 0
#define MODE "client"
#define CONNECT ""  // If empty, it will scout
#elif CLIENT_OR_PEER == 1
#define MODE "peer"
#define CONNECT "udp/192.168.11.20:7447#iface=enxac1a3d0e545b"
#else
#error "Unknown Zenoh operation mode. Check CLIENT_OR_PEER value."
#endif

#define KEYEXPR "demo/example/zenoh-pico-pub"
#define VALUE "[STSTM32]{nucleo-F767ZI} Pub from Zenoh-Pico!"

#if Z_FEATURE_PUBLICATION == 1
int main(void) {
    sleep(5);

    // Initialize Zenoh Session and other parameters
    z_owned_config_t config = z_config_default();
    zp_config_insert(z_loan(config), Z_CONFIG_MODE_KEY, z_string_make(MODE));
    if (strcmp(CONNECT, "") != 0) {
        zp_config_insert(z_loan(config), Z_CONFIG_CONNECT_KEY, z_string_make(CONNECT));
    }

    // Open Zenoh session
    printf("Opening Zenoh Session...");
    z_owned_session_t s = z_open(z_move(config));
    if (!z_check(s)) {
        printf("Unable to open session!\n");
        exit(-1);
    }
    printf("OK\n");

    // Start the receive and the session lease loop for zenoh-pico
    zp_start_read_task(z_loan(s), NULL);
    zp_start_lease_task(z_loan(s), NULL);

    printf("Declaring publisher for '%s'...", KEYEXPR);
    z_owned_publisher_t pub = z_declare_publisher(z_loan(s), z_keyexpr(KEYEXPR), NULL);
    if (!z_check(pub)) {
        printf("Unable to declare publisher for key expression!\n");
        exit(-1);
    }
    printf("OK\n");

    char buf[256];
    for (int idx = 0; 1; ++idx) {
        sleep(1);
        sprintf(buf, "[%4d] %s", idx, VALUE);
        printf("Putting Data ('%s': '%s')...\n", KEYEXPR, buf);
        z_publisher_put(z_loan(pub), (const uint8_t *)buf, strlen(buf), NULL);
    }

    printf("Closing Zenoh Session...");
    z_undeclare_publisher(z_move(pub));

    // Stop the receive and the session lease loop for zenoh-pico
    zp_stop_read_task(z_loan(s));
    zp_stop_lease_task(z_loan(s));

    z_close(z_move(s));
    printf("OK!\n");

    return 0;
}
#else
int main(void) {
    printf("ERROR: Zenoh pico was compiled without Z_FEATURE_PUBLICATION but this example requires it.\n");
    return -2;
}
#endif
jean-roland commented 3 months ago

There are two things here. First, there is a hard fault when the session can't be opened, because we're trying to free mutexes that were never initialized. On some systems, pthread_mutex_destroy(NULL) or its equivalent is fine; on others it isn't. I'll push a PR to avoid that on Zephyr.
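To illustrate the failure mode described above, here is a minimal sketch of one way to make a mutex teardown path tolerate mutexes that were never initialized. It is not the zenoh-pico fix itself: the guarded_mutex_t type and both helper functions are hypothetical names invented for this example, and it uses plain POSIX pthreads rather than the Zephyr port.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical wrapper (not a zenoh-pico type): pairs a mutex with a flag
// recording whether pthread_mutex_init() ever succeeded on it.
typedef struct {
    pthread_mutex_t lock;
    bool initialized;
} guarded_mutex_t;

// Returns 0 on success, or the pthread error code on failure.
int guarded_mutex_init(guarded_mutex_t *m) {
    assert(m != NULL);
    int rc = pthread_mutex_init(&m->lock, NULL);
    m->initialized = (rc == 0);
    return rc;
}

// Safe to call on a zeroed or already-freed wrapper: the destroy call is
// skipped unless an earlier init succeeded.
int guarded_mutex_free(guarded_mutex_t *m) {
    if (m == NULL || !m->initialized) {
        return 0;  // nothing to destroy
    }
    int rc = pthread_mutex_destroy(&m->lock);
    m->initialized = false;
    return rc;
}
```

With a guard like this, an error path that runs cleanup before every mutex has been set up (for example, when opening the session fails early) skips the destroy call instead of handing an uninitialized mutex to the platform.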

Second, in your scenario that doesn't crash, you are in peer mode, which doesn't need to reach a router to open the session, so it doesn't trigger the fault path. It seems the port on your PC is not open for UDP, so the messages get rejected.

As for why your session can't be opened in client mode, I don't know exactly what your setup is. Do you have a zenoh router with scouting enabled running on your PC or somewhere on your local network?
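As a sketch of the client-mode alternative to scouting, the example's defines could point at an explicit router endpoint. This assumes a zenoh router is running on the PC at 192.168.11.20 and listening on TCP port 7447; neither is confirmed in this thread. The uses_scouting helper is hypothetical and only mirrors the strcmp check in the example above.

```c
#include <assert.h>
#include <string.h>

// Assumption: a zenoh router runs on the PC at 192.168.11.20 and listens
// on TCP port 7447. Adjust to the real router address.
#define MODE "client"
#define CONNECT "tcp/192.168.11.20:7447"  // "" would fall back to scouting

// Hypothetical helper mirroring the example's check: the connect key is
// only inserted when CONNECT is non-empty, otherwise the client scouts.
static int uses_scouting(const char *connect) {
    assert(connect != NULL);
    return strcmp(connect, "") == 0;
}
```

With an explicit endpoint, opening the session no longer depends on scouting traffic reaching the router, which can help separate a scouting problem from a plain connectivity problem.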

ypearson-bdai commented 3 months ago

It does seem like the packets are getting through: nc -lu 7447 prints the STM32 string on my PC, and the packets show up in Wireshark, but nothing is ever printed by z_sub. Interestingly, I don't see the packets in Wireshark when running pub and sub locally. Should I post the secondary issue somewhere else? Sorry for adding it all here.
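For reference, the nc -lu 7447 check amounts to binding a UDP socket and reading whatever datagrams arrive. A rough C equivalent for a Linux host is sketched below; the function names udp_listen and udp_recv_text are made up for this example. It only shows that datagrams reach the host; it does not parse zenoh frames, so it says nothing about whether z_sub would accept them.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Binds a UDP socket on all IPv4 interfaces. Port 0 lets the OS pick a
// free port. Returns the socket descriptor, or -1 on failure.
int udp_listen(uint16_t port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Blocks until one datagram arrives and stores it in buf, NUL-terminated
// (truncated to cap - 1 bytes). Returns the byte count, or -1 on error.
ssize_t udp_recv_text(int fd, char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    ssize_t n = recvfrom(fd, buf, cap - 1, 0, NULL, NULL);
    if (n >= 0) {
        buf[n] = '\0';
    }
    return n;
}
```

Calling udp_listen(7447) and then udp_recv_text in a loop would mirror the nc check. Only one process can normally hold that port at a time, so such a listener and a subscriber listening on the same port would not both receive the traffic.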

jean-roland commented 3 months ago

For general support, the best place is the z_support channel on our Discord server (https://discord.gg/JN5R8BVg).

I advise you to open a thread there with the details of your setup and what you're trying to accomplish.

ypearson-bdai commented 2 months ago

@jean-roland I applied the patch you merged with the updated version of z_mutex_free, but I get the same result.

jean-roland commented 2 months ago

This is strange. Can you try with the dev/1.0 branch? We removed the session drop issue altogether on it.

ypearson-bdai commented 2 months ago

I'll try this and report back.

ypearson-bdai commented 2 months ago

I added this to get the requested branch:

lib_deps = https://github.com/eclipse-zenoh/zenoh-pico#dev/1.0.0

This branch seems to be broken.

ypearson-bdai commented 2 months ago

Use 1.0.0.6 or greater