Open alexzeit opened 4 weeks ago
Thanks for the report @alexzeit, will try to reproduce on my F767ZI.
Just to clarify: you have a Python publisher sending a message every 2 ms to a pico subscriber, and a pico publisher sending a message every 1 ms to a Python subscriber, all in peer mode. Do you have two boards, or are the publisher and subscriber on the same Nucleo?
Actually, it would probably be easier if you could send me the project files you used.
Hi Jean-Roland, yes, but I have observed the same behaviour with C++ pub/sub at 1 ms in peer mode. We have one board, where the publisher and subscriber run in separate threads of the Zephyr RTOS.
Alright, so it seems the error message is produced by Zephyr when it runs out of RX buffers to store incoming messages.
My guess is it breaks the connection and since we do not yet have connectivity event support (see Issue #333) the only possibility is to restart the node.
Alternatively, you can try increasing the number of RX buffers, which should reduce how often this occurs; see https://docs.zephyrproject.org/2.7.5/reference/kconfig/CONFIG_NET_BUF_RX_COUNT.html and https://docs.zephyrproject.org/2.7.5/reference/kconfig/CONFIG_NET_PKT_RX_COUNT.html
That also means pico has a hard time keeping up with this message rate, and as we discussed before we're going to look into performance after the 1.0 release.
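For reference, the RX buffer increase suggested above could be sketched in the application's `prj.conf` roughly as follows. The two option names are the ones from the linked Zephyr docs. The values are illustrative assumptions, not tested recommendations, and every extra buffer costs RAM on the MCU.

```
# Sketch only: the values below are assumptions, tune them to the available RAM.
# Number of network data buffers available for received data.
CONFIG_NET_BUF_RX_COUNT=64
# Number of network packets that can be held on the receive side at once.
CONFIG_NET_PKT_RX_COUNT=32
```

This should only reduce how often the buffers run out. If the receiver is not drained fast enough, the pool can still be exhausted.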
Yes, the message seems to come from Zephyr, but it is caused by zenoh core. I think the issue is that zenoh starts the Ethernet receiver but takes some time before it starts consuming bytes from the Ethernet RX buffer. In the other case, where the Python publisher is not running during zenoh startup, the issue does not happen. I have tried increasing the RX buffers, but this did not solve the problem.
So I tried reproducing the issue on my board with a pub/sub at a 1 ms period, without success (or failure?). Could you send me the files you used for the board and the PC?
Describe the bug
When the MCU running zenoh-pico is restarted, it sporadically logs "<err> eth_stm32_hal: Failed to obtain RX buffer" and the subscriber stops receiving messages. This happens only if the publisher (e.g. Python) is already running, or was started before the MCU running zenoh-pico. An additional delay (e.g. 10 s) before z_declare_subscriber does not help. A peer connection is used.
To reproduce
System info