eclipse-ecal / ecal

📦 eCAL - enhanced Communication Abstraction Layer. A high performance publish-subscribe, client-server cross-plattform middleware.
https://ecal.io
Apache License 2.0

Subscription & Monitoring breaks if Ethernet is unplugged, monitoring layer issue? #1668

Open chengguizi opened 1 month ago

chengguizi commented 1 month ago

Problem Description

We have been using eCAL on an embedded system. A typical operating procedure is:

However, this procedure does not work as expected. I have identified the key contributing factors.

  1. The stream to the subscriber node breaks after N seconds, where N is specified by
    [monitoring]
    timeout                   = 5000

    That is, if I increase the timeout to 50000, the subscriber node breaks after about 50 seconds instead of 5.

  2. To investigate further at the monitoring layer, I tried changing the following setting:
    shm_monitoring_enabled      = true

    The issue seems to go away!

What is happening here? Is the monitoring layer failing to fall back to multicast on lo at runtime?

Routing table on the device. It has been set up so that multicast falls back to lo if br0 (the Ethernet interface) is not present:

default via 10.42.0.1 dev br0 proto dhcp src 10.42.0.64 metric 1024
...
239.0.0.0/24 dev br0 proto static scope link 
239.0.0.0/24 dev lo proto static scope link metric 1000 
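
To make the intended fallback concrete, here is a minimal Python sketch of how the two multicast routes above would be ranked. It is a simplified model (longest prefix first, then lowest metric, with a missing metric treated as 0), not a reproduction of the kernel's routing code. The names `parse_routes`, `pick_device` and `missing_devices` are hypothetical helpers introduced only for this illustration. Whether the br0 route actually disappears when the cable is unplugged is an assumption that this thread does not confirm.

```python
import ipaddress

# The two multicast routes from the routing table above.
ROUTES = """\
239.0.0.0/24 dev br0 proto static scope link
239.0.0.0/24 dev lo proto static scope link metric 1000
"""

def parse_routes(text):
    """Parse `ip route`-style lines into (network, device, metric) tuples."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            net = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # skips "default", "..." and anything not starting with a prefix
        dev = fields[fields.index("dev") + 1] if "dev" in fields else None
        metric = int(fields[fields.index("metric") + 1]) if "metric" in fields else 0
        routes.append((net, dev, metric))
    return routes

def pick_device(routes, address, missing_devices=()):
    """Simplified selection: longest prefix wins, then the lowest metric.

    `missing_devices` models interfaces whose routes are gone (assumption).
    """
    addr = ipaddress.ip_address(address)
    candidates = [r for r in routes
                  if addr in r[0] and r[1] not in missing_devices]
    if not candidates:
        return None
    best = max(candidates, key=lambda r: (r[0].prefixlen, -r[2]))
    return best[1]

if __name__ == "__main__":
    table = parse_routes(ROUTES)
    print(pick_device(table, "239.0.0.1"))
    print(pick_device(table, "239.0.0.1", missing_devices=("br0",)))
```

Under this model, lo is only chosen once the br0 route is no longer in the table. If the route stays in place after the cable is pulled, the fallback would never be triggered, which may be worth checking on the device.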

How to reproduce

On the embedded device:

Over SSH through the Ethernet connection:

screen -R test
$ ecal_sample_person_snd

Ctrl+A D (to detach)

In the console UART connection:

ecal_mon_tui

## click / Enter on the topic to inspect the actual incoming stream of data

Observe that everything works correctly. Now unplug the Ethernet cable. After 5 seconds or so, ecal_mon_tui goes blank. Then replug the Ethernet cable, and the data comes back.

Another strange thing: if we run ecal_sample_person_rec instead, the issue does not seem to occur.

How did you get eCAL?

Custom Build / Built from source

Environment

Debian 12, arm64

eCAL System Information

$ ecal_config
------------------------- SYSTEM ---------------------------------
Version                  : v5.11.8 (2024-02-07 16:34:55 +0100)
Platform                 : linux

------------------------- CONFIGURATION --------------------------
Default INI              : /etc/ecal/ecal.ini

------------------------- NETWORK --------------------------------
Host name                : huimin-Vostro-5320
Network mode             : cloud
Network ttl              : 2
Network sndbuf           : 5 MByte
Network rcvbuf           : 5 MByte
Multicast group          : 239.0.0.1
Multicast mask           : 0.0.0.15
Multicast ports          : 14000 - 14010
Multicast join all IFs   : off
Bandwidth limit (udp)    : not limited

------------------------- TIME -----------------------------------
Synchronization realtime : "ecaltime-localtime"
Synchronization replay   : 
State                    :  synchronized 
Master / Slave           :  Master 
Status (Code)            : "everything is fine." (0)

------------------------- PUBLISHER LAYER DEFAULTS ---------------
Layer Mode INPROC        : auto
Layer Mode SHM (ZEROCPY) : auto
Layer Mode TCP           : off
Layer Mode UDP MC        : auto

------------------------- SUBSCRIPTION LAYER DEFAULTS ------------
Layer Mode INPROC        : on
Layer Mode SHM           : on
Layer Mode TCP           : on
Layer Mode UDP MC        : on
Npcap UDP Reciever       : off

chengguizi commented 1 month ago

It is also reproducible using ecal_mon_cli -l, when the command is started with Ethernet connected.

KerstinKeller commented 1 month ago

Hi @chengguizi, we'll look into this problem. In general, there is one socket for sending and one for receiving UDP monitoring/registration traffic. I am unsure how that socket behaves when the cable is unplugged, and how we handle this when network mode is enabled.

If you're working in local mode, there are no problems (at least on Windows), but as said, we need to investigate for Linux devices.

If you're enabling shm monitoring, you're sending monitoring info on both shm and udp. Shm is unaffected by network settings, and continues to function. However, with this mode you see only processes on the same host.
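
For readers unfamiliar with multicast sockets, the following Python sketch shows how a UDP receive socket typically joins a multicast group on one specific interface. It is a generic illustration and not eCAL's actual implementation. The group and port are taken from the ecal_config output above; `membership_request` and `open_receiver` are hypothetical helper names. The relevant point is that the group membership is requested for one interface at join time, so a receiver may need to rejoin if that interface goes away.

```python
import socket
import struct

GROUP = "239.0.0.1"  # multicast group from the ecal_config output
PORT = 14000         # first port of the multicast port range shown above

def membership_request(group, interface_ip="0.0.0.0"):
    """Build the ip_mreq payload for IP_ADD_MEMBERSHIP.

    interface_ip "0.0.0.0" lets the OS choose the interface.
    """
    return struct.pack("4s4s",
                       socket.inet_aton(group),
                       socket.inet_aton(interface_ip))

def open_receiver(group=GROUP, port=PORT, interface_ip="0.0.0.0"):
    """Open a UDP socket and join `group` on the given interface address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # The join is tied to the interface selected here, at call time.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, interface_ip))
    return sock
```

Calling `open_receiver(interface_ip="127.0.0.1")` would request the membership on the loopback interface instead of letting the OS pick one. Whether eCAL rejoins after an interface change is exactly the open question in this issue.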

chengguizi commented 1 month ago

Hi @KerstinKeller, yes, I agree there are no issues on Linux in local mode. However, our use case requires cloud mode, and it is unwanted behaviour that such breaks happen when the Ethernet cable is unplugged.

chengguizi commented 3 weeks ago

@KerstinKeller Any updates on this? It should be reproducible on any embedded SBC with Ethernet, e.g. a Raspberry Pi.