eclipse-ecal / ecal

📦 eCAL - enhanced Communication Abstraction Layer. A high performance publish-subscribe, client-server cross-platform middleware.
https://ecal.io
Apache License 2.0

eCAL Cloud Configuration Not Sending/Receiving Data Correctly #650

Closed JeffreyZh4ng closed 2 years ago

JeffreyZh4ng commented 2 years ago

Currently I have two machines on the same LAN that both send multicast data to one another via two different streams using eCAL in Cloud Mode. After configuring the network as shown in the documentation, neither the server nor the device is able to receive multicast messages from the other.

Other details:

/etc/netplan/50-ecal-multicast.yaml on the device (server is the same just with a different static IP)

# Replace eth0 with your network adapter!
network:
  version: 2
  renderer: NetworkManager # GUI integration for desktop Ubuntu
  ethernets:

    # Replace eth0 with your interface!
    wlan0:

      # Either use DHCP...
      dhcp4: no
      dhcp6: no

      # ... or configure a static address!
      addresses:
        - 10.6.3.148/24

      routes:
        - to: 239.0.0.0/24
          via: 0.0.0.0
          metric: 1

ecal.ini (Same on both machines. Cloud mode on, hops to 10, only cloud mode enabled)

[network]
network_enabled           = true
multicast_group           = 239.0.0.1
multicast_mask            = 0.0.0.15
multicast_port            = 14000
multicast_ttl             = 10
multicast_sndbuf          = 5242880
multicast_rcvbuf          = 5242880

bandwidth_max_udp         = -1

inproc_rec_enabled        = false
shm_rec_enabled           = false
udp_mc_rec_enabled        = true

npcap_enabled             = false

[publisher]
use_inproc                = 0
use_shm                   = 0
use_udp_mc                = 1

The loopback file is the same as what's shown in the documentation.

If anyone has insight into how to get this working it would be greatly appreciated as I've been stuck on this issue for quite some time.

FlorianReimold commented 2 years ago

Your configuration looks fine. You should still check whether the routes are actually created and point to your WiFi device.

What I assume, though, is that your WiFi router is the issue. Corporate-style WiFi access points usually block this kind of multicast traffic.

So you should

  1. Make sure that your routes are properly shown with route -n, to rule out a configuration issue somewhere else
  2. Maybe even check with Wireshark whether the UDP MC packets actually show up as outgoing packets
  3. Check the configuration of your WiFi router
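
To check whether multicast crosses the WiFi router independently of eCAL, a plain UDP multicast sender and receiver can be run on the two machines. The following is a minimal sketch using only the Python standard library. It assumes the group, port and TTL from the ecal.ini above; the function names are illustrative and are not part of eCAL.

```python
import socket
from typing import Optional, Tuple

GROUP = "239.0.0.1"  # multicast_group from the ecal.ini above
PORT = 14000         # multicast_port from the ecal.ini above
TTL = 10             # multicast_ttl from the ecal.ini above


def build_mreq(group: str, interface_ip: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte ip_mreq structure: group address + local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface_ip)


def send_once(payload: bytes = b"multicast test", group: str = GROUP,
              port: int = PORT, ttl: int = TTL) -> None:
    """Send a single UDP datagram to the multicast group."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))


def receive_once(group: str = GROUP, port: int = PORT,
                 timeout_s: float = 10.0) -> Optional[Tuple[bytes, str]]:
    """Join the group and wait for one datagram; return (payload, sender_ip) or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        # Allow several listeners on the same port on one machine.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # Join the group on the interface chosen by the routing table (0.0.0.0).
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
        sock.settimeout(timeout_s)
        try:
            payload, (sender_ip, _sender_port) = sock.recvfrom(65535)
        except socket.timeout:
            return None
        return payload, sender_ip
```

Calling receive_once() on one machine and then send_once() on the other, and afterwards swapping the roles, shows whether datagrams arrive in both directions; a return value of None after the timeout suggests the datagram did not arrive. If eCAL processes are still running, the receiver may also pick up their traffic on the same port.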

JeffreyZh4ng commented 2 years ago

Thanks for the response @FlorianReimold. I can confirm that the routes for multicast and loopback are configured correctly and show up when I run route -n. Additionally, the multicast packets do show up in Wireshark, as seen in this screenshot (I'm running Wireshark on the server and the client is sending out the multicast). The WiFi router is not a corporate-style router; it is just the router in my apartment.

Screenshot from 2022-05-05 01-31-59

Do ecal_sample_person_snd and ecal_sample_person_rec use UDP multicast? If so, I'm really confused why this example works but the code I'm running doesn't.

FlorianReimold commented 2 years ago

Yes, the samples use UDP Multicast (in the default configuration, assuming that you are using a stock ecal.ini). So you get working communication via WiFi when running ecal_sample_person_snd on one machine and ecal_sample_person_rec on the other?

Your screenshot shows outgoing UDP Multicast Traffic, but no incoming one. I cannot see whether there really isn't any or whether it's just not in the screenshot though.

JeffreyZh4ng commented 2 years ago

> Yes, the samples use UDP Multicast (in the default configuration, assuming that you are using a stock ecal.ini). So you get working communication via WiFi when running ecal_sample_person_snd on one machine and ecal_sample_person_rec on the other?

That is correct. I also made sure to test out the reverse, so in one run machine 1 ran ecal_sample_person_snd and machine 2 ran ecal_sample_person_rec, and in another machine 1 ran ecal_sample_person_rec and machine 2 ran ecal_sample_person_snd. And again this was using WiFi.

FlorianReimold commented 2 years ago

OK, so when the eCAL Samples work you can assume that your network configuration is fine. The samples do exactly what your custom application would do in that case, as well (UDP MC for network). Maybe your application isn't publishing any data? Does it show up in eCAL Monitor on the other PC?

JeffreyZh4ng commented 2 years ago

The setup with my code works with Local Mode so I know the application is publishing data. When I use the eCAL monitor, I can see the correct subscriber/publisher being made on the other PC, just no data coming through.

FlorianReimold commented 2 years ago

OK, got an idea: can you tell me how large an individual message is that you are trying to publish? If it is large, the message has to be split into many smaller chunks, and if one of those chunks gets lost, the entire message has to be dropped. This is even more relevant for WiFi connections, as those are prone to message drops.

We recently developed a (somewhat) reliable TCP layer that can be used as a replacement in this case. The TCP Layer will be in eCAL 5.10, which will be released on Monday. The documentation is already up: https://continental.github.io/ecal/advanced/layers/tcp.html

JeffreyZh4ng commented 2 years ago

Okay, that is much more likely to be what is going on. I'm sending ~800 KB in a single message at a rate of 20 times per second, which comes out to around 16 MB/s. I can verify this theory in a bit by slowing it down and decreasing the message size. If messages get received, then we have our problem.

FlorianReimold commented 2 years ago

An 800 KB message will result in about 550 Ethernet frames going through the WiFi connection. So yes, I am now very sure that this is the issue. If 1 frame is lost, your whole message is gone forever. So let's see what we can do about that:
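
As a rough illustration of the numbers involved, the sketch below estimates the frame count for one message and the chance that every frame of that message arrives. It assumes 1472 payload bytes per frame (a 1500-byte MTU minus IPv4 and UDP headers), independent frame losses, and no retransmission; eCAL's own per-datagram header is not modeled, and the loss rates are illustrative values, not measurements.

```python
import math


def frames_per_message(message_bytes: int, payload_per_frame: int = 1472) -> int:
    """Frames needed for one message, assuming a fixed payload size per frame."""
    return math.ceil(message_bytes / payload_per_frame)


def delivery_probability(n_frames: int, frame_loss_rate: float) -> float:
    """Probability that all frames arrive, assuming independent losses."""
    return (1.0 - frame_loss_rate) ** n_frames


n = frames_per_message(800_000)  # 544 frames under the assumptions above
for loss in (0.0001, 0.001, 0.01):  # assumed per-frame loss rates
    p = delivery_probability(n, loss)
    print(f"loss per frame {loss:.2%}: whole message arrives with p = {p:.3f}")
```

Under these assumptions, even a per-frame loss rate of 0.1% lets only about 58% of the messages through, which is why large messages suffer far more than the small ones sent by the samples.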

FlorianReimold commented 2 years ago

eCAL 5.10 has been released! Can you check it out and test if it solves your issue?

JeffreyZh4ng commented 2 years ago

Hey @FlorianReimold, I was able to update everything and after a bit of tweaking and system configuration, I was able to get everything to work. It seems like I was running into issues when I just set the ecal.ini file for TCP to be on, but after reverting that to the old config and telling the publisher to use TCP within the code, everything worked out. Thank you!

I also have a follow-up question about the performance of the TCP layer within eCAL. Are there metrics available for a performance comparison between the UDP and TCP layers? If both are transmitting the same packets on the same network, how much of a performance difference is there due to the extra overhead of TCP?

FlorianReimold commented 2 years ago

eCAL 5.10 uses a local (i.e. non-network) configuration by default, so I assume that is why you had to revert to your old config. As long as it works, just keep it that way. But for future installations you can also set network_enabled = true in your ecal.ini.

> Are there metrics available for a performance comparison between the UDP and TCP layers?

We currently don't have those kinds of metrics, at least not with eCAL. The downside of TCP is the higher latency (especially the connection establishment takes a lot of time!) and higher overhead due to header sizes and additional protocol messages like ACK messages. The eCAL TCP Layer is optimized for reducing the latency even with the risk of having even more overhead.

But in a 1->1 connection with large messages, I assume that the TCP Layer will probably always be faster than the UDP layer, as the sequential throughput is then more important than the latency. And TCP is really good for sequential throughput. We used our tcp_pubsub library (that is used for the TCP Layer in eCAL) to send more than 500 MB/s (Megabytes, not Mbit!) over the network without any issue. I bet we could have gone quite a bit higher, but we didn't have more data available that we could send. And, well, your messages actually reach the receiver, which is a plus 😉

The biggest downside of the TCP layer probably is that each subscriber opens its own data stream. So even only using eCAL Mon over network on a TCP topic causes 3 transmits of each message, as we ship 3 monitor-plugins and each creates its own subscriber.

JeffreyZh4ng commented 2 years ago

Hi @FlorianReimold, I'm again having similar issues with not being able to send / receive data between two machines, this time using the TCP layer. I have a new machine that is running Ubuntu 20.04 (I don't know if this is the root of any of the issues), but the two machines can definitely connect with each other, and ecal_sample_person_snd and ecal_sample_person_rec work perfectly fine between the two machines. I've verified that the netplan yaml files are correct and that the ecal.ini is correctly configured. Additionally, this setup has worked on a different machine.

This time, when I run the ecal_mon_gui, there is a warning printout with the following.

2022-05-24 07:24:52.709 Info    VIO Server Reader   CTCPReaderLayer - TCPPubSub (Info) -Publisher 0.0.0.0:0: Created publisher and waiting for clients.
2022-05-24 07:24:59.151 Info    VIO Device Reader   CTCPReaderLayer - TCPPubSub (Info) -Publisher 0.0.0.0:0: Created publisher and waiting for clients.
2022-05-24 07:25:00.310 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:00.897 Warning VIO Device Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:00.898 Warning VIO Device Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:00.898 Warning VIO Device Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:00.898 Warning VIO Device Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:00.899 Warning VIO Device Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:01.309 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:01.311 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:01.312 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:01.313 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:01.314 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:02.359 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:03.359 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:03.361 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:03.362 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:03.363 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:03.365 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:04.405 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:04.994 Warning VIO Device Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:05.405 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:05.407 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:05.408 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:05.409 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:05.411 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:06.453 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:07.043 Warning VIO Device Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:07.453 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:07.455 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:07.456 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:07.457 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:07.458 Warning VIO Server Reader   CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later

I've verified that the issue is coming from one particular machine (the Ubuntu 20.04 one). Do you have any insight into what's going wrong?
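
The warnings in the log say that the hostnames arapaho and jeffrey-System-Product-Name could not be resolved. A quick way to check name resolution outside of eCAL is sketched below with the Python standard library; resolve_ipv4 is an illustrative helper, not an eCAL function.

```python
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the IPv4 addresses the local resolver finds for hostname (empty if none)."""
    try:
        infos = socket.getaddrinfo(hostname, None,
                                   family=socket.AF_INET, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Resolution failed, e.g. "Name or service not known".
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})
```

Running resolve_ipv4("arapaho") on the one machine and resolve_ipv4("jeffrey-System-Product-Name") on the other shows whether each host can resolve its peer. An empty list means the name is not resolvable from that machine; one common way to make it resolvable is an entry for the peer in /etc/hosts.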