Closed JeffreyZh4ng closed 2 years ago
Your configuration looks fine. You should still check whether the routes are actually created and point to your WiFi device.
What I assume, though, is that your WiFi router is the issue. Corporate-style WiFi access points usually block this kind of multicast traffic.
So you should run route -n to make sure you don't have a configuration issue at some other place.
Thanks for the response @FlorianReimold. I can confirm that the routes for multicast and loopback are configured correctly and show up when I run route -n. Additionally, the multicast packets do show up in Wireshark, as seen in this screenshot (I'm running Wireshark on the server and the client is sending out the multicast). The WiFi router is not a corporate-style router; it is just the router in my apartment.
Do ecal_sample_person_snd and ecal_sample_person_rec use UDP multicast? If so, I'm really confused about why this example works but the code I'm running doesn't.
Yes, the samples use UDP Multicast (in the default configuration, assuming that you are using a stock ecal.ini). So you get a working communication via WiFi when running ecal_sample_person_snd on one machine and ecal_sample_person_rec on the other?
Your screenshot shows outgoing UDP Multicast Traffic, but no incoming one. I cannot see whether there really isn't any or whether it's just not in the screenshot though.
> Yes, the samples use UDP Multicast (in the default configuration, assuming that you are using a stock ecal.ini). So you get a working communication via WiFi when running ecal_sample_person_snd on one machine and ecal_sample_person_rec on the other?
That is correct. I also made sure to test the reverse: in one run, machine 1 ran ecal_sample_person_snd and machine 2 ran ecal_sample_person_rec, and in another, machine 1 ran ecal_sample_person_rec and machine 2 ran ecal_sample_person_snd. And again, this was using WiFi.
OK, so when the eCAL Samples work you can assume that your network configuration is fine. The samples do exactly what your custom application would do in that case, as well (UDP MC for network). Maybe your application isn't publishing any data? Does it show up in eCAL Monitor on the other PC?
The setup with my code works with Local Mode so I know the application is publishing data. When I use the eCAL monitor, I can see the correct subscriber/publisher being made on the other PC, just no data coming through.
OK, got an idea, can you tell me how large an individual message is that you are trying to publish? If it is large, the message has to be split in many smaller chunks and if one of those chunks gets lost, the entire message has to be dropped. This is even more relevant for WiFi connections, as those are prone to message drops.
We recently developed a (somewhat) reliable TCP layer that can be used as a replacement in this case. The TCP layer will be in eCAL 5.10, which will be released on Monday. The documentation is already up: https://continental.github.io/ecal/advanced/layers/tcp.html
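To illustrate why large messages suffer so much more from dropped chunks, here is a minimal sketch. It assumes that chunks are lost independently of each other with a fixed probability and that each UDP datagram carries up to 1472 bytes of payload (1500 byte Ethernet MTU minus IP and UDP headers). Both are simplifying assumptions: real WiFi loss tends to be bursty, and eCAL adds its own header to every datagram. The function names are illustrative and not part of eCAL.

```python
import math


def chunk_count(message_bytes: int, payload_bytes: int = 1472) -> int:
    """Number of datagrams needed for one message (payload size is an assumption)."""
    return math.ceil(message_bytes / payload_bytes)


def message_loss_probability(chunks: int, chunk_loss: float) -> float:
    """Probability that at least one chunk is lost, assuming independent losses."""
    return 1.0 - (1.0 - chunk_loss) ** chunks


if __name__ == "__main__":
    # Hypothetical per-chunk loss rate of 0.1 %, for illustration only.
    for size in (1_000, 100_000, 1_000_000):
        n = chunk_count(size)
        p = message_loss_probability(n, chunk_loss=0.001)
        print(f"{size:>9} bytes -> {n:>4} chunks, message lost with p = {p:.1%}")
```

With a fixed per-chunk loss rate, the chance of losing the whole message grows quickly with the number of chunks, which is why a small message may get through reliably while a large one rarely does.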
Okay, that is much more likely to be what is going on. I'm sending ~800 KB in a single message at a rate of 20 times per second, which comes out to around 16 MB/s. I can verify this theory in a bit by slowing it down and decreasing the packet size. If messages get received, then we have our problem.
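As a quick check of these figures, the arithmetic can be written as a small Python sketch. It assumes the message is exactly 800,000 bytes and the rate is exactly 20 messages per second; the function name is only illustrative.

```python
def data_rate(message_bytes: int, messages_per_second: float) -> tuple[float, float]:
    """Return the payload rate as (megabytes per second, megabits per second)."""
    bytes_per_second = message_bytes * messages_per_second
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


mb_per_s, mbit_per_s = data_rate(800_000, 20)
print(f"{mb_per_s:.0f} MB/s = {mbit_per_s:.0f} Mbit/s")  # prints "16 MB/s = 128 Mbit/s"
```

This counts payload only; protocol headers add to it.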
An 800 KB message will result in about 550 Ethernet frames going through the WiFi connection. So yes, I am now very sure that this is the issue. If a single frame is lost, your whole image is gone forever. So let's see what we can do about that:
- Use eCAL 5.10 with TCP (you can already grab the latest 5.10 build from our CI or just wait for Monday 😉). Honestly, I think this is the way to go. We developed the TCP layer for that exact use case (transferring images over network).
- Try to make UDP work (it's possible!). ecal.ini contains a parameter bandwidth_max_udp that will cause eCAL to not send messages out as fast as possible but honor the given setting: https://continental.github.io/ecal/configuration/options.html#cmdoption-arg-bandwidth_max_udp
eCAL 5.10 has been released! Can you check it out and test if it solves your issue?
Hey @FlorianReimold, I was able to update everything and after a bit of tweaking and system configuration, I was able to get everything to work. It seems like I was running into issues when I just set the ecal.ini file for TCP to be on, but after reverting that to the old config and telling the publisher to use TCP within the code, everything worked out. Thank you!
I also have a follow-up question about the performance of the TCP layer within eCAL. Are there metrics available for a performance comparison between the UDP and TCP layers? If both are transmitting the same packets on the same network, how much of a performance difference is there due to the extra overhead of TCP?
eCAL 5.10 uses a local (i.e. non-network) configuration by default, so I assume that is why you had to revert to your old config. As long as it works, just keep it that way. But for future installations you can also set network_enabled = true in your ecal.ini.
> Is there available metrics for a performance comparison between the UDP and TCP layer?
We currently don't have those kinds of metrics, at least not with eCAL. The downside of TCP is the higher latency (especially the connection establishment takes a lot of time!) and higher overhead due to header sizes and additional protocol messages like ACK messages. The eCAL TCP Layer is optimized for reducing the latency even with the risk of having even more overhead.
But in a 1->1 connection with large messages, I assume that the TCP layer will probably always be faster than the UDP layer, as the sequential throughput is then more important than the latency. And TCP is really good for sequential throughput. We used our tcp_pubsub library (which is used for the TCP layer in eCAL) to send more than 500 MB/s (megabytes, not Mbit!) over the network without any issue. I bet we could have gone quite a bit higher, but we didn't have more data available that we could send. And, well, your messages actually reach the receiver, which is a plus 😉
The biggest downside of the TCP layer probably is that each subscriber opens its own data stream. So even only using eCAL Mon over network on a TCP topic causes 3 transmits of each message, as we ship 3 monitor-plugins and each creates its own subscriber.
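To put a number on that fan-out, here is a rough sketch of the payload a publisher has to send per second. It assumes one full copy of every message per TCP subscriber, a single copy for UDP multicast, and at least one subscriber. Protocol overhead is ignored, and the function name and the example figures (800,000 byte messages at 20 Hz) are illustrative only.

```python
def sender_bytes_per_second(message_bytes: int, rate_hz: float,
                            subscribers: int, per_subscriber_stream: bool) -> float:
    """Payload bytes per second the publisher must send (headers ignored).

    per_subscriber_stream=True models one data stream per subscriber (TCP),
    False models a single multicast transmission shared by all subscribers.
    """
    if subscribers < 1:
        raise ValueError("this sketch assumes at least one subscriber")
    copies = subscribers if per_subscriber_stream else 1
    return message_bytes * rate_hz * copies


# Three remote subscribers, e.g. the three monitor plugins mentioned above.
tcp = sender_bytes_per_second(800_000, 20, subscribers=3, per_subscriber_stream=True)
udp = sender_bytes_per_second(800_000, 20, subscribers=3, per_subscriber_stream=False)
print(f"TCP: {tcp / 1e6:.0f} MB/s, UDP multicast: {udp / 1e6:.0f} MB/s")
```

So for large, high-rate topics it can matter how many remote subscribers (including monitoring tools) are attached.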
Hi @FlorianReimold, I'm again having similar issues with not being able to send / receive data between two machines, this time using the TCP layer. I have a new machine that is running Ubuntu 20.04 (I don't know if this is the root of any of the issues), but the two machines can definitely connect with each other, and ecal_sample_person_snd and ecal_sample_person_rec work perfectly fine between the two machines. I've verified that the netplan yaml files are correct and that the ecal.ini is correctly configured. Additionally, this setup has worked on a different machine.
This time, when I run the ecal_mon_gui, there is a warning printout with the following.
2022-05-24 07:24:52.709 Info VIO Server Reader CTCPReaderLayer - TCPPubSub (Info) -Publisher 0.0.0.0:0: Created publisher and waiting for clients.
2022-05-24 07:24:59.151 Info VIO Device Reader CTCPReaderLayer - TCPPubSub (Info) -Publisher 0.0.0.0:0: Created publisher and waiting for clients.
2022-05-24 07:25:00.310 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:00.897 Warning VIO Device Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:00.898 Warning VIO Device Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:00.898 Warning VIO Device Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:00.898 Warning VIO Device Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:00.899 Warning VIO Device Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:01.309 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:01.311 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:01.312 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:01.313 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:01.314 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:02.359 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:03.359 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:03.361 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:03.362 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:03.363 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:03.365 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:04.405 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:04.994 Warning VIO Device Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:05.405 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:05.407 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:05.408 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:05.409 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:05.411 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:06.453 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:07.043 Warning VIO Device Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->arapaho:43167: Failed to resolve address: Host not found (authoritative)
2022-05-24 07:25:07.453 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:07.455 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:07.456 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:07.457 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
2022-05-24 07:25:07.458 Warning VIO Server Reader CTCPReaderLayer - TCPPubSub (Warning) -SubscriberSession ?->jeffrey-System-Product-Name:44067: Failed to resolve address: Host not found (non-authoritative), try again later
I've verified that the issue is coming from one particular machine (the Ubuntu 20.04 one). Do you have any insight into what's going wrong?
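Since the warnings above all say "Failed to resolve address", one quick diagnostic is to check on each machine whether the other machine's hostname resolves at all. The following is a minimal sketch using only the Python standard library. It asks the local resolver (which typically includes /etc/hosts and DNS), so it shows what the operating system can resolve, which is an assumption about what the TCP layer relies on. The script name in the usage line is hypothetical.

```python
import socket
import sys


def resolve(hostname: str) -> list[str]:
    """Return the sorted unique addresses the local resolver finds, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Usage: python check_hosts.py arapaho jeffrey-System-Product-Name
    for host in sys.argv[1:]:
        addresses = resolve(host)
        print(f"{host}: {', '.join(addresses) if addresses else 'NOT RESOLVABLE'}")
```

If a name from the log is not resolvable on one machine, that would match the warnings shown by the monitor.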
Currently I have two machines on the same LAN that both send out multicasts to one another via two different streams using eCAL in Cloud Mode. After configuring the network as shown in the documentation (here), neither the server nor the device is able to receive multicast messages from the other.
Other details:
/etc/netplan/50-ecal-multicast.yaml on the device (server is the same just with a different static IP)
ecal.ini (Same on both machines. Cloud mode on, hops to 10, only cloud mode enabled)
Loopback file is the same as what's shown in the documentation.
If anyone has insight into how to get this working it would be greatly appreciated as I've been stuck on this issue for quite some time.