eclipse-zenoh / zenoh

zenoh unifies data in motion, data in-use, data at rest and computations. It carefully blends traditional pub/sub with geo-distributed storages, queries and computations, while retaining a level of time and space efficiency that is well beyond any of the mainstream stacks.
https://zenoh.io

Question: scaling (vs. dds), simulation, ... #116

Open ptrdtznr opened 3 years ago

ptrdtznr commented 3 years ago

Over the last few days I have been reading about DDS and zenoh. Zenoh is said to boost your application if you have resource-constrained devices. A few questions popped into my head; maybe you could help me clarify them.

  1. Is DDS really as bad as stated on this slide 22 from Angelo Corsaro? Is there a proof anywhere? Otherwise it is a claim, and there are a lot of claims on the internet ;-)
  2. Since DDS uses quite a lot of multicast (which isn't available over the internet), is this the reason why zenoh mostly does one-to-one communication?
  3. Have you thought of deploying zenoh on ns-3 or PlanetLab? Then you could "easily" create some plots, ...?
  4. Couldn't you use an epidemic broadcast tree for data propagation?
OlivierHecart commented 3 years ago
  1. With the standard DDSI protocol, all Participants in the system need to discover all remote Participants, DataReaders and DataWriters in the system. A reliable "channel", with some state to maintain on both sides, needs to be established between every matching reliable DataWriter/DataReader pair in the system. So clearly this does not scale very well, and in a large system you'll hit limits on both local resource usage and discovery traffic. UDP multicast is obviously also an issue for wide-area systems. This has already caused difficulties for DDS users in the past. More recently we have received feedback from the ROS2 community (which chose DDS as its default protocol) about issues related to DDSI discovery traffic. That's why we are seeing more and more interest in zenoh from this community for inter-robot communications.
  2. From our past experience, UDP multicast is often poorly supported, especially by radio-based networks (even WiFi) and by mobile devices. It is not suitable for reliably transporting data payloads. That's why zenoh uses UDP multicast only for dynamic discovery, in combination with gossip discovery. Data goes through unicast communications (TCP, UDP unicast, etc.).
  3. Deploying zenoh on ns-3 or PlanetLab is not in our short-term plan. But we'll clearly need to better characterize its scalability, and we'll consider those solutions for sure.
  4. So far our investigations and experiments have led us to use a combination of a link-state protocol, local computation of shortest-path trees (Bellman-Ford) and ad-hoc interest propagation for data routing. But we're constantly trying to improve zenoh routing and are highly interested in suggestions or ideas that could help us with this. We hope to soon write a blog post on zenoh.io explaining how and why we arrived at the current zenoh routing. This should help us collaborate and identify what could be beneficial to zenoh.
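
To make the "local computation of shortest-path trees (Bellman-Ford)" in point 4 more concrete, here is a minimal sketch of the textbook Bellman-Ford algorithm in Python. It is not taken from the zenoh code base; the router names, the link costs and the `bellman_ford` helper are hypothetical and only illustrate how a node that already knows the link-state graph could derive a shortest-path tree rooted at itself.

```python
from math import inf

def bellman_ford(nodes, links, source):
    """Textbook Bellman-Ford on an undirected graph with non-negative costs.

    nodes:  iterable of node names
    links:  list of (a, b, cost) tuples, each usable in both directions
    Returns (dist, parent); parent encodes the shortest-path tree.
    """
    dist = {n: inf for n in nodes}
    parent = {n: None for n in nodes}
    dist[source] = 0
    # At most len(nodes) - 1 relaxation passes are needed.
    for _ in range(len(nodes) - 1):
        changed = False
        for a, b, cost in links:
            # Relax the link in both directions.
            for u, v in ((a, b), (b, a)):
                if dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    parent[v] = u
                    changed = True
        if not changed:
            break  # converged early
    return dist, parent

# Hypothetical topology: four routers and made-up link costs.
nodes = ["A", "B", "C", "D"]
links = [("A", "B", 1), ("B", "C", 1), ("A", "C", 5), ("C", "D", 2)]
dist, parent = bellman_ford(nodes, links, "A")
print(dist)    # cost from A to every router
print(parent)  # next hop towards A, i.e. the tree edges
```

In this toy graph the direct A-C link (cost 5) is left out of the tree because the path through B is cheaper; the `parent` map is the tree along which data would be forwarded in such a scheme.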
OlivierHecart commented 3 years ago

By the way, you may be interested in reading:

  1. This zenoh.io blog post about discovery overhead with some real figures.
  2. This zenoh.io blog post on zenoh reliability and scalability.
kydos commented 3 years ago

Hello @ptrdtznr,

@OlivierHecart already provided you with some links that show the numbers. Besides that, it may be good for you to know that most of us have been deeply involved with DDS and know its protocol inside out. Thus, if you have specific questions on how the DDS and zenoh protocols differ, do not hesitate to ask -- ideally on zenoh's gitter.

As for NS-3 or PlanetLab, we have a server farm in house and have done scalability tests with hundreds of routers on our premises. As for internet-scale deployment, we have an infrastructure running 24/7 that we use for demos and user trials. That said, PlanetLab could be interesting for investigating some pathological cases.

Finally, the reason why we don't use epidemic algorithms is that they look nice on paper, but in reality -- this is real-world experience -- they create way too much traffic and additionally give probabilistic guarantees that are hard to work with in several use cases.
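
The traffic and probabilistic-guarantee argument can be illustrated with a toy push-gossip simulation. This is only a sketch under simplifying assumptions (synchronous rounds, uniformly random peer selection, no message loss); the function name and the values of `n`, `fanout` and `rounds` are arbitrary and not measurements of any real system.

```python
import random

def push_gossip(n, fanout, rounds, seed=0):
    """Simulate naive push gossip among n nodes.

    Every informed node sends the message to `fanout` random peers
    in each round. Returns (nodes_reached, messages_sent).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    informed = {0}             # node 0 is the original publisher
    messages = 0
    for _ in range(rounds):
        newly = set()
        for node in informed:
            candidates = [p for p in range(n) if p != node]
            peers = rng.sample(candidates, fanout)
            messages += fanout   # duplicates to informed peers still cost traffic
            newly.update(peers)
        informed |= newly
    return len(informed), messages

n = 100
reached, sent = push_gossip(n, fanout=3, rounds=6)
tree_messages = n - 1  # a spanning tree delivers to everyone with n - 1 messages
print(f"gossip: reached {reached}/{n} nodes using {sent} messages")
print(f"tree:   reaches {n}/{n} nodes using {tree_messages} messages")
```

Two properties hold in this model regardless of the seed: the gossip run never sends fewer messages than the number of nodes it newly reaches, and full coverage after a fixed number of rounds is a matter of probability, whereas forwarding along a tree uses exactly `n - 1` messages and reaches every node as long as the tree is intact.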

A+

ptrdtznr commented 3 years ago

Hello @kydos - thanks for the reply and the additional information. I have been reading quite a lot of the DDS spec over the last couple of days. It is getting clearer.

From my point of view, epidemic algorithms also have some advantages. As usual, it's always about the pros and cons. But this will get more and more complex, and we can discuss it separately ;) I really appreciate your answers here, guys!