Lightweight Message Bus Alternative for Embedded Environments

ODIM-Project / ODIM

Apache License 2.0


Lightweight Message Bus Alternative for Embedded Environments #39

Open ehsjoar opened 3 years ago

ehsjoar commented 3 years ago

The current Kafka-based message bus is a good approach for larger installations that also require the solution to scale. For smaller embedded solutions, though, its memory footprint is too large, and we need something more lightweight. This issue is a container for that type of work. In agile terms, the story would be something like "As an integrator or solution builder, I need ODIM to have a lightweight message bus so that ODIM can live in embedded environments."

As a principle, we would keep the current APIs for interacting with the message bus and thereby avoid changes in any other parts of the code. We would, though, need to introduce a configuration parameter that dictates which message bus technology to use.
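The principle above can be sketched roughly as follows. This is only an illustration, not ODIM's actual API: the `MessageBus` interface, the `Config.Backend` setting, the `NewBus` factory, and the in-memory backend are all hypothetical names invented for this example. The point is that callers depend only on the interface, and a single configuration value picks the implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// MessageBus is a hypothetical stand-in for the existing bus-facing API.
// Callers only ever see this interface, never a concrete backend.
type MessageBus interface {
	Publish(topic string, payload []byte) error
	Subscribe(topic string, handler func(payload []byte)) error
	Close() error
}

// Config carries the hypothetical setting that selects the bus technology.
type Config struct {
	Backend string // e.g. "kafka" or "memory" (names are assumptions)
}

// ErrUnknownBackend is returned when the configured backend is not recognised.
var ErrUnknownBackend = errors.New("unknown message bus backend")

// memoryBus is a tiny in-process backend used here as the "lightweight" option.
type memoryBus struct {
	mu       sync.RWMutex
	handlers map[string][]func([]byte)
}

func (b *memoryBus) Publish(topic string, payload []byte) error {
	// Copy the handler list so handlers run without holding the lock.
	b.mu.RLock()
	hs := append([]func([]byte){}, b.handlers[topic]...)
	b.mu.RUnlock()
	for _, h := range hs {
		h(payload)
	}
	return nil
}

func (b *memoryBus) Subscribe(topic string, handler func([]byte)) error {
	if handler == nil {
		return errors.New("nil handler")
	}
	b.mu.Lock()
	defer b.mu.Unlock()
	b.handlers[topic] = append(b.handlers[topic], handler)
	return nil
}

func (b *memoryBus) Close() error { return nil }

// NewBus picks a backend from configuration; nothing else in the code changes.
func NewBus(cfg Config) (MessageBus, error) {
	switch cfg.Backend {
	case "memory":
		return &memoryBus{handlers: make(map[string][]func([]byte))}, nil
	case "kafka":
		// A real build would construct the existing Kafka-backed bus here.
		return nil, errors.New("kafka backend is not wired up in this sketch")
	default:
		return nil, fmt.Errorf("%w: %q", ErrUnknownBackend, cfg.Backend)
	}
}

func main() {
	bus, err := NewBus(Config{Backend: "memory"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer bus.Close()
	bus.Subscribe("events", func(p []byte) { fmt.Printf("received: %s\n", p) })
	bus.Publish("events", []byte("fan speed alert"))
}
```

Adding another technology would then mean adding one more `case` in the factory and one more type implementing the interface.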

AnandSGit commented 3 years ago

There are two options for implementing this requirement, and we should consider and measure how lightweight a message queue or pub/sub system is required for any new module. We can use a NATS server-based communicator or raw TCP communicators. The TCP-based option would be very lightweight as well.

NATS is very easy to incorporate, as we have already explored that option and have experience with it. Raw socket-based communicators, however, would have to be implemented from scratch.

I will explore both and get back before the next TSC. In either case, we don't need to change the signatures of the interfaces we use to communicate with the message bus. Thanks.

williamcaban commented 3 years ago

@ehsjoar It might be useful to define the characteristics of the use cases we want to target with this. The term "embedded" spans across a wide variety of configurations, deployment layouts, and footprints that might not all have the need for this functionality, or there might be scenarios where it will be more challenging to achieve these capabilities.

For example, if we frame this enhancement around use cases like the universal CPE (uCPE), distributed Telco vRAN (e.g. DUs), or servers with satellite uplinks, there is a clear quick win and advantage for those use cases. On the other hand, it might be more challenging or less relevant to have this integration if we consider a small embedded device in a vehicle. What do you think?

@AnandSGit another way to approach "small footprint" for "raw TCP" would be to use something similar to the Qpid Dispatch Router (QDR) (https://qpid.apache.org/components/dispatch-router/index.html), which is a single binary, together with a technique similar to what the Skupper project (https://skupper.io) is doing, in which QDR is used as an in-memory translation of TCP communications in highly distributed and mixed environments. The QDR can also be used as an in-memory-only local bus while, if needed, forwarding to an external bus (e.g. Kafka). Such a setup gives local processes real-time access to the queues in the bus for any interaction in the local system, while retaining the possibility to forward to an external bus with persistent storage capabilities (if needed).

AnandSGit commented 3 years ago

Hi @williamcaban, after reading just the intro to QDR, I have some queries. Is it more or less similar to RabbitMQ? Both handle wire-level AMQP, so I thought of checking. If they are similar, I already have implementations done with both NATS and RabbitMQ, which follow almost the same approach as the one used in ODIM. Thanks

williamcaban commented 3 years ago

@AnandSGit This is where having the use cases defined will help. From the use case perspective, if we consider a uCPE or 5G DU, every millicore (e.g. with Containers) or core consumed, is a penalty for the platform. These are the use cases I'm considering. With other use cases, this might not be the case.

To answer your question about QDR vs RabbitMQ: one is a router (very low consumption of resources), the other is a broker (more resources required and may have storage requirements). A router (https://qpid.apache.org/releases/qpid-dispatch-master/user-guide/index.html#what-routers-are-qdr) is an intermediary for messages but it is not a broker. It does not take responsibility for messages. It just forwards them to clients or other brokers.
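To make the router/broker distinction concrete, here is a toy model. It is not how qdrouterd or RabbitMQ are implemented; the `Router` and `Broker` types and their methods are hypothetical and exist only for this illustration. The router forwards a message only if a receiver is attached at that moment, while the broker takes responsibility for the message and holds it until a receiver shows up.

```go
package main

import "fmt"

// Router forwards a message only if a receiver is attached right now.
// It keeps no copy, which is why its resource usage stays low.
type Router struct {
	receivers map[string]func(string)
}

func NewRouter() *Router {
	return &Router{receivers: make(map[string]func(string))}
}

func (r *Router) Attach(addr string, recv func(string)) { r.receivers[addr] = recv }

// Send reports whether the message was forwarded. When nobody is attached
// the message is not stored; the sender has to deal with that itself.
func (r *Router) Send(addr, msg string) bool {
	recv, ok := r.receivers[addr]
	if !ok {
		return false
	}
	recv(msg)
	return true
}

// Broker takes responsibility for messages: it queues them until a
// receiver attaches, which costs memory (or disk in a real broker).
type Broker struct {
	receivers map[string]func(string)
	queued    map[string][]string
}

func NewBroker() *Broker {
	return &Broker{
		receivers: make(map[string]func(string)),
		queued:    make(map[string][]string),
	}
}

func (b *Broker) Send(addr, msg string) {
	if recv, ok := b.receivers[addr]; ok {
		recv(msg)
		return
	}
	b.queued[addr] = append(b.queued[addr], msg)
}

// Attach registers a receiver and drains anything queued for it.
func (b *Broker) Attach(addr string, recv func(string)) {
	b.receivers[addr] = recv
	for _, msg := range b.queued[addr] {
		recv(msg)
	}
	delete(b.queued, addr)
}

func main() {
	var viaRouter, viaBroker []string

	r := NewRouter()
	forwarded := r.Send("events", "e1") // no receiver yet
	r.Attach("events", func(m string) { viaRouter = append(viaRouter, m) })

	b := NewBroker()
	b.Send("events", "e1") // queued until someone attaches
	b.Attach("events", func(m string) { viaBroker = append(viaBroker, m) })

	fmt.Println("router forwarded early message:", forwarded)
	fmt.Println("received via router:", viaRouter)
	fmt.Println("received via broker:", viaBroker)
}
```

The sketch is single-threaded on purpose; a real router or broker would also need locking, flow control, and delivery acknowledgements.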

Since these can be considered implementation details for particular use cases, help me understand that first: What use cases or constraints are you considering? (e.g. is there an embedded ODIM instance or is there a regional or centralized instance? how stable is the network? how many resources can be consumed in the local compute?)

AnandSGit commented 3 years ago

@williamcaban, @ehsjoar - This is just based on my understanding of the system. Looking at the whole ODIM system today, it seems it was never designed to address embedded requirements or use cases, at least where the message bus is concerned. Kafka was always required as the message broker because many ODIM users and management stations would like to see the communication data ported into Prometheus / Grafana.

Now, with this new requirement/use case, I am thinking of using something like the NATS server (it can be configured as a broker as well, instead of just a router). It is very lightweight compared to heavy Kafka. NATS doesn't increase the size of the target binary much, and the single-binary NATS router itself is a mere 10 MB in size. Another version of NATS that does the broker work is available as well, but NATS Streaming Server has issues when we try to stream and store the messages on disk; I have noticed its performance becoming poor in that case. There is also a sidecar development for this streaming requirement called "Liftbridge". I haven't explored this subsystem yet.

With NATS, features like load balancing and routing control come by default when multiple instances run in a cluster. So if you have a target system where you can deploy this router and check, I have a ready implementation that we can use and test. For now, I have just re-implemented it under the name "nats.odim" (one Go file and one TOML file as the configuration store).

And as you mentioned, we can use this module only in the leaf nodes/systems (last mile). For some specific kinds of messages (those that need to be persisted), we can write a small custom plugin or use the "Liftbridge" subsystem to stream these special messages periodically to the external system / management station.
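A small custom plugin of the kind described above might look roughly like this. It is a sketch under assumptions, not an existing ODIM or Liftbridge component: `Batcher`, `NewBatcher`, `ErrUplinkDown`, and the `send` callback are hypothetical names. Messages that must be persisted are collected on the leaf node and flushed on a timer; if the uplink is down, the batch is kept for the next attempt.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrUplinkDown is a hypothetical sentinel a send function may return.
var ErrUplinkDown = errors.New("uplink unavailable")

// Batcher collects messages that must be persisted and forwards them
// in batches to an external system through the supplied send function.
type Batcher struct {
	mu      sync.Mutex
	pending [][]byte
	send    func(batch [][]byte) error
}

func NewBatcher(send func(batch [][]byte) error) *Batcher {
	return &Batcher{send: send}
}

// Add queues one message for the next flush.
func (b *Batcher) Add(msg []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending = append(b.pending, msg)
}

// Pending returns how many messages are waiting to be forwarded.
func (b *Batcher) Pending() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.pending)
}

// Flush forwards everything queued so far. On failure the batch is put
// back in front of any newer messages so that nothing is lost.
func (b *Batcher) Flush() error {
	b.mu.Lock()
	batch := b.pending
	b.pending = nil
	b.mu.Unlock()

	if len(batch) == 0 {
		return nil
	}
	if err := b.send(batch); err != nil {
		b.mu.Lock()
		b.pending = append(batch, b.pending...)
		b.mu.Unlock()
		return err
	}
	return nil
}

// Run flushes periodically until stop is closed.
func (b *Batcher) Run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if err := b.Flush(); err != nil {
				fmt.Println("flush failed, will retry:", err)
			}
		case <-stop:
			return
		}
	}
}

func main() {
	up := false // simulated uplink state
	b := NewBatcher(func(batch [][]byte) error {
		if !up {
			return ErrUplinkDown
		}
		fmt.Printf("forwarded %d messages\n", len(batch))
		return nil
	})
	b.Add([]byte("power state changed"))
	b.Add([]byte("thermal threshold crossed"))
	fmt.Println("first flush:", b.Flush())
	up = true
	fmt.Println("second flush:", b.Flush())
}
```

In a deployment, `Run` would be started in its own goroutine and `send` would wrap whatever transport reaches the management station. The pending queue here is unbounded and in memory only, so a real plugin would need a size limit or on-disk spooling.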

If there are any mistakes in my understanding of the use case or of the system, please let me know. Thanks

EDIT: Forgot to mention: NATS has a request-reply messaging model as well, so with NATS we can do a quick exchange between two systems, just like a function call. One major drawback of NATS is its fire-and-forget model. The message is not stored anywhere; if the target subscriber is offline, it will never receive the message. We can see whether we can overcome this issue by writing some kind of hook / plugin for these high-availability cases.
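The request-reply pattern mentioned above, where a message exchange behaves like a function call with a timeout, can be mimicked with plain Go channels. This is not the NATS client API; `Responder`, `StartResponder`, `Request`, `DefaultTimeout`, and `ErrTimeout` are hypothetical names used only to show the shape of the pattern.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// DefaultTimeout is an arbitrary value chosen for this sketch.
const DefaultTimeout = time.Second

// ErrTimeout is returned when no reply arrives in time.
var ErrTimeout = errors.New("request timed out")

// request pairs a payload with a private channel for the reply,
// playing the role of a per-request reply subject.
type request struct {
	payload string
	reply   chan string
}

// Responder is the serving side: it reads requests and answers each one.
type Responder struct {
	inbox chan request
}

// StartResponder launches a goroutine that answers requests with handle.
func StartResponder(handle func(string) string) *Responder {
	r := &Responder{inbox: make(chan request)}
	go func() {
		for req := range r.inbox {
			// reply is buffered, so this never blocks even if the
			// requester has already given up.
			req.reply <- handle(req.payload)
		}
	}()
	return r
}

// Request sends a payload and waits for the reply, like a function call.
// One timer covers both delivering the request and receiving the answer.
func (r *Responder) Request(payload string, timeout time.Duration) (string, error) {
	req := request{payload: payload, reply: make(chan string, 1)}
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case r.inbox <- req:
	case <-timer.C:
		return "", ErrTimeout // responder busy or offline
	}
	select {
	case resp := <-req.reply:
		return resp, nil
	case <-timer.C:
		return "", ErrTimeout // responder too slow
	}
}

func main() {
	r := StartResponder(func(p string) string { return "ack:" + p })
	resp, err := r.Request("reset-bmc", DefaultTimeout)
	fmt.Println(resp, err)
}
```

The timeout is what separates this from a plain function call: when the other side is offline, the caller gets an error instead of waiting forever, which is also where a retry or store-and-forward hook for the fire-and-forget case could be attached.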

aiyagari commented 3 years ago

@AnandSGit QDR is not at all the same as RabbitMQ. In fact, it is a lot closer to NATS, so it would be good for you to evaluate both to figure out which meets your needs. Where QDR adds some useful capabilities is in multipathing, flow control, reverse-connect (dealing with wrong-way firewalls, a common problem in industrial use cases), app-layer security, and friendliness to WAN-based transports. These are all issues that are commonly seen when deploying embedded systems. May I suggest the following presentation, given at the Eclipse IoT and Edge Native Day, which goes into quite a bit of detail: https://www.crowdcast.io/e/May28_2020_IotEdgeNativeDay/9

AnandSGit commented 3 years ago

@aiyagari Thank you for the suggestion. Apache has a Qpid Proton implementation in Golang, which is the target I am looking at. I will get back to the forum on progress. The goal is to match the interfaces already implemented; if needed, I may have to add some more interfaces and make them obsolete for other router / broker cases like Kafka. Currently I am trying to reuse this utility system: https://github.com/AnandSGit/hybridpipe.io.

I went through the link and followed the discussion around QDR. Thanks for the link :-). It is very good that we don't need any sidecar implementation on the user end. We have a similar thing with NATS as well. The Go implementation is done as a wrapper around the C library, so it is a very well-tested component as well. Thanks again.

Bharath-KKB commented 3 years ago

When we consider these small-footprint environments, I guess we are talking about edge sites. Communication with the central site will be HTTPS-based, so that need not affect our decision on the MQ.

Factors that will affect our decision:

The number of BMCs/servers is going to be tiny, maybe half a dozen or so, so handling a large number of messages is not a requirement.
Messaging is only needed for plugins to forward events to ODIMRA.
ODIM and the plugins will themselves run on a small VM or share a host with other applications.
Topology is static and rerouting will not be needed.
Support for a message protocol is not needed.
Secure communication is needed.

The important requirements for the MQ will be

A small broker footprint, as ODIM will have very little compute/storage available
Reliability with good error handling
Good support for remote management (such as troubleshooting, truncating/deleting files, and related operations)
Support for TLS
Availability of Golang wrappers (a native Golang implementation may be too much to ask for)

Support for small-sized messages looks interesting at the outset but is not useful. To make use of it, we would have to reduce the message size used by ODIM itself, and since the messaging is used for event delivery, no reduction is possible.

Given these, I think we will have to compare the following (or a couple more like these):

A reliable MQTT server with a permissive license, or
Qpid
DDS ??

Message brokers like RabbitMQ, NATS, etc. might still be overkill and have a big footprint. RabbitMQ would also need Erlang and its libraries to be installed. MQTT implementations are known to run well in resource-constrained environments, with interfaces for remote comms if needed. We will not be running on microcontrollers, but using fewer resources anywhere is good, especially given that we don't need the extra features.

AnandSGit commented 3 years ago

Using the following list of packages, I have started working on a PoC in my private repo (HybridPipe).

ZeroMQ - https://github.com/go-zeromq/zmq4.git
Qpid - https://github.com/Azure/go-amqp.git or Apache QPid DR (implemented in C with a Golang wrapper)
MQTT - https://github.com/eclipse/paho.mqtt.golang.git

The selection of broker/router will be configurable, as demonstrated in the HybridPipe examples. Once this PoC is tested with the local repo implementation, it will be incorporated into ODIM on approval. Please reply if anyone has suggestions for moving forward. Thanks

AnandSGit commented 3 years ago

I checked the idle memory load while running NATS (with NATS Streaming Server as the broker). It does not even use 2 MB (please find the attached image). Given the features that NATS / Qpid offer compared with MQTT, I personally feel these two are worth incorporating and, considering the benefits, certainly not overkill.

My understanding is that MQTT needs a broker in between and works only in the pub/sub model. Based on the discussion we had today, we also have a requirement for peer-to-peer communication with no broker in between, with some of the benefits mentioned by @aiyagari. It also seems that these requirements are not for the ODIM components themselves but for the plugins and northbound components outside the ODIM core, which may want to communicate among themselves and expect ODIM to provide a framework for that. Please correct me if my understanding is off.

So for now, I am looking into AMQP 1.0 implementations, MQTT, and ZeroMQ, in that order. I am looking at ZMQ as well because it is a lightweight, fast, and secure model. And since we are building a messaging utility library, I thought we should give the user a wider set of options, as long as they are not hard to include.

AnandSGit commented 3 years ago

Comparison of the messaging platforms and ODIM requirements:

KAFKA: Currently, ODIM components use it for messaging within the ODIM space. External components in the plugin space or management station space are also forced to use it to communicate with ODIM (it is the only communication platform in use in ODIM). The purpose was to facilitate performance data collection, envisioning Prometheus / Grafana integration for message data analysis.

NATS: Very lightweight and fast. Core NATS follows the "at most once" delivery model. An implementation is already available and can easily be incorporated into ODIM if required. If NATS messages need to be streamed for data analysis, we either need to use "https://github.com/liftbridge-io/liftbridge" as a streaming server or write our own lightweight plugin to intercept NATS and stream to a DB / storage for future use. NATS' own streaming server is a bit slow, and work is still under way to improve its performance to match other brokers.

QPID Dispatch Router: Uses the underlying AMQP 1.0 router and its library for communication. Please refer to the messages from @aiyagari and @williamcaban for some advantages of using Qdrouter.

MQTT: Implementation has already started. I will share it once the PoC is done.

The other platform supported for now is RabbitMQ (AMQP 0-9-1).

Use cases: In the future, there may be a requirement for some parts of ODIM to run on smaller edge-level devices where resource usage is restricted. We may be able to run the existing Kafka-based setup on these devices too; this needs to be tested and verified. But as I mentioned in previous messages, we can provide multiple communication options to ODIM users with these smaller changes to the system.

Progress: The producer part of AMQP 1.0 is working and able to send data as part of the PoC. Because of the platform restrictions I have, I am running "qdrouterd" inside a container and connecting to it from the tester applications in my personal repo. PoC repository link: https://github.com/AnandSGit/hybridpipe.io.git Once the PoC is done, I will make a feature branch and make the changes directly in ODIM after approval.

The NATS part is already working, and users should be able to take this code and integrate or test it.

NOTE: If anyone has started working on these use cases, please let me know before proceeding so that we don't duplicate work. Thanks.