apache / openwhisk-package-kafka

Apache OpenWhisk package for communicating with Kafka or Message Hub
https://openwhisk.apache.org/
Apache License 2.0

Propose a lightweight model for the message consumer service #19

Closed. houshengbo closed this issue 7 years ago

houshengbo commented 7 years ago

OpenWhisk manages its components in separate docker containers, each of which hosts one service handling the workload for all users. The current in-progress messaging (Kafka) package service follows the same model: it launches one docker container to manage all the consumers and triggers created by all users. The major pitfall of this design is that it confines all the resources to one container, which directly leads to a single point of failure and a lack of scalability.

Digging into the code a little, we can see that an internal hashmap/dict is created to hold all the consumers and their mappings to triggers, and every new consumer or trigger has to create a new record in the database for consistency and for recovery whenever the service needs to restart.
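To illustrate the model being critiqued, here is a minimal sketch of that registry pattern. The names (`create_kafka_consumer`, the `db` object) are hypothetical helpers for illustration, not the actual code in this repository.

```python
# Minimal sketch of the registry pattern described above (hypothetical names,
# not the actual openwhisk-package-kafka code): one process holds every
# consumer/trigger mapping in memory and mirrors each record to a database.
consumers = {}  # trigger_id -> consumer handle, kept only in this one process

def register_trigger(trigger_id, topic, db):
    consumer = create_kafka_consumer(topic)            # assumed helper
    consumers[trigger_id] = consumer                    # in-memory map grows with every trigger
    db.save({"trigger": trigger_id, "topic": topic})   # persisted only for restart recovery

def recover_on_restart(db):
    # Every record must be reloaded and every consumer recreated inside this
    # single container, which is the single-point-of-failure / scaling issue.
    for record in db.load_all():
        consumers[record["trigger"]] = create_kafka_consumer(record["topic"])
```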

We can predict the following issues with the current design:

If the number of triggers exceeds this limit, what can we do in this situation without changing the implementation? Some workarounds are possible, but there is no ideal solution at the moment.

We can see that the internal map of the registry is really the issue: it limits the number of triggers we are able to handle and makes the service rather difficult to scale.

In order to resolve the above issues, we have to reconsider the design model for the message consumer service. I would like to propose a more lightweight model in which the consumer service only takes care of listening to the messaging service, picking up messages from a specific topic and consuming them. We can then launch multiple consumer services in HA mode, since they all have the same responsibilities.

Where do we get the information about the trigger to be fired? From OpenWhisk's perspective, we are already able to create actions and triggers and associate them with a rule, so the only thing the consumer service needs is the trigger information. Instead of maintaining the relationship between consumers and triggers in an internal map/dict and in the database, we can offload the trigger information to the provider service. When the provider discovers a change in the event source, it sends a message into the messaging service containing the trigger information (e.g. the trigger ID and trigger URI), the credentials, and the payload for the final action. When the consumer service receives the message, it simply fires the trigger at that URI with the payload.
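For concreteness, one purely illustrative shape for such a message could look like the sketch below. The field names are my assumptions; the actual format is the subject of issue #20.

```python
# Purely illustrative message shape (field names are assumptions; the real
# format is defined in issue #20). Everything the consumer needs in order to
# fire the trigger travels inside the Kafka message itself.
message = {
    "triggerId": "guest/myKafkaTrigger",
    "triggerUri": "https://<apihost>/api/v1/namespaces/guest/triggers/myKafkaTrigger",
    "credentials": "<uuid>:<key>",  # whisk auth for the trigger's namespace
    "payload": {"messages": ["raw event data picked up by the provider"]},
}
```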

In this model, the consumer service is tailored down to a lightweight module that listens to the topic, receives a message, and fires the trigger with the payload and credentials, since all the necessary information is prepared by the provider service and carried in the message. There is no database involved and no state to be saved. The service is able to scale, since we can launch as many consumer service instances as we need, and as long as at least one consumer service is running, availability is preserved.
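A rough sketch of how such a stateless consumer could look, assuming the kafka-python client, the `requests` library, the standard OpenWhisk trigger-fire REST call, and the illustrative message fields shown above; the topic, broker address, and group id are placeholders.

```python
# Sketch of a stateless consumer: no local registry, no database. Every piece
# of information needed to fire the trigger is carried in the message itself.
import json
import requests
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "openwhisk-events",                  # assumed topic name
    bootstrap_servers=["kafka:9092"],    # assumed broker address
    group_id="openwhisk-consumers",      # a shared group id lets many instances split the load
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for record in consumer:
    msg = record.value
    # Fire the trigger at the URI carried in the message, using the credentials
    # carried alongside it; nothing is remembered between messages.
    user, password = msg["credentials"].split(":", 1)
    resp = requests.post(msg["triggerUri"], json=msg["payload"], auth=(user, password))
    resp.raise_for_status()
```

Because the loop keeps no state, any number of identical instances can join the same consumer group, which is what makes the HA/scale-out story straightforward.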

The only thing we need to define is the format of the messages consumed by the consumer service. I will describe the message format in detail in the design of the provider service: https://github.com/openwhisk/openwhisk-package-kafka/issues/20.

mrutkows commented 7 years ago

Do we really want to propagate the Trigger ID and that responsibility outside the domain of OpenWhisk? After all, the major goal of an event-driven paradigm is "fire and forget", not "fire and keep track of triggers". This seems like offloading the (retention, storage, recovery) problem, and it also makes it necessary to expose the Trigger as a top-level interaction, whereas we really only want the ecosystem to focus on event generation and Actions.

mrutkows commented 7 years ago

To me, a trigger is essentially (from a functional perspective) a name that maps to a URI (effectively a DNS for Serverless); its job is to maintain that relationship so others do not have to. Saying "we can offload the information of the trigger to the provider service" tells me we expect provider services to provide pub/sub/registration, which is the ideal future, but not true in the majority of cases today. I am looking forward to seeing your design in issue #20...

jberstler commented 7 years ago

I agree, @mrutkows. Requiring the data providers to do OpenWhisk bookkeeping for us will preclude a lot of integration scenarios where the OpenWhisk user simply wants to fire triggers based on data that is already being produced by a variety of sources. This is really the bread-and-butter scenario for Kafka, where any number of disparate entities produce messages without knowing or caring who will consume them.

houshengbo commented 7 years ago

I have updated the design of the provider service: https://github.com/openwhisk/openwhisk-package-kafka/issues/20. The provider service will offer different kinds of templates, each listening for changes from a different event source. Each user can choose which service is their event source and launch an instance of the provider with the specific trigger information. One instance of the provider service serves only one user, so we do not need to register all the trigger mappings on the provider side either.

I am somewhat concerned about the current in-memory consumer and trigger registry, which is also the common model followed by the OpenWhisk service providers, such as the Cloudant and alarm providers. With it we end up hitting the memory cap when too many triggers or consumers are created; losing availability if the single service instance has an outage; scaling only vertically, with no way to add horizontal compute resources such as containers or VMs; and breaking everything if the in-memory map or its database records become corrupted. Any one of these issues can cause trouble for a service deployed in the cloud, not to mention all of them existing at the same time.