Closed andras-sereny closed 9 years ago
Thanks Davide. I like Kafka very much and it might be the right tool if we want to send event logs from one component to another. I wonder, though, in our use case, what does it bring? Kafka’s strength is in the fact that data can be consumed multiple times, even at another time; messages are durable. I think we just want to send data immediately, no need to store.
As for scaling, and the big number of events, what’s the scenario you have in mind? A computing environment running multiple consumers to process data faster? We could easily use multiple ZMQ connections. Generally speaking, ZMQ claims to scale. I guess all of this is a pure theoretical concern, since the orchestrator can push data through a single channel much faster than the answer rate of any reasonably sophisticated recommender system.
How about we implement sending train/test data to the computing environment via ZMQ? Leaving the Kafka solution in place, but as an option for CEs. I expect the ZMQ implementation and usage to be much simpler, for development purposes especially.
Yes, I think it’s a good idea to offer computing environments a choice between different communication methods. REST APIs are of course very popular thus it seems a good choice.
As discussed in the workshop, we'd like to use Kafka/Flume in a production setting. Supporting a development setup is continued in #49.
This issue is a continuation of an email thread about orchestrator - CE communication:
Hi Andras, I agree with you, all your comment are correct, I would like to explain to you why we choose to use kafka. The main idea is to use component that are "production ready" and that could be easily be scaled to manage "big" number of events. Of course this introduce a lot of complexity in the development of the computing env (and also in the orchestrator stuff) but also give us the possibility to use very common objects (like kafka and zookeeper).
I also agree with you that zmq is already present and could be used instead of kafka, one idea is to have a restful API that receive events and data (this cover the newsreel integration too) so to have the developer to have the possibility of choosing web api or kafka without changing the rest of the orchestrator.
What do you think? can we continue the talk on a new github issue so to track our conclusion?
thanks, Davide
On Tue, Mar 10, 2015 at 9:43 AM, András Serény sereny.andras@gravityrd.com wrote: