elodina / dropwizard-kafka-http

Apache Kafka HTTP Endpoint for producing and consuming messages from topics
http://www.elodina.net
Apache License 2.0

How do you create reliable consumers? #1

Open · clarkbreyman-yammer opened this issue 10 years ago

clarkbreyman-yammer commented 10 years ago

Joe - looks really interesting, but the HTTP protocol seems to leave you open to dropping messages if the consumer fails after receiving a message but before processing it. We were looking to build something similar, but using an outbound POST so that the consumer can complete processing of the message before the ACK.

joestein commented 10 years ago

Hi Clark, we have started to think about that, yup. There are also low-level consumer API scenarios that have to be implemented. For the consumer push scenario we have been looking at a few options, including a way to plug in your own. So you could use https://github.com/Atmosphere/atmosphere or http://pusher.com/ or whatever. Right now more of our HTTP-based use cases are on the producer side, but if you had an implementation we could incorporate it.

So from a reliability perspective we need a low-level consumer, so that offsets are committed after processing and committed by the caller itself. We also have to add a REST interface for committing the offsets with ZooKeeper for that caller's consumer group's consumer. This will allow the consumer to use ZooKeeper for managing the offsets while controlling all the business logic around that through a REST interface. This should be a /consumer/ implementation where GET returns the message at the latest offset for that topic, group and partition, and POST commits the offset (which the caller is responsible for doing).
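
A rough sketch of what such a resource could look like in Dropwizard (JAX-RS). This is illustrative only, not an existing part of this repo: the class name, query parameters, and the `OffsetBackedConsumer`/`ConsumedMessage` types are hypothetical stand-ins for a low-level Kafka consumer plus ZooKeeper-backed offset storage.

```java
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/consumer")
@Produces(MediaType.APPLICATION_JSON)
public class ConsumerResource {

    private final OffsetBackedConsumer consumer;

    public ConsumerResource(OffsetBackedConsumer consumer) {
        this.consumer = consumer;
    }

    // GET: return the message at the caller's current committed offset for this
    // topic/group/partition, without advancing or committing anything.
    @GET
    public Response fetch(@QueryParam("topic") String topic,
                          @QueryParam("group") String group,
                          @QueryParam("partition") int partition) {
        ConsumedMessage message = consumer.fetchNext(topic, group, partition);
        if (message == null) {
            return Response.noContent().build();
        }
        return Response.ok(message).build();
    }

    // POST: the caller commits the offset only after it has finished processing
    // the message, which is what gives at-least-once behaviour.
    @POST
    @Path("/commit")
    public Response commit(@QueryParam("topic") String topic,
                           @QueryParam("group") String group,
                           @QueryParam("partition") int partition,
                           @QueryParam("offset") long offset) {
        consumer.commitOffset(topic, group, partition, offset);
        return Response.ok().build();
    }

    // Hypothetical collaborators, stubbed here only so the sketch is self-contained.
    public interface OffsetBackedConsumer {
        ConsumedMessage fetchNext(String topic, String group, int partition);
        void commitOffset(String topic, String group, int partition, long offset);
    }

    public static class ConsumedMessage {
        public long offset;
        public String payload;
    }
}
```

A caller would GET a message, do its processing, and only then POST the commit; if it crashes in between, the offset is never advanced and the message is delivered again.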

If we did the above, would that address your needs instead of a push? Or, if you need/want a push mechanism, can you elaborate on the specifics?

clarkbreyman-yammer commented 10 years ago

Having the REST client call back to ACK is going to double the number of round-trip requests, increasing latency and likely reducing throughput. I'm still in the process of wrapping my head around Kafka, but it seems like the parallelism of a consumer group is bounded by the number of partitions... meaning that throughput is limited by the number of partitions and the consumption latency.
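To make that concrete with purely illustrative numbers: with 16 partitions and roughly 50 ms per message for the fetch + process + ACK round trip, each partition clears about 20 messages/second, so the group tops out around 320 messages/second no matter how many consumer processes sit behind it. Only adding partitions or cutting the per-message round-trip latency raises that ceiling.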

Having the REST client talk with ZK seems to defeat the purpose of protocol encapsulation.

joestein commented 10 years ago

Parallelism of the consumers within a group is bounded by the number of partitions, yes.

Here is more info on the low-level (simple) consumer: https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
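
The core of that wiki example looks roughly like the sketch below; the point for this issue is that the caller holds `readOffset` and decides when to advance and persist it (e.g. to ZooKeeper, or via the proposed POST commit). Broker host/port, topic and partition values here are placeholders.

```java
import java.nio.ByteBuffer;

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

public class SimpleConsumerSketch {
    public static void main(String[] args) throws Exception {
        String topic = "my-topic";   // placeholder
        int partition = 0;           // placeholder
        long readOffset = 0L;        // caller-managed offset
        String clientName = "client_" + topic + "_" + partition;

        // Connect directly to the partition's lead broker (placeholder host/port).
        SimpleConsumer consumer = new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, clientName);

        FetchRequest req = new FetchRequestBuilder()
                .clientId(clientName)
                .addFetch(topic, partition, readOffset, 100000)
                .build();
        FetchResponse fetchResponse = consumer.fetch(req);

        for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(topic, partition)) {
            ByteBuffer payload = messageAndOffset.message().payload();
            byte[] bytes = new byte[payload.limit()];
            payload.get(bytes);

            // Process the message first...
            System.out.println(messageAndOffset.offset() + ": " + new String(bytes, "UTF-8"));

            // ...and only then advance the offset. Where and when it gets persisted
            // is entirely up to the caller.
            readOffset = messageAndOffset.nextOffset();
        }

        consumer.close();
    }
}
```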

We can hook in Atmosphere and do WebSocket with a fallback to long polling; would that work better?

We can support multiple different types of consumers.