elodina / dropwizard-kafka-http

Apache Kafka HTTP Endpoint for producing and consuming messages from topics
http://www.elodina.net
Apache License 2.0
154 stars, 45 forks

Add info on running in non-Vagrant envs to README #6

Closed ches closed 10 years ago

ches commented 10 years ago

Hi,

When I first looked at the project, I wished that it told me how to run it in real environments. So I did that. This also fixes a few small Markdown issues, such as numbered-list syntax.

I also wish that the README told me about some things like the concurrency model. I'm still testing it myself and also getting acquainted with Kafka's Java APIs that are used, as well as Dropwizard/Jetty, in order to understand this, but if any maintainers can add that info it'd be great.

ches commented 10 years ago

Just thinking out loud about (part of) what I meant by documenting the concurrency model above, to share my understanding and to seek confirmation of it:

This project uses Kafka's high-level consumer API and the current implementation does not attempt to manage multiple threads for consuming from topics with more than one partition. Therefore, to achieve parallel consumption of such a topic, you will need to run multiple dropwizard-kafka-http processes (ideally one per partition), configured with the same groupId in kafka-http.yml.

If this is indeed accurate then I can add something to this effect to the pull request.
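To make the reasoning above concrete, here is a minimal, self-contained sketch of how partitions of one topic could be divided among processes that share a groupId. It is a simplified model for illustration only, assuming a contiguous range-style split; it is not code from this project, and it does not call Kafka. The class name `GroupAssignmentSketch`, the method `assign`, and the process names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a simplified model of dividing a topic's
// partitions among consumers in the same group. Not Kafka's actual code.
public class GroupAssignmentSketch {

    // Splits partitions 0..numPartitions-1 into contiguous ranges, one range
    // per consumer. The first (numPartitions % consumers) consumers get one
    // extra partition; consumers beyond the partition count get none.
    static Map<String, List<Integer>> assign(int numPartitions, List<String> consumers) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int base = numPartitions / consumers.size();
        int extra = numPartitions % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int p = 0; p < count; p++) {
                owned.add(next++);
            }
            result.put(consumers.get(i), owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // One process: it owns every partition, so nothing is consumed in parallel.
        System.out.println(assign(4, Arrays.asList("proc-a")));
        // One process per partition: each process owns exactly one partition.
        System.out.println(assign(4, Arrays.asList("proc-a", "proc-b", "proc-c", "proc-d")));
        // More processes than partitions: the surplus process owns nothing.
        System.out.println(assign(4, Arrays.asList("proc-a", "proc-b", "proc-c", "proc-d", "proc-e")));
    }
}
```

Under this model, a single process with a single consuming thread handles all partitions serially, which is why one process per partition is the natural upper bound on useful parallelism.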

joestein commented 10 years ago

Hey Ches, thanks for the update. With regard to the consumer side, we really never did anything with that... this was really all about making an HTTP producer, and once we got to the consumer we didn't really have a use case, so we didn't think about it. If you have a need for it and would use it, happy to figure out what might be best and how that could work ... or maybe we remove the consumer altogether, dunno.

ches commented 10 years ago

Thanks for your response, Joe. I've seen the thread on #1 about integrating something like Atmosphere, or allowing it to be plugged in. I've also been trying mailgun/kafka-http, which seems to have at least rudimentary support for some of the things this project's issues ask for on the consumer side: streaming via long-polling, and explicit offset committing via a POST endpoint. That project is unfortunately also lightly documented, and is a more ad hoc Jetty servlet app with weak configurability. I like Dropwizard for the benefits of framework convention familiarity and its operator- and deployment-friendliness.

I'm basically assessing the viability of an HTTP consumer as a stopgap for some existing service components in languages that have no 0.8.x Kafka client libraries, or poor/incomplete ones. Both this project and the mailgun one are listed fairly prominently on the Kafka Clients wiki page, but unfortunately neither has very forthcoming documentation about project status or completeness. They're relatively small projects to grok, but especially if you're not too familiar with the language or libraries used and are just hoping for a utility service out of the box, the time required to test and find many details out for yourself is not insignificant.

I've considered implementing something myself with Akka/Spray, both out of unmet needs and personal learning interest, but that'll take me some time to get to, and it'd be nice to instead/also try to improve these existing publicized projects. So I'll try, and I hope that bits of doc covering what I've learned so far are at least a start that's helpful to other people like me landing on the README :smile: