bpot / poseidon

A client for Kafka 0.8
MIT License

Consumer does not appear to fetch all messages from a topic? #90

Closed: daluu closed this issue 9 years ago

daluu commented 9 years ago

I'm new to Kafka, so let me know if this is a config/setup issue instead (and how I might work around it), but I don't think it is. I'll try to test with test/dummy topics, but so far I've been using existing topics from our system. I'm not sure I can reproduce the problem with a dummy topic.

While trying out this Ruby client, I noticed that it seems to fetch only a portion of the messages. Our messages contain info like an IP address and a unique ID, and we get far too many of them to inspect manually on screen (they are big JSON strings), so I filter the output through grep on the command line (or filter the message collection in the Ruby client code).
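For reference, filtering the message collection in Ruby might look like the sketch below. It assumes each fetched message exposes the raw payload via a `value` method (the poseidon README's consumer sample prints `m.value`); the `FetchedMsg` struct, the `matching_messages` helper, and the `"ip"`/`"id"` JSON field names are all hypothetical stand-ins, not part of poseidon.

```ruby
require 'json'

# Hypothetical stand-in for a fetched Kafka message; the sketch only
# relies on #value returning the raw payload string (assumption).
FetchedMsg = Struct.new(:offset, :value)

# Return the messages whose JSON payload matches the given IP or ID.
# The "ip" and "id" field names are hypothetical.
def matching_messages(messages, ip: nil, id: nil)
  messages.select do |m|
    begin
      payload = JSON.parse(m.value)
    rescue JSON::ParserError
      next false # skip payloads that are not valid JSON
    end
    next false unless payload.is_a?(Hash) # skip non-object JSON
    (ip && payload["ip"] == ip) || (id && payload["id"] == id)
  end
end

batch = [
  FetchedMsg.new(0, '{"ip":"10.0.0.5","id":"abc"}'),
  FetchedMsg.new(1, '{"ip":"10.0.0.9","id":"def"}'),
  FetchedMsg.new(2, 'not json')
]
p matching_messages(batch, ip: "10.0.0.9").map(&:offset) # => [1]
```

With a real consumer, `batch` would instead be the array returned by each fetch call. Parsing the JSON avoids false positives that a plain substring grep can produce (for example, `10.0.0.1` also matching `10.0.0.15`).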

When I filter for a certain IP address or ID that I know should exist in the messages, I get no output. Doing the same thing with the _kafka-console-consumer.sh_ script that ships with Kafka works fine, although when running that script we specify and use ZooKeeper rather than a simple consumer.

I've also started trying Kafka clients in other languages for comparison. The pykafka Python client works fine for me with its simple consumer (not using ZooKeeper): I can filter by IP or ID and get output.

All the messages come from a single partition (partition 0) whose leader is one Kafka server; the two other servers are not leaders. I connect to the leader. ZooKeeper is running, and I'm testing with only one consumer client, although other people in my organization may be running their own consumers (and sometimes I run the Kafka console consumer script at the same time while debugging). I also tried poseidon_cluster using its sample code (which uses ZooKeeper), but that didn't help with this problem at all.

Has anyone encountered a similar issue? FYI, I'm just using the consumer as shown in the consumer sample code. The topics come from our existing Kafka system; I'm not publishing to them from this Ruby client (yet).

I can try to attach message info or logs, whatever's needed, but I'm not sure what I can share given that it's my organization's data. I don't have anything else to post right now.

daluu commented 9 years ago

My mistake: I used the sample consumer code, which starts from the earliest offset, so it was reading the oldest retained messages first. To test with the messages currently flowing through the system, I should have used the latest offset. Still learning, you know.
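To see why the starting offset matters, here is a toy, in-memory model of one partition in plain Ruby. It is not poseidon and does not talk to a broker; `ToyPartition`, `ToyConsumer`, and the per-fetch message cap are hypothetical (a real Kafka fetch is bounded in bytes, not message count). In poseidon, the starting point appears to be the last argument of `Poseidon::PartitionConsumer.new`, where the README sample passes `:earliest_offset`.

```ruby
# Toy model of a single partition (NOT the poseidon API): the log is an
# append-only array and an offset is simply an index into it.
class ToyPartition
  def initialize(messages)
    @log = messages.dup
  end

  def earliest_offset
    0 # assumes retention has not deleted anything yet
  end

  def latest_offset
    @log.size # offset the *next* appended message will receive
  end

  def append(msg)
    @log << msg
  end

  # Return up to max_messages starting at offset (a bounded fetch).
  def fetch(offset, max_messages)
    @log[offset, max_messages] || []
  end
end

# Minimal consumer that remembers its position between fetches.
class ToyConsumer
  attr_reader :offset

  def initialize(partition, start)
    @partition = partition
    @offset = start == :earliest ? partition.earliest_offset : partition.latest_offset
  end

  def fetch(max_messages = 2)
    batch = @partition.fetch(@offset, max_messages)
    @offset += batch.size
    batch
  end
end

part = ToyPartition.new((1..1000).map { |i| "old-#{i}" })
from_earliest = ToyConsumer.new(part, :earliest)
from_latest   = ToyConsumer.new(part, :latest)
part.append("new-10.0.0.9") # a "current" message arrives

p from_earliest.fetch.grep(/10\.0\.0\.9/) # => []  (still on old-1, old-2)
p from_latest.fetch.grep(/10\.0\.0\.9/)   # => ["new-10.0.0.9"]
```

The consumer that starts at the earliest offset would eventually reach the new message, but only after working through the whole backlog, which is why filtering its first few batches for a recent IP or ID returns nothing.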