There are currently very few tests. Ideally there would be a comprehensive suite of automated tests, along with a document of manual tests (with detailed instructions) that can be run.
Basic functionality of the client from a producer and consumer perspective:
able to connect to a Kafka broker
able to request data
able to produce data
does it handle errors from the Kafka broker appropriately when trying to do any of the above?
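Checking that a client handles broker errors "appropriately" largely comes down to asserting, per Kafka error code, whether the client retries or surfaces the error. Below is a minimal table-driven sketch. The error codes and names come from the Kafka protocol guide. `classify_error` is a hypothetical stand-in for whatever hook the client under test exposes, and the retry/raise split is an assumption that follows the guide's "retriable" flag and should be checked against each client's documentation.

```python
from enum import Enum
from typing import Callable, Dict, List

class Action(Enum):
    OK = "ok"        # success, nothing to do
    RETRY = "retry"  # transient: refresh metadata and/or retry
    RAISE = "raise"  # not retried as-is: surfaced to the app (or an offset-reset policy)

# A subset of Kafka protocol error codes.
NONE = 0
OFFSET_OUT_OF_RANGE = 1
UNKNOWN_TOPIC_OR_PARTITION = 3
LEADER_NOT_AVAILABLE = 5
NOT_LEADER_OR_FOLLOWER = 6  # called NOT_LEADER_FOR_PARTITION in older docs
REQUEST_TIMED_OUT = 7
MESSAGE_TOO_LARGE = 10

# Expected client behaviour per code (assumption: mirrors the "retriable"
# flag in the protocol guide's error-code table).
EXPECTED: Dict[int, Action] = {
    NONE: Action.OK,
    OFFSET_OUT_OF_RANGE: Action.RAISE,
    UNKNOWN_TOPIC_OR_PARTITION: Action.RETRY,
    LEADER_NOT_AVAILABLE: Action.RETRY,
    NOT_LEADER_OR_FOLLOWER: Action.RETRY,
    REQUEST_TIMED_OUT: Action.RETRY,
    MESSAGE_TOO_LARGE: Action.RAISE,
}

def classify_error(code: int) -> Action:
    """Hypothetical hook standing in for the client under test."""
    if code == NONE:
        return Action.OK
    retriable = {UNKNOWN_TOPIC_OR_PARTITION, LEADER_NOT_AVAILABLE,
                 NOT_LEADER_OR_FOLLOWER, REQUEST_TIMED_OUT}
    return Action.RETRY if code in retriable else Action.RAISE

def check_error_handling(classify: Callable[[int], Action]) -> List[int]:
    """Return the error codes for which `classify` disagrees with EXPECTED."""
    return [code for code, want in EXPECTED.items() if classify(code) is not want]

if __name__ == "__main__":
    print("mismatches:", check_error_handling(classify_error))
```

Keeping the expectations in one table means each new client binding only has to supply its own `classify`-style adapter.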
More advanced stuff:
a leader change occurs, or the leader change fails (or ends up with the same leader): do we handle these scenarios correctly?
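The leader-change cases can be scripted without a real cluster. The sketch below assumes the client's expected reaction to a "not leader" error is to refresh metadata and retry, up to a bounded number of times. `FakeCluster` and `produce_with_leader_refresh` are hypothetical names, and real clients would also back off between attempts (omitted here).

```python
from dataclasses import dataclass
from typing import List, Optional

NONE = 0
NOT_LEADER_OR_FOLLOWER = 6  # Kafka error code: broker is not the partition leader

@dataclass
class FakeCluster:
    """Scripted stand-in for a cluster going through a leader change (hypothetical).

    metadata_answers[i]: leader reported by the i-th metadata request.
    accepting[i]: broker that accepts the i-th produce attempt (None = no leader).
    The last entry of each list repeats once the script runs out.
    """
    metadata_answers: List[int]
    accepting: List[Optional[int]]
    metadata_calls: int = 0
    produce_calls: int = 0

    def metadata(self) -> int:
        i = min(self.metadata_calls, len(self.metadata_answers) - 1)
        self.metadata_calls += 1
        return self.metadata_answers[i]

    def produce(self, broker: int) -> int:
        i = min(self.produce_calls, len(self.accepting) - 1)
        self.produce_calls += 1
        return NONE if self.accepting[i] == broker else NOT_LEADER_OR_FOLLOWER

def produce_with_leader_refresh(cluster: FakeCluster, max_retries: int = 3) -> int:
    """Client logic under test: on an error, refresh metadata and retry (no backoff)."""
    leader = cluster.metadata()
    for _ in range(max_retries + 1):
        if cluster.produce(leader) == NONE:
            return leader
        leader = cluster.metadata()
    raise RuntimeError(f"no writable leader after {max_retries} retries")
```

The three scenarios map onto scripts such as `FakeCluster([1, 2], [2])` (leader moves from 1 to 2), `FakeCluster([1], [None, 1])` (election ends with the same leader) and `FakeCluster([1], [None])` (election never completes, so the client should give up rather than loop forever).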
Unit tests:
mocked brokers, producers, and consumers that throw specific types of errors in specific scenarios
this is the route most other Kafka client integrations have taken
ideal: a generic mock broker, reachable over TCP, that any client could communicate with
Should have more than just the simple cases:
error on first connect/request
error later down the line (e.g. 10th request, or 10th request after 5th write)
error on specific topic or partition, etc.
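One way to express these cases is as declarative rules that the mock broker checks on every request. This is a minimal sketch with a hypothetical rule format (`FaultRule`, `FaultInjector`). It covers request-level errors only; a failed first *connect* would instead be the broker refusing or closing the socket.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class FaultRule:
    """One scripted failure for a mock broker (hypothetical rule format).

    A rule only considers requests matching its topic/partition filters
    (None = any) that arrive after `after_writes` produce requests were seen.
    It fires on the `nth_request` such request, or on every one if None.
    """
    error_code: int
    nth_request: Optional[int] = None
    after_writes: int = 0
    topic: Optional[str] = None
    partition: Optional[int] = None

class FaultInjector:
    """Decides, per incoming request, which Kafka error code the mock returns."""

    def __init__(self, rules: List[FaultRule]):
        self.rules = rules
        self.writes = 0                 # produce requests seen so far
        self.counts = [0] * len(rules)  # matching requests seen per rule

    def on_request(self, api: str, topic: str, partition: int) -> int:
        """Return the error code to send (0 = none); the first firing rule wins,
        but every matching rule's counter still advances."""
        injected = 0
        for i, rule in enumerate(self.rules):
            if self.writes < rule.after_writes:
                continue
            if rule.topic is not None and rule.topic != topic:
                continue
            if rule.partition is not None and rule.partition != partition:
                continue
            self.counts[i] += 1
            fires = rule.nth_request is None or self.counts[i] == rule.nth_request
            if fires and injected == 0:
                injected = rule.error_code
        if api == "produce":
            self.writes += 1
        return injected
```

For example, `FaultRule(error_code=7, nth_request=1)` fails the first request, `FaultRule(error_code=6, nth_request=10, after_writes=5)` fails the 10th request after the 5th write, and `FaultRule(error_code=3, topic="test", partition=0)` fails everything sent to that partition.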
Something similar to the Yahoo Streaming Benchmark: a Kafka client test framework containing a generic mock broker that behaves in a defined way, plus apps built on different clients in different languages that each try to reproduce the same (spec) behaviour against that mock broker.
e.g. topic:test, partition:0 is set up to return a specific response, which every client receives. Then test that all clients behave the same way when they receive it.
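A cross-client comparison needs a language-neutral description of each scenario that both the mock broker and the harness can read. The sketch below assumes a JSON scenario document (its fields are hypothetical) and assumes each test app reports the actions it took as a list of strings. "Refresh metadata, then retry" as the expected reaction to error 6 is likewise an assumption.

```python
import json
from typing import Dict, List

# Hypothetical shared scenario document: the mock broker reads "topic",
# "partition" and "response" to know what to answer; the harness reads
# "expected_behaviour" to know what every client should do about it.
SCENARIO = json.loads("""
{
  "name": "not-leader-on-test-0",
  "topic": "test",
  "partition": 0,
  "response": {"error_code": 6},
  "expected_behaviour": ["refresh_metadata", "retry"]
}
""")

def find_deviations(scenario: dict, observed: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the clients whose reported actions differ from the expectation.

    `observed` maps a test-app name to the actions it reported after receiving
    scenario["response"] for scenario["topic"] / scenario["partition"].
    """
    expected = scenario["expected_behaviour"]
    return {name: actions for name, actions in observed.items() if actions != expected}

if __name__ == "__main__":
    observed = {
        "java-app": ["refresh_metadata", "retry"],
        "go-app": ["refresh_metadata", "retry"],
        "python-app": ["retry"],
    }
    print(find_deviations(SCENARIO, observed))
```

Because the scenario is plain data, the same file can drive the mock broker and every client's test app, whatever language they are written in.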
Test Broker
Not necessarily connected to an actual Kafka cluster
clients connect to it as they would to a real cluster
needs to implement the full Kafka protocol
how many versions of this protocol need to be supported?
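Much of "the full protocol" shares one framing: every request is an INT32 size followed by a header (api_key, api_version, correlation_id, client_id), and every response echoes the correlation_id. On the version question, clients that support the ApiVersions request (API key 18), which most current clients do, let a test broker advertise only the version ranges it actually implements. A minimal sketch using only `struct`; it handles the non-flexible request header v1 and the ApiVersions v0 response body only, and the function names are hypothetical.

```python
import struct
from typing import Dict, Optional, Tuple

API_VERSIONS = 18  # ApiVersions API key in the Kafka protocol

def parse_request(frame: bytes) -> Tuple[int, int, int, Optional[str], bytes]:
    """Split one size-prefixed request into its header fields and body.

    Handles the non-flexible request header (v1): api_key INT16,
    api_version INT16, correlation_id INT32, client_id NULLABLE_STRING.
    Flexible versions append tagged fields, which this sketch leaves in the body.
    """
    (size,) = struct.unpack_from(">i", frame, 0)
    payload = frame[4:4 + size]
    if len(payload) != size:
        raise ValueError("incomplete frame")
    api_key, api_version, correlation_id, client_len = struct.unpack_from(">hhih", payload, 0)
    offset = 10  # bytes consumed by the four fixed-width fields above
    client_id = None
    if client_len >= 0:  # a length of -1 encodes a null client_id
        client_id = payload[offset:offset + client_len].decode("utf-8")
        offset += client_len
    return api_key, api_version, correlation_id, client_id, payload[offset:]

def frame_response(correlation_id: int, body: bytes) -> bytes:
    """Prefix a body with response header v0 (correlation_id) and the size."""
    payload = struct.pack(">i", correlation_id) + body
    return struct.pack(">i", len(payload)) + payload

def api_versions_v0_body(supported: Dict[int, Tuple[int, int]]) -> bytes:
    """Encode an ApiVersions v0 response body advertising only `supported`.

    `supported` maps api_key -> (min_version, max_version).
    """
    body = struct.pack(">hi", 0, len(supported))  # error_code = 0, array length
    for api_key, (min_v, max_v) in sorted(supported.items()):
        body += struct.pack(">hhh", api_key, min_v, max_v)
    return body
```

For instance, `frame_response(corr_id, api_versions_v0_body({3: (0, 1), 18: (0, 0)}))` would advertise Metadata v0 to v1 and ApiVersions v0, which bounds how many versions the test broker has to implement.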
some sort of flexible way to describe where and how errors should occur
could be hard-coded in the broker
client connects to an initial set of seed brokers
client sends a metadata request (data about the entire cluster: broker IP addresses, topics, partitions, the leader for each partition, etc.)
client connects to all the real brokers
Test Broker can bind to multiple addresses to act as all of the brokers
This simplifies a lot of the work
Helps control the timing of everything
Can choose how many addresses, topics, partitions, etc. are in the metadata response
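Binding one process to several addresses is straightforward with plain sockets. A sketch, assuming every simulated broker lives on 127.0.0.1 and lets the OS pick a free port: the bound (host, port) pairs become what the metadata response advertises. `bind_fake_brokers` and `build_metadata` are hypothetical, the returned dict is an in-memory description rather than the Metadata wire format, and the accept loops that would serve requests on each socket are omitted.

```python
import socket
from typing import Dict, List

def bind_fake_brokers(count: int, host: str = "127.0.0.1") -> List[socket.socket]:
    """Open `count` listening sockets, one per simulated broker.

    Port 0 asks the OS for a free port, so tests can run in parallel.
    """
    listeners = []
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, 0))
        sock.listen()
        listeners.append(sock)
    return listeners

def build_metadata(listeners: List[socket.socket], topics: Dict[str, int]) -> dict:
    """Describe the simulated cluster (hypothetical shape, not the wire format).

    `topics` maps topic name -> partition count; partition leaders are
    assigned round-robin across the simulated brokers (node ids 0..n-1).
    """
    brokers = {node_id: sock.getsockname() for node_id, sock in enumerate(listeners)}
    partitions = {
        topic: {p: p % len(brokers) for p in range(count)}
        for topic, count in topics.items()
    }
    return {"brokers": brokers, "partitions": partitions}

if __name__ == "__main__":
    listeners = bind_fake_brokers(3)
    try:
        print(build_metadata(listeners, {"test": 4}))
    finally:
        for sock in listeners:
            sock.close()
```

Because all "brokers" share one process, a test can decide exactly when each of them answers, stalls, or drops a connection.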
Protocol is relatively simple.
The broker is allowed to drop a connection if something goes wrong that it cannot handle.
The client is required to handle reconnecting to a new (or existing) valid broker.
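On the test broker's side, dropping a connection is just closing the socket; what tests then need to exercise is the client's recovery. A sketch of such a loop, assuming the client walks its known brokers round-robin starting after the one that dropped it and gives up after one full pass. `reconnect` and `tcp_connect` are hypothetical names; the connector is injectable so tests can simulate refused connections without a network.

```python
import socket
from typing import Any, Callable, List, Optional, Tuple

Address = Tuple[str, int]

def tcp_connect(addr: Address) -> socket.socket:
    """Default connector: plain TCP with a short timeout (raises OSError on failure)."""
    return socket.create_connection(addr, timeout=2.0)

def reconnect(brokers: List[Address], start: int = 0,
              connect: Callable[[Address], Any] = tcp_connect,
              max_attempts: Optional[int] = None) -> Tuple[int, Any]:
    """Walk the broker list round-robin from `start` until a connection succeeds.

    Returns (index of the broker used, connection). Raises ConnectionError once
    `max_attempts` (default: one pass over the list) connections have failed.
    A real client would also back off between attempts and refresh metadata.
    """
    attempts = len(brokers) if max_attempts is None else max_attempts
    last_error: Optional[OSError] = None
    for k in range(attempts):
        i = (start + k) % len(brokers)
        try:
            return i, connect(brokers[i])
        except OSError as exc:  # refused, unreachable, timed out, ...
            last_error = exc
    raise ConnectionError(f"no broker reachable after {attempts} attempts") from last_error
```

A test can pass a fake `connect` that raises `ConnectionRefusedError` for chosen addresses and then assert which broker the client ends up on, or that it fails cleanly when every broker is down.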
Proxy Broker
proxy a real Kafka broker
intercept certain requests using a filter and apply some transformation to them
transparently forward the rest
Definitely have to intercept metadata requests and rewrite their responses, which contain the real brokers' IP addresses (to prevent the clients from just connecting directly to the real brokers)
We don't really want to do this. It raises many issues around keeping state: we would show the client errors that the cluster is unaware of (because they never really occurred).
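Even though the proxy approach is not preferred, the one interception it cannot do without is small. A sketch, assuming Metadata responses have already been decoded into a dict whose `"brokers"` entry is a list of (node_id, host, port) tuples; that shape, and the names `rewrite_brokers` and `filter_response`, are hypothetical, and decoding/re-encoding is omitted.

```python
from typing import Dict, List, Tuple

METADATA = 3  # Metadata API key in the Kafka protocol

Address = Tuple[str, int]
Broker = Tuple[int, str, int]  # (node_id, host, port) as decoded from a Metadata response

def rewrite_brokers(brokers: List[Broker], address_map: Dict[Address, Address]) -> List[Broker]:
    """Swap every real broker address for the proxy listener standing in for it.

    An unmapped broker raises KeyError rather than leaking a real address.
    """
    return [(node_id, *address_map[(host, port)]) for node_id, host, port in brokers]

def filter_response(api_key: int, decoded: dict, address_map: Dict[Address, Address]) -> dict:
    """Interception filter: rewrite Metadata responses, pass everything else through."""
    if api_key == METADATA:
        return {**decoded, "brokers": rewrite_brokers(decoded["brokers"], address_map)}
    return decoded
```

Failing loudly on an unmapped broker matters here: silently forwarding a real address is exactly how clients would bypass the proxy.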