tests

Testing brainstorming ideas

Things to test:

Basic functionality of the client from a producer and consumer perspective:

able to connect to a Kafka broker
able to request (consume) data
able to produce data
does it handle errors from the Kafka broker appropriately during each of the above?

More advanced stuff:

a leader change occurs, a leader change fails, or an election ends up with the same leader: do we handle each scenario correctly?
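The three leader-change outcomes could be scripted roughly as below. This is a minimal sketch, not pony-kafka code: `ScriptedCluster` and `produce_with_retry` are hypothetical names, and the retry policy is an assumption. The only Kafka facts used are error code 6 (NOT_LEADER_OR_FOLLOWER, formerly NOT_LEADER_FOR_PARTITION) and the convention that a leader id of -1 in metadata means the partition has no leader.

```python
NONE = 0         # Kafka error code: no error
NOT_LEADER = 6   # Kafka error code NOT_LEADER_OR_FOLLOWER
NO_LEADER = -1   # leader id reported in metadata when a partition has no leader


class ScriptedCluster:
    """Hypothetical stand-in for a cluster whose partition leader follows a script."""

    def __init__(self, leader_script):
        # leader_script[0] is the initial leader; each elect() moves to the next entry.
        self._script = list(leader_script)
        self._step = 0

    @property
    def leader(self):
        return self._script[self._step]

    def elect(self):
        # Advance to the next scripted outcome (new leader, same leader, or no leader).
        if self._step < len(self._script) - 1:
            self._step += 1

    def metadata(self):
        return self.leader

    def produce(self, broker_id):
        # Only the current leader accepts writes.
        if self.leader != NO_LEADER and broker_id == self.leader:
            return NONE
        return NOT_LEADER


def produce_with_retry(cluster, cached_leader, max_attempts=3):
    """Assumed client behaviour: on NOT_LEADER, refresh metadata and retry."""
    for attempt in range(1, max_attempts + 1):
        if cluster.produce(cached_leader) == NONE:
            return cached_leader, attempt
        cached_leader = cluster.metadata()
    raise RuntimeError("no usable leader after %d attempts" % max_attempts)
```

A script of `[1, 2]` models a successful change, `[1, 1]` an election that keeps the same leader, and `[1, -1]` a failed election; a client test would assert on the broker finally used and on the number of attempts.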

Unit tests:

mocked-up brokers, producers, and consumers that throw specific types of errors in specific scenarios
- This is basically the route most other Kafka integrations have gone
ideal: a generic mock broker that listens on TCP, which any client could communicate with.
- Should have more than just the simple cases:
- error on first connect/request
- error later down the line (e.g. 10th request, or 10th request after 5th write)
- error on specific topic or partition, etc.
Something similar to the Yahoo Streaming Benchmark: a Kafka client test framework built around a generic mock broker that behaves in a scripted way, plus apps written with different clients in different languages that all try to reproduce the same (spec) behaviour against that mock broker.
- e.g. topic:test, partition:0 is set up to return a specific response, and every client receives that response. Then test that all clients behave the same way when they receive it.
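One way to describe the error cases listed above (error on the first request, on the Nth request, on the Nth request after some number of writes, or on a specific topic or partition) is a small rule table that the mock broker consults for every request. This is only a sketch: `FaultRule`, `FaultPlan`, and the `"fetch"`/`"produce"` API labels are hypothetical names. The error codes 3 and 7 are real Kafka protocol error codes.

```python
from dataclasses import dataclass
from typing import Optional

UNKNOWN_TOPIC_OR_PARTITION = 3  # Kafka protocol error code
REQUEST_TIMED_OUT = 7           # Kafka protocol error code


@dataclass
class FaultRule:
    error_code: int
    api: Optional[str] = None        # hypothetical label such as "fetch"; None matches any
    topic: Optional[str] = None      # None matches any topic
    partition: Optional[int] = None  # None matches any partition
    on_nth: int = 1                  # fire on the Nth matching request (1-based)
    after_writes: int = 0            # start counting only after this many produce requests
    seen: int = 0                    # internal counter of matching requests

    def matches(self, api, topic, partition):
        return (self.api in (None, api)
                and self.topic in (None, topic)
                and self.partition in (None, partition))


class FaultPlan:
    def __init__(self, rules):
        self.rules = list(rules)
        self.writes = 0  # produce requests seen so far

    def check(self, api, topic, partition):
        """Return the error code to inject for this request, or 0 for none."""
        injected = 0
        for rule in self.rules:
            if self.writes < rule.after_writes:
                continue  # rule is not armed yet
            if not rule.matches(api, topic, partition):
                continue
            rule.seen += 1
            if rule.seen == rule.on_nth and injected == 0:
                injected = rule.error_code  # each rule fires exactly once
        if api == "produce":
            self.writes += 1
        return injected
```

For example, `FaultRule(REQUEST_TIMED_OUT, api="fetch", on_nth=10, after_writes=5)` would express "error on the 10th fetch after the 5th write". Keeping the rules as data rather than hard-coding them in the broker means the same scenario file could be replayed against clients in every language.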

Test Broker

Not necessarily connected to an actual Kafka cluster
clients connect to it as they would to a real cluster
needs to implement the full Kafka protocol
- how many versions of this protocol need to be supported?
some sort of flexible way to describe where and how errors should occur
- could be hard coded in the broker
a real client first connects to an initial set of seed brokers
- then sends a metadata request (data about the entire cluster: IP addresses, topics, partitions, the leader for each partition, etc.)
- then connects to all the real brokers listed in the response
Test Broker can bind to multiple addresses to act as all of the brokers
- This simplifies a lot of the work
- Helps control timing for everything
- Can choose how many addresses, topics, partitions, etc. are in the metadata response
The protocol is relatively simple.
The broker is allowed to drop the connection if something goes wrong that it cannot handle.
- The client is required to handle reconnecting to a new (or existing) valid broker.
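A rough sketch of a single test broker process that binds several addresses and acts as every broker in the cluster. `MockBroker`, `read_frame`, and `parse_request_header` are hypothetical names, and the reply (a correlation id with an empty body) is a placeholder rather than a valid response for any real API. The parts taken from the Kafka protocol are the framing (a 4-byte big-endian size prefix) and the start of the request header (int16 api_key, int16 api_version, int32 correlation_id).

```python
import asyncio
import functools
import struct


def parse_request_header(payload):
    """Parse api_key, api_version and correlation_id from the start of a request."""
    return struct.unpack_from(">hhi", payload, 0)


async def read_frame(reader):
    """Read one size-prefixed frame and return its payload."""
    (size,) = struct.unpack(">i", await reader.readexactly(4))
    return await reader.readexactly(size)


class MockBroker:
    """Hypothetical test broker: one process listening as several broker nodes."""

    def __init__(self, node_count):
        self.node_count = node_count
        self.servers = []
        self.seen = []  # (node_id, api_key, api_version, correlation_id)

    async def start(self, host="127.0.0.1"):
        # Port 0 lets the OS pick a free port for each simulated node.
        for node_id in range(self.node_count):
            handler = functools.partial(self._handle, node_id)
            self.servers.append(await asyncio.start_server(handler, host, 0))
        return [s.sockets[0].getsockname()[1] for s in self.servers]

    async def _handle(self, node_id, reader, writer):
        try:
            while True:
                header = parse_request_header(await read_frame(reader))
                self.seen.append((node_id, *header))
                # Placeholder reply: echo the correlation id with an empty body.
                body = struct.pack(">i", header[2])
                writer.write(struct.pack(">i", len(body)) + body)
                await writer.drain()
        except (asyncio.IncompleteReadError, struct.error):
            pass  # peer closed, or the frame was too short to parse
        finally:
            writer.close()  # dropping the connection is allowed on anything unhandled

    async def stop(self):
        for server in self.servers:
            server.close()
            await server.wait_closed()
```

Because every simulated node lives in one event loop, the test controls exactly when each node answers, stalls, or disconnects, which is the timing control mentioned above. The ports returned by `start()` are what a metadata response would advertise to the client.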

Proxy Broker

proxy a real Kafka broker
intercept certain requests using a filter and apply some transformation to them
transparently forward the rest
Definitely have to intercept the metadata responses, which include the real brokers' IP addresses (to prevent the clients from just connecting directly to the real brokers)
We don't really want to do this. The proxy would have to keep state for errors that we show the client but that the cluster is unaware of (because they never really occurred), and that raises a lot of issues.
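For reference, the metadata interception could look roughly like this. It is a sketch under assumptions: `BrokerEntry` and `rewrite_brokers` are hypothetical names, the metadata response is assumed to be already decoded into a list of broker entries, and the proxy is assumed to listen on one local port per real broker.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BrokerEntry:
    """Hypothetical decoded broker entry from a metadata response."""
    node_id: int
    host: str
    port: int


def rewrite_brokers(brokers, proxy_host, base_port):
    """Point every advertised broker at the proxy.

    Returns the rewritten entries plus a routing table that maps each
    proxy port back to the real (host, port) it stands in for.
    """
    rewritten = []
    routes = {}
    for broker in sorted(brokers, key=lambda b: b.node_id):
        proxy_port = base_port + len(routes)  # one proxy port per real broker
        routes[proxy_port] = (broker.host, broker.port)
        rewritten.append(replace(broker, host=proxy_host, port=proxy_port))
    return rewritten, routes
```

The client only ever sees the proxy's addresses, and the routing table tells the proxy where to forward each connection. This covers only the address rewriting; it does nothing about the state-keeping problem described above.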

WallarooLabs / pony-kafka