aio-libs / aiokafka

asyncio client for kafka
http://aiokafka.readthedocs.io/
Apache License 2.0

A large and controversial proposal: replace protocol implementation with kio #1027

Open antonagestam opened 5 months ago

antonagestam commented 5 months ago

Hi there, I'm hoping this finds you well 👋

This is not really a feature request, but rather a proposal to change the internals of aiokafka in a way that I believe will both reduce the maintenance burden and make it easier to build new features.

tl;dr: let's replace aiokafka.protocol.* with kio.

Background

kio started out as a personal project, but is now an official open source project of Aiven, maintained by me and my team. We need to make custom protocol calls to Kafka to implement various orchestration and metering features. We used to implement those much like aiokafka and kafka-python currently do, with hand-crafted models of every API message version we were using. kio instead provides rich modeling of the full Kafka protocol, generated from the same source that Kafka uses internally (the message definitions). It can parse and serialize all of those models, and its test suite checks that random permutations of every version of every model serialize identically to how upstream Kafka serializes them.
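For readers who haven't worked on aiokafka.protocol, a "hand-crafted model" looks roughly like the sketch below. It is a minimal illustration written for this thread, not code from aiokafka or kio: the class `RequestHeaderV1` and the helper `random_header` are hypothetical names. It models Kafka's RequestHeader v1 (INT16 api key, INT16 api version, INT32 correlation id, nullable client id string) and checks random instances with a round trip. kio's real suite compares bytes against upstream Kafka, which is a stronger property than the round trip checked here.

```python
import random
import struct
from dataclasses import dataclass
from typing import Optional

# Big-endian layout: INT16 api key, INT16 api version, INT32 correlation id,
# then the INT16 length prefix of the nullable client id string.
_FIXED = ">hhih"


@dataclass(frozen=True)
class RequestHeaderV1:
    """Hypothetical hand-written model of Kafka's RequestHeader v1."""

    request_api_key: int  # INT16
    request_api_version: int  # INT16
    correlation_id: int  # INT32
    client_id: Optional[str]  # NULLABLE_STRING

    def serialize(self) -> bytes:
        payload = b"" if self.client_id is None else self.client_id.encode("utf-8")
        # A null NULLABLE_STRING is written as length -1 with no payload.
        length = -1 if self.client_id is None else len(payload)
        fixed = struct.pack(
            _FIXED, self.request_api_key, self.request_api_version, self.correlation_id, length
        )
        return fixed + payload

    @classmethod
    def parse(cls, data: bytes) -> "RequestHeaderV1":
        api_key, api_version, correlation_id, length = struct.unpack_from(_FIXED, data, 0)
        offset = struct.calcsize(_FIXED)
        client_id = None if length == -1 else data[offset:offset + length].decode("utf-8")
        return cls(api_key, api_version, correlation_id, client_id)


def random_header(rng: random.Random) -> RequestHeaderV1:
    """Hypothetical generator of random instances for a round-trip check."""
    client_id = None
    if rng.random() >= 0.2:
        # Include a non-ASCII character so byte length differs from char length.
        client_id = "".join(rng.choice("abcxyz-é") for _ in range(rng.randint(0, 12)))
    return RequestHeaderV1(
        request_api_key=rng.randint(-(2**15), 2**15 - 1),
        request_api_version=rng.randint(-(2**15), 2**15 - 1),
        correlation_id=rng.randint(-(2**31), 2**31 - 1),
        client_id=client_id,
    )


rng = random.Random(0)
for _ in range(1_000):
    header = random_header(rng)
    # Round trip: parsing the serialized bytes must give back an equal model.
    assert RequestHeaderV1.parse(header.serialize()) == header
```

Writing and checking one of these per API message version by hand is the maintenance burden the proposal is aiming to remove.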

This test strategy enforces that every feature of the protocol is implemented, or the kio test suite will not pass. Sometimes this is quite challenging: for instance, KIP-893 introduced nullable struct fields, but this is not documented anywhere. It was not obvious that ConsumerGroupHeartbeatResponse had started using this undocumented feature, but the kio test suite kept failing until it was properly implemented. That is just one of several similar protocol nuances that are hard to research other than by digging into the upstream Kafka source code.
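To illustrate the kind of nuance meant here: as far as we can tell from Kafka's generated serialization code, a nullable struct field is prefixed with a one-byte marker, -1 for null and 1 for present, followed by the struct itself. The sketch below assumes that encoding. `Assignment`, `write_nullable_struct` and `read_nullable_struct` are hypothetical names, and the nested struct is reduced to a single INT32 field, omitting other details (such as tagged fields) that a real flexible-version struct carries.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Assignment:
    """Hypothetical stand-in for a nested struct, reduced to one INT32 field."""

    epoch: int

    def serialize(self) -> bytes:
        return struct.pack(">i", self.epoch)

    @classmethod
    def parse(cls, data: bytes, offset: int) -> Tuple["Assignment", int]:
        (epoch,) = struct.unpack_from(">i", data, offset)
        return cls(epoch), offset + 4


def write_nullable_struct(value: Optional[Assignment]) -> bytes:
    # Assumed marker scheme: one signed byte, -1 for null, 1 for present.
    if value is None:
        return struct.pack(">b", -1)
    return struct.pack(">b", 1) + value.serialize()


def read_nullable_struct(data: bytes, offset: int = 0) -> Tuple[Optional[Assignment], int]:
    """Return the decoded struct (or None) and the offset just past it."""
    (marker,) = struct.unpack_from(">b", data, offset)
    offset += 1
    if marker < 0:
        return None, offset
    return Assignment.parse(data, offset)
```

A model that skips the marker byte would misread every field after it, which is why a byte-for-byte comparison against upstream catches this kind of omission.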

The library has some nice bells and whistles, such as extracting custom value types from the schema and using them throughout the models. It also has some mechanisms that the protocol itself does not provide but that make the library more Pythonic, such as representing timestamps and intervals as datetime and timedelta rather than plain primitives.
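As a rough idea of what that mapping involves: the Kafka protocol carries timestamps as INT64 milliseconds since the Unix epoch, and many intervals (timeouts, throttle times) as integer milliseconds. The helpers below are a hypothetical sketch of converting those to and from `datetime` and `timedelta`. They are not kio's API, and they deliberately ignore sentinel values such as -1 that some fields use, which a real mapping would have to handle.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as an aware datetime; integer timedelta arithmetic avoids float rounding.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)


def timestamp_from_millis(value: int) -> datetime:
    """Convert an INT64 millisecond timestamp to an aware UTC datetime."""
    return EPOCH + value * ONE_MS


def timestamp_to_millis(value: datetime) -> int:
    """Convert an aware datetime back to whole milliseconds since the epoch."""
    return (value - EPOCH) // ONE_MS


def interval_from_millis(value: int) -> timedelta:
    """Convert an integer millisecond interval (e.g. a timeout) to a timedelta."""
    return value * ONE_MS


def interval_to_millis(value: timedelta) -> int:
    """Convert a timedelta back to milliseconds, rounding down any remainder."""
    return value // ONE_MS
```

Keeping the conversion at the model boundary means callers never handle raw millisecond integers, at the cost of having to decide what each field's sentinel values should map to.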

The idea I am proposing here rests on the fact that kio contains all this hard work of mapping out every feature of the protocol and implementing it in canonical, well-typed Python code, and it seems to me that the Kafka-in-Python community would benefit from extracting and reusing this building block. It also rests on the assumption that I'm not the only one who has found it hard to implement the Kafka protocol correctly, and that an "outsourced" implementation that is correct out of the box would make projects like this more approachable for newcomers who want to contribute.

My end vision is that future contributors will be able to pick features from upstream Kafka that aiokafka does not yet support and implement them without ever having to think about how their API entities are serialized over the network; that would already be a solved problem.

The actual proposal

So, the concrete proposal I am making here is to:

This will most likely need to happen gradually, to keep change-sets reasonably sized. So during a transitional period, the project would have some protocol entities implemented as-is and some imported from kio. Phasing out and replacing the existing hand-crafted models would happen over time, spread across many change-sets. If the idea is well received, let's work together on refining a plan for this.
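One possible shape for that transitional period, offered purely as an assumption and not as part of the proposal: keep a lookup keyed by (api_key, api_version) so call sites don't need to know whether a given message version is still hand-crafted or already comes from kio. `ProtocolRegistry` and its fields are hypothetical, and plain callables stand in for the real encoders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

# (api_key, api_version) identifies one version of one Kafka API message.
SchemaKey = Tuple[int, int]
# Hypothetical encoder signature: takes a message object, returns wire bytes.
Encoder = Callable[[object], bytes]


@dataclass
class ProtocolRegistry:
    """Hypothetical registry for a gradual migration from legacy models to kio."""

    legacy: Dict[SchemaKey, Encoder] = field(default_factory=dict)
    migrated: Dict[SchemaKey, Encoder] = field(default_factory=dict)

    def encoder_for(self, api_key: int, api_version: int) -> Encoder:
        key = (api_key, api_version)
        # Prefer the migrated (kio-backed) implementation once one is registered.
        if key in self.migrated:
            return self.migrated[key]
        if key in self.legacy:
            return self.legacy[key]
        raise KeyError(f"no encoder for api_key={api_key}, api_version={api_version}")

    def remaining_legacy(self) -> Set[SchemaKey]:
        """Schema keys that still only have a hand-crafted implementation."""
        return set(self.legacy) - set(self.migrated)
```

Simply swapping imports module by module would also work; a registry like this only earns its keep if call sites need to choose an implementation at runtime, or if tracking what remains to be migrated is useful.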

I don't expect this to be an easy decision, and I realize that it is potentially controversial, but I'm hoping to spark a discussion and am looking forward to your thoughts and ideas on this.

Cheers!

dimastbk commented 5 months ago

Hi!

Looks interesting and more reliable than the current implementation. What about Python 3.8+?

aiven-anton commented 5 months ago

@dimastbk 3.8 is EOL in October, so I probably wouldn't bother with it, but you are right that support would need to be added for the missing versions in between.

antonagestam commented 5 months ago

Oops, commented with work account, @aiven-anton is also me.