adevinta / zoe

The Kafka CLI for humans
https://adevinta.github.io/zoe

[Docs improvement] Piping data between clusters (including topic keys, Avro topics) #52

Open whatsupbros opened 2 years ago

whatsupbros commented 2 years ago

Thanks again for this great CLI tool, made for humans!

I really love the feature of piping data between topics in different clusters with a command like this:

zoe -c remote topics consume input --continuously | zoe -c local topics produce -t output --from-stdin --streaming

And it works perfectly fine for this simple case.

But would it be possible to elaborate in the examples/documentation on some more complex, but still common, use cases?

Piping Avro data between clusters

In this use case it is important to understand how exactly to migrate topic schemas together with the data, when the schemas do not yet exist in the target cluster. Ideally, this should be as transparent for the user as possible (zoe would publish the record schemas from the source cluster to the target cluster's Schema Registry automatically). But even with the manual approach, it is not quite clear how to accomplish this task when there are multiple schema versions for the source topic, and when the topic contains records written with more than one of them. Currently, it doesn't seem to be possible at all, because zoe can only request the latest topic schema (see issue #50), and piping the schemas themselves doesn't work for now either (see issue #49). So the only option left is to publish all topic schemas upfront in the target cluster with curl or Postman, and only then start piping the data. However, I am not sure this is the proper way to do it.
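For reference, the manual workaround I have in mind is something like the following sketch against the Schema Registry REST API (the registry URLs and the subject name are placeholders, and this assumes a Confluent-compatible registry on both sides):

# Hypothetical registry URLs and subject name - adjust to your setup
SRC=http://source-registry:8081
DST=http://target-registry:8081
SUBJECT=input-value

# Copy every schema version of the subject, oldest first,
# so that the version order is preserved in the target registry
for v in $(curl -s "$SRC/subjects/$SUBJECT/versions" | jq '.[]'); do
  curl -s "$SRC/subjects/$SUBJECT/versions/$v" \
    | jq '{schema: .schema}' \
    | curl -s -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
        --data @- "$DST/subjects/$SUBJECT/versions"
done

It would be great if the documentation confirmed whether this is the intended approach, or whether zoe is supposed to handle it.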

Piping data with keys between clusters

The trivial case described in the examples doesn't consider topic keys. As I understand it, the only way to print the keys with zoe is to use the --expose-metadata option on the consumer. However, when you do so, you implicitly change the schema of the records. I tried to use a command like this to "fix" the issue and extract the key:

zoe --cluster remote topics consume my-topic --continuously --expose-metadata \
| zoe --cluster local topics produce --from-stdin --topic my-topic --subject my-topic-value --key-path ".__metadata__.key" --value-path "del(.__metadata__)" --streaming
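To clarify the intention: as far as I understand, consuming with --expose-metadata wraps each record roughly like this (the nested key is implied by the paths above; the other metadata fields are just my assumption):

{"id": 1, "name": "whatever", "__metadata__": {"key": "record-key", "partition": 0, "offset": 42, "timestamp": 1650000000000}}

so --key-path picks the original key back out, and --value-path strips the wrapper again before producing.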

But unfortunately, it didn't work for me (see issue #51). So a recommendation on piping data between topics with zoe, including the keys, would be really beneficial.
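In the meantime, the only workaround I can think of is to bypass zoe on the producing side and feed kafka-console-producer instead. A rough sketch, for plain JSON topics only (the console producer won't register any Avro schemas, and the broker address is a placeholder):

# Emit "key<TAB>value" lines and let the console producer split them
zoe --cluster remote topics consume my-topic --continuously --expose-metadata \
| jq -cr '[.__metadata__.key, (del(.__metadata__) | tostring)] | join("\t")' \
| kafka-console-producer --bootstrap-server localhost:9092 --topic my-topic \
    --property parse.key=true --property key.separator=$'\t'

But this obviously defeats the purpose of having a single tool for the whole pipeline.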

Thank you!