cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com
Other

cdc: Confusing kafka compression option #113495

Open miretskiy opened 8 months ago

miretskiy commented 8 months ago

The Kafka sink supports a "Compression" option in kafka_sink_config, but rejects the top-level "compression" option. This is confusing.

At the very least, provide a helpful hint when the top-level compression option is specified for a Kafka sink. Better yet, make compression work both through kafka_sink_config='{"Compression": ...}' and through the top-level compression option.

Jira issue: CRDB-33022

Epic CRDB-39570

vkstack commented 7 months ago

How can I contribute to this?

wenyihu6 commented 5 months ago

Hi @vkstack! Thanks for your interest in contributing.

Do you have the crdb repo set up already? If not, these wiki pages are a good place to start:
1. https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/73204103/Building+from+source+on+macOS
2. https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/181338446/Getting+and+building+CockroachDB+from+source
3. https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/2221703221/Developing+with+Bazel
Let me know if any of the links above are not public. Once the repo is set up, you should be able to run ./dev build without any errors.


I believe the issue above is referring to the limited configuration support we have in https://github.com/cockroachdb/cockroach/blob/66f33cbcbed38df0fa95c5f17176712526f7f79a/pkg/ccl/changefeedccl/sink_kafka.go#L185-L203. Note that we support the Compression option but not CompressionLevel.

Sarama supports the CompressionLevel option here https://github.com/IBM/sarama/blob/25c9c1a880e385781e1a39b49f8e7239e3d5e729/config.go#L188-L194. You should be able to add this option to kafka_sink_config in https://github.com/cockroachdb/cockroach/blob/66f33cbcbed38df0fa95c5f17176712526f7f79a/pkg/ccl/changefeedccl/sink_kafka.go#L185-L203. After adding it, this will get populated to the sarama config we use in https://github.com/cockroachdb/cockroach/blob/66f33cbcbed38df0fa95c5f17176712526f7f79a/pkg/ccl/changefeedccl/sink_kafka.go#L1150-L1152 (this part of the work is already completed and shouldn't require any additional work from you).

Let me know if anything above is unclear or if you encounter any issues along the way!

blathers-crl[bot] commented 5 months ago

cc @cockroachdb/cdc

cchenax commented 5 months ago

@wenyihu6 Hello, I'm the one who contacted you yesterday on Slack, thanks!

pvinoda commented 4 months ago

Hi @wenyihu6. I want to contribute to this issue as well. Can you assign it to me? I am getting started...

P.S. Thanks for the instructions

cchenax commented 4 months ago

@pvinoda Hello, I've already taken this issue; you may want to pick another one. Thanks!

Samarth2898 commented 4 months ago

Is the issue resolved?

Dhruv-Sachdev1313 commented 1 month ago

Hey @miretskiy, I happened to look into this issue. Can you please explain which compression configs you are referring to? Are we talking about the compression level, as @wenyihu6 suggested, or perhaps a different way or location for specifying the compression type that the current parser misses? Going through the documentation, the only way I can find is kafka_sink_config='{"Compression": "GZIP"}'.

rharding6373 commented 1 month ago

The top-level compression option refers to the compression option listed in the options table in the documentation. Today, Kafka sinks only support expressing compression through kafka_sink_config, unlike cloud storage sinks, which support compression through the top-level option.

So today we support creating a changefeed like:

CREATE CHANGEFEED FOR TABLE tbl INTO 'kafka://...' WITH kafka_sink_config='{"Compression":"GZIP"}';

We would also like to support:

CREATE CHANGEFEED FOR TABLE tbl INTO 'kafka://...' WITH compression='gzip';
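
One way to support both statements is to merge the two sources into a single codec before building the producer config. The Go sketch below is a hypothetical helper, not CockroachDB's implementation: resolveKafkaCompression, the kafkaCodecs set, and the rule that a disagreement between the two options is an error are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// kafkaCodecs is an assumed set of codec names accepted for Kafka sinks
// in this sketch; the real accepted list may differ.
var kafkaCodecs = map[string]bool{"NONE": true, "GZIP": true, "SNAPPY": true, "LZ4": true, "ZSTD": true}

// resolveKafkaCompression (hypothetical) merges the top-level compression
// option with kafka_sink_config's "Compression" field. topLevel is "" when
// the option is absent; sinkConfigJSON is "" when kafka_sink_config is absent.
// It returns "" when neither is set, meaning "use the sink's default".
func resolveKafkaCompression(topLevel, sinkConfigJSON string) (string, error) {
	var cfg struct{ Compression string }
	if sinkConfigJSON != "" {
		if err := json.Unmarshal([]byte(sinkConfigJSON), &cfg); err != nil {
			return "", fmt.Errorf("invalid kafka_sink_config: %w", err)
		}
	}
	fromOption := strings.ToUpper(topLevel) // accept compression='gzip'
	fromSinkConfig := strings.ToUpper(cfg.Compression)

	codec := fromOption
	if codec == "" {
		codec = fromSinkConfig
	} else if fromSinkConfig != "" && fromSinkConfig != codec {
		return "", fmt.Errorf("compression=%q conflicts with kafka_sink_config Compression %q",
			topLevel, cfg.Compression)
	}
	if codec != "" && !kafkaCodecs[codec] {
		return "", fmt.Errorf("unsupported compression %q for kafka sink", codec)
	}
	return codec, nil
}

func main() {
	c, err := resolveKafkaCompression("gzip", "")
	fmt.Println(c, err) // GZIP <nil>
	_, err = resolveKafkaCompression("gzip", `{"Compression":"ZSTD"}`)
	fmt.Println(err)
}
```

Rejecting conflicting values, rather than letting one source silently win, keeps a statement that specifies both from behaving in a surprising way; a precedence rule would also work if it were documented.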

Leveraging sarama's compression levels would also be interesting, but that is probably not what this ticket was originally created for, given that it has been tagged as an easy good first issue.