citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.43k stars 662 forks

Kafka integration and communication #44

Open ozgune opened 8 years ago

ozgune commented 8 years ago

Most Citus customers use a Kafka queue before they ingest data into the database. We need to investigate their use and have a better integration story between Kafka and Citus.

Kafka uses the Java runtime. This task may therefore relate to #4.

ozgune commented 7 years ago

We recently received questions about Kafka and the Bottled Water extension from three users, and I wanted to capture some of that context in this issue.

The primary motivation for Kafka integration could be one of the following:

  1. Put events into Kafka, make transformations on these events in Kafka, and then ingest them into Citus
  2. Use PostgreSQL as your primary data source, then read events from PG into Kafka (using Bottled Water), and then ingest them into Citus
  3. Use Citus as your primary data source, then read events from Citus into Kafka (using Bottled Water), and then ingest them into a different data store

We hear requests for Kafka integration across all of these use cases, and some customers have already set up their own Kafka <> Citus pipelines. The second and third use cases relate to change data capture (CDC).

For the third item, we could integrate with Bottled Water or another tool such as Debezium.

fi0 commented 5 years ago

Bottled Water is unmaintained. The Kafka Connect JDBC Connector cannot be used with Citus because of prepared statements. Does https://debezium.io/ work?

What are the other solutions for sinking data from Kafka into Citus?

jonels-msft commented 5 years ago

@fi0 not sure if this is the use case you have in mind, but to copy JSON messages from a Kafka topic into a Postgres table you can use the kafka-sink-pg-json tool. There's an example in our docs: http://docs.citusdata.com/en/stable/develop/integrations.html#ingesting-data-from-kafka
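For anyone who wants to see the general shape of such a sink, here is a minimal sketch in Python. It is not how kafka-sink-pg-json is implemented. It only shows the batching and formatting step: raw message values are grouped into batches and rendered as PostgreSQL COPY text input for a single jsonb column. The function names are hypothetical, and reading from Kafka and writing to the database are deliberately left out.

```python
import io
import json
from typing import Iterable, Iterator, List


def escape_copy_text(value: str) -> str:
    """Escape a value for PostgreSQL COPY text format (backslash first)."""
    return (
        value.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def batches(messages: Iterable[bytes], size: int) -> Iterator[List[bytes]]:
    """Group raw message values into lists of at most `size` items."""
    batch: List[bytes] = []
    for message in messages:
        batch.append(message)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def to_copy_buffer(batch: Iterable[bytes]) -> io.StringIO:
    """Render one batch as COPY text input: one JSON document per line."""
    buffer = io.StringIO()
    for raw in batch:
        try:
            document = json.loads(raw)
        except ValueError:
            # Malformed JSON or invalid UTF-8: skipped in this sketch.
            # A real sink would probably log or dead-letter the message.
            continue
        buffer.write(escape_copy_text(json.dumps(document)) + "\n")
    buffer.seek(0)
    return buffer
```

Each buffer could then be handed to a driver's COPY support, for example psycopg2's `cursor.copy_expert("COPY events (data) FROM STDIN", buffer)`, where `events` and `data` are placeholder table and column names for a distributed table with a jsonb column.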

fi0 commented 5 years ago

Thank you @jonels-msft. https://github.com/justonedb/kafka-sink-pg-json doesn't seem to be well maintained, though.

rajeshkt78 commented 1 year ago

The CDC feature for distributed tables (Preview) is available in the Citus 11.3 release. Please find the release notes on CDC here: https://www.citusdata.com/updates/v11-3/#cdc_support