confluentinc / bottledwater-pg

Change data capture from PostgreSQL into Kafka
http://blog.confluent.io/2015/04/23/bottled-water-real-time-integration-of-postgresql-and-kafka/
Apache License 2.0

Allow suppression of (or explicitly enable) messages for updates and deletes #111

Open · kinghuang opened this issue 8 years ago

kinghuang commented 8 years ago

For tables with a primary key, Bottled Water sends the table row as the message value for inserts and updates, and a null message value for deletes. For tables that don't have a primary key (or replica identity index), Bottled Water provides an --allow-unkeyed option: inserts and updates are sent to Kafka as messages without a key, and deletes are not sent to Kafka at all.
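
As a rough illustration of the current behaviour for keyed tables, a consumer can already drop deletes on its own side by skipping messages whose value is null. This is a minimal Python sketch, not part of Bottled Water: it assumes each Kafka message has already been decoded into a (key, value) pair with value set to None for a delete, and skip_deletes is a hypothetical helper name.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Assumed shape: each message already decoded into (key, value).
# value is None for a deleted row, matching the null message value
# Bottled Water sends for deletes on keyed tables.
Message = Tuple[Optional[bytes], Optional[dict]]


def skip_deletes(messages: Iterable[Message]) -> Iterator[Message]:
    """Yield only insert/update messages, dropping null-valued deletes."""
    for key, value in messages:
        if value is None:
            # Null value: the row with this key was deleted upstream.
            continue
        yield key, value


if __name__ == "__main__":
    stream = [
        (b"1", {"id": 1, "msg": "started"}),
        (b"2", {"id": 2, "msg": "running"}),
        (b"1", None),  # row 1 deleted
    ]
    for key, value in skip_deletes(stream):
        print(key, value)
```

Because inserts and updates both carry the full row, this filter cannot tell them apart; it only removes deletes.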

I'm working with tables that contain log rows from an application (the same database as #110). The tables have primary keys, but I'm only interested in inserts and updates, not deletes. It would be useful to have an option that stops Bottled Water from sending deletes for tables with a primary key, similar to its behaviour for tables without a primary key.

Summary

  1. Bottled Water is installed on a database whose tables contain log/event rows. The tables have primary keys defined.
  2. As rows are inserted, Bottled Water transmits them as messages to Kafka.
  3. Various consumers listen to the topics and process the messages for downstream use.
  4. Once in a while, old rows are deleted (i.e., "rolling over" old logs/events).
  5. The deletes should not be transmitted to Kafka. The Kafka topics for these tables should continuously accumulate new/updated rows from Bottled Water.
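
The scenario above can be sketched as a small simulation of the requested option. Everything here is hypothetical: ChangeEvent, Topic, publish and the suppress_deletes flag are illustrative names, not Bottled Water code or an existing command-line option, and the in-memory list stands in for a Kafka topic.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class ChangeEvent:
    kind: str            # "insert", "update" or "delete"
    key: int             # primary key of the changed row
    row: Optional[dict]  # None for deletes


@dataclass
class Topic:
    # Append-only list of (key, value) messages, standing in for a Kafka topic.
    messages: List[Tuple[int, Optional[dict]]] = field(default_factory=list)


def publish(events: Iterable[ChangeEvent], topic: Topic,
            suppress_deletes: bool = False) -> None:
    """Hypothetical producer: forward change events to the topic.

    With suppress_deletes=True, deletes on keyed tables are dropped,
    mirroring how --allow-unkeyed already skips deletes for unkeyed tables.
    """
    for event in events:
        if event.kind == "delete":
            if suppress_deletes:
                continue
            topic.messages.append((event.key, None))  # null value for delete
        else:
            topic.messages.append((event.key, event.row))


if __name__ == "__main__":
    topic = Topic()
    # Step 2: new log rows are inserted.
    publish([ChangeEvent("insert", 1, {"id": 1, "msg": "a"}),
             ChangeEvent("insert", 2, {"id": 2, "msg": "b"})],
            topic, suppress_deletes=True)
    # Step 4: an old row is rolled over.
    publish([ChangeEvent("delete", 1, None)], topic, suppress_deletes=True)
    print(len(topic.messages))  # 2
```

With suppress_deletes=True the topic still holds both inserted rows after the rollover in step 4; with the default, a null-valued message for key 1 would be appended as a third message.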

samstokes commented 8 years ago

This seems related to the proposal in #54 to have messages include a "type" field that explicitly distinguishes inserts, updates and deletes. For example, instead of {"value": "hello"}, you'd get {"type": "insert", "value": "hello"}. A Kafka consumer could then filter on that field to get only inserts.
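
Assuming the message shape proposed in #54, a consumer-side filter might look like the minimal Python sketch below. It assumes messages arrive as JSON text; filter_by_type is a hypothetical name, and the update and delete messages in the example are guesses at the format, since the comment only shows an insert.

```python
import json
from typing import Iterable, List, Set


def filter_by_type(raw_messages: Iterable[str], wanted: Set[str]) -> List[dict]:
    """Decode JSON messages and keep those whose "type" is in wanted."""
    kept = []
    for raw in raw_messages:
        msg = json.loads(raw)
        # Assumes the proposed #54 shape: {"type": ..., "value": ...}.
        if msg.get("type") in wanted:
            kept.append(msg)
    return kept


if __name__ == "__main__":
    messages = [
        '{"type": "insert", "value": "hello"}',
        '{"type": "update", "value": "hello again"}',  # assumed format
        '{"type": "delete", "value": null}',           # assumed format
    ]
    print(filter_by_type(messages, {"insert"}))
```

Passing {"insert", "update"} instead would keep both inserts and updates, which is the subset requested in this issue.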