cpursley / walex

Postgres change events (CDC) in Elixir
MIT License

Great Idea! #15

Closed: yashh closed this issue 9 months ago

yashh commented 11 months ago

Thanks for building this. I love the idea. I would love to relay all changes into S3 using Kinesis or something similar.

cpursley commented 11 months ago

Thanks, most of the credit goes to Supabase, cainophile and Postgrex.

You could do that right in Elixir. But having out-of-the-box connectors would be pretty cool. I have an open issue for adding a webhook config but haven't had time to work on it: Event forwarding

yashh commented 11 months ago

Yeah. I wrote a GenServer that collected all log information and flushed it to Kinesis as a batch write every minute. So there is a chance we could lose data if this process goes down. Is there a way to remember the log position and store it somewhere, so that we can sweep up the missed logs if the process goes down?
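
One common way to approach the question above is checkpointing: persist the position of the last event that was durably flushed, and advance it only after the batch write succeeds. The following is a minimal sketch of that idea in Python rather than Elixir. The `CheckpointedBatcher` class, the `sink` callable, the checkpoint file, and the integer `position` are all hypothetical names chosen for illustration; none of them are part of WalEx.

```python
import json
import os
import tempfile


class CheckpointedBatcher:
    """Buffers (position, event) pairs and records the last flushed position.

    Hypothetical illustration only; not a WalEx API.
    """

    def __init__(self, sink, checkpoint_path):
        self.sink = sink  # callable taking a list of events, e.g. a batch write
        self.checkpoint_path = checkpoint_path
        self.buffer = []

    def last_position(self):
        """Return the last durably flushed position, or None if none exists."""
        try:
            with open(self.checkpoint_path) as f:
                return json.load(f)["position"]
        except FileNotFoundError:
            return None

    def add(self, position, event):
        self.buffer.append((position, event))

    def flush(self):
        if not self.buffer:
            return
        events = [event for _, event in self.buffer]
        # If the sink raises, the checkpoint is not advanced and the
        # buffer is kept, so the same batch can be retried.
        self.sink(events)
        self._save(self.buffer[-1][0])
        self.buffer = []

    def _save(self, position):
        # Write to a temp file and rename it into place, so a crash
        # never leaves a half-written checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.checkpoint_path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"position": position}, f)
        os.replace(tmp, self.checkpoint_path)
```

After a restart, events newer than `last_position()` would have to be re-read from the source. This gives at-least-once delivery, so the destination may see duplicates and should tolerate them.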

cpursley commented 9 months ago

Sorry @yashh - I completely missed your last message.

Is the GenServer process that flushes to Kinesis something you are using in conjunction with WalEx, or something you were doing before / trying to replace?

yashh commented 9 months ago

I was using that GenServer to capture some important user interactions, but I was checking whether this project intends to relay the CDC data to some sort of storage.

cpursley commented 9 months ago

@yashh makes sense.

Would storing the CDC data in this format be sufficient?

{
  "type": "update",
  "record": {
    "id": 1234,
    "name": "Chase Pursley"
  },
  "old_record": {
    "id": 1234,
    "name": "Chase"
  },
  "changes": {
    "name": {
      "added": "Chase Pursley",
      "changed": "primitive_change",
      "removed": "Chase"
    }
  },
  "commit_timestamp": "2021-12-06T14:32:49Z"
}

I'm kicking some ideas around.
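
To make the `changes` field in the payload above concrete, here is a small Python sketch of how such a diff could be derived from `old_record` and `record`. The function name `primitive_changes` is hypothetical, and this is only an assumption about the shape of the output, not a description of how WalEx actually computes it. It handles only keys present in both records.

```python
def primitive_changes(old_record, record):
    """Return a per-field diff for keys whose values differ between two dicts.

    Keys that exist in only one of the two records are ignored in this sketch.
    """
    changes = {}
    for key in record:
        if key in old_record and old_record[key] != record[key]:
            changes[key] = {
                "added": record[key],
                "changed": "primitive_change",
                "removed": old_record[key],
            }
    return changes


old = {"id": 1234, "name": "Chase"}
new = {"id": 1234, "name": "Chase Pursley"}
print(primitive_changes(old, new))
# {'name': {'added': 'Chase Pursley', 'changed': 'primitive_change', 'removed': 'Chase'}}
```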

yashh commented 9 months ago

Yeah, that looks good. I am assuming a workflow where this JSON would be shipped via Kafka to S3 as Parquet. We could then use DuckDB to query the list of changes for a given record id.
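
As a rough illustration of the "list of changes for a given record id" query, here is a sketch in plain Python over newline-delimited JSON events. It is a stand-in for the DuckDB-over-Parquet query described above, not an example of DuckDB itself. The function name `changes_for_record` is hypothetical, and the sketch assumes every event carries the `record.id` and `commit_timestamp` fields from the proposed format.

```python
import json


def changes_for_record(lines, record_id):
    """Return the events for one record id, ordered by commit timestamp."""
    events = (json.loads(line) for line in lines if line.strip())
    matching = [
        e for e in events
        if (e.get("record") or {}).get("id") == record_id
    ]
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    return sorted(matching, key=lambda e: e["commit_timestamp"])


lines = [
    '{"type": "update", "record": {"id": 1234}, "commit_timestamp": "2021-12-06T14:32:49Z"}',
    '{"type": "insert", "record": {"id": 1234}, "commit_timestamp": "2021-12-01T09:00:00Z"}',
    '{"type": "insert", "record": {"id": 99}, "commit_timestamp": "2021-12-02T10:00:00Z"}',
]
for event in changes_for_record(lines, 1234):
    print(event["commit_timestamp"], event["type"])
```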

cpursley commented 9 months ago

Would you prefer something configuration-based, where you don't have to modify any Elixir and simply need to run the app? Meaning, you configure the database and tables you want to listen to, and then configure where you want to send the changes (e.g., Kafka, webhook, etc.)?

yashh commented 9 months ago

Yeah, that makes sense: a config where we can list the database and tables to stream. Kafka / webhook sounds good. Configuring it could be a little hard, though, so I would prefer that developers implement that part themselves, as long as they get a stream of CDC data from WalEx.

cpursley commented 9 months ago

@yashh just shipped the ability to forward events to webhooks or EventRelay.

EventRelay is a sister project (also in Elixir) that lets you skip the operational complexity of Kafka. It supports a few destinations, including S3. We're planning to support Parquet as well.