gabfssilva opened this issue 4 years ago
A Debezium connector sounds like a great idea. I took a look at the docs you mentioned. The "Advanced Record Consuming" section describes creating your own implementation to handle batches of source records. If calling the `RecordCommitter` controls when the next batch is sent, then a Debezium `Source` stage would only commit, and thereby request the next batch, when there is downstream demand.
https://debezium.io/documentation/reference/1.0/operations/embedded.html#advanced-consuming
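A minimal Java sketch of that idea, assuming the engine calls a batch handler on its own thread and does not poll for the next batch until the handler returns. `RecordCommitter` here is a local stand-in whose method names (`markProcessed`, `markBatchFinished`) mirror the embedded-engine docs linked above; `DemandGatedConsumer` and `request` are hypothetical names for the stream-stage side, not an Alpakka or Debezium API.

```java
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

// Hypothetical stand-in for Debezium's RecordCommitter, defined locally so the
// sketch compiles on its own. Method names mirror the embedded-engine docs.
interface RecordCommitter<R> {
    void markProcessed(R record) throws InterruptedException;
    void markBatchFinished() throws InterruptedException;
}

// Hypothetical demand-gated batch handler: each record is emitted and committed
// only after downstream has signalled demand for it, so handleBatch (and, in this
// model, the engine's next poll) is held back while there is no demand.
final class DemandGatedConsumer<R> {
    private final Semaphore demand = new Semaphore(0);
    private final Consumer<R> downstream;

    DemandGatedConsumer(Consumer<R> downstream) {
        this.downstream = downstream;
    }

    // Called by the stream stage when downstream requests n more elements.
    void request(int n) {
        demand.release(n);
    }

    // Called on the engine's thread with one batch of change records.
    void handleBatch(List<R> records, RecordCommitter<R> committer) throws InterruptedException {
        for (R record : records) {
            demand.acquire();              // block until downstream wants one more element
            downstream.accept(record);     // hand the record to the stream
            committer.markProcessed(record);
        }
        committer.markBatchFinished();     // once this returns, the engine may poll again
    }
}
```

In an Akka Streams `GraphStage`, `request` would presumably be driven from `onPull`, and `downstream` would pass the element into the stage through an async callback; that wiring is omitted here.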
I don't know enough about Debezium to know how failure recovery and/or restarts would work. How do you resume from some last known position? Is the state managed in whatever state store you configure with `database.history`?
I think it has to do with the `offset.storage` properties. There's another thing we have to look into to see whether it fits well with Akka Streams: I'm afraid it relies on blocking RPC calls. That's not a big problem, but it has to be very well documented.
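For the restart question, a sketch of the relevant settings as they appear in the Debezium 1.0 embedded-engine documentation: `offset.storage` holds the source position (for example a binlog or WAL offset) the engine resumes from, while `database.history` stores the schema-change history that connectors such as MySQL use to reconstruct table schemas after a restart. The helper class and the file names are hypothetical, and later Debezium releases renamed some of these keys, so check the docs for the version in use.

```java
import java.util.Properties;

// Hypothetical helper collecting the offset and schema-history settings
// for an embedded engine (key names as in the Debezium 1.0 embedded docs).
final class EmbeddedEngineOffsets {
    static Properties offsetAndHistoryProps(String storageDir) {
        Properties props = new Properties();
        // Where committed source offsets are persisted so a restart can resume from them.
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", storageDir + "/offsets.dat");
        // How often committed offsets are flushed to that store, in milliseconds.
        props.setProperty("offset.flush.interval.ms", "60000");
        // Schema-change history, used by connectors such as MySQL (not the position itself).
        props.setProperty("database.history", "io.debezium.relational.history.FileDatabaseHistory");
        props.setProperty("database.history.file.filename", storageDir + "/dbhistory.dat");
        return props;
    }
}
```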
It seems the behavior of `RecordCommitter` is just as you said, which should make backpressure easy to implement.
I'll run some tests myself to be sure, but it looks promising.
Sounds good. Looking forward to your analysis.
As I suspected, Debezium has no asynchronous backpressure mechanism. I opened a PR and described a possible solution there, though I'm not sure whether it's a viable one.
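Since the engine only offers a blocking callback, one common workaround (not necessarily what that PR proposes) is to run the engine on a dedicated thread and place a bounded buffer between it and the stream, so that a full buffer parks the engine thread. A minimal sketch; `BlockingBridge`, `push`, and `pull` are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical bridge between a blocking producer thread (the engine) and a
// pull-based consumer (the stream stage). The bounded queue is the only
// backpressure: when it is full, put() parks the engine thread.
final class BlockingBridge<R> {
    private final BlockingQueue<R> buffer;

    BlockingBridge(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    // Engine side: blocks while the buffer is full.
    void push(R record) throws InterruptedException {
        buffer.put(record);
    }

    // Stream side: blocks until a record is available.
    R pull() throws InterruptedException {
        return buffer.take();
    }

    // Number of records currently waiting in the buffer.
    int buffered() {
        return buffer.size();
    }
}
```

Because the engine thread genuinely blocks, in Akka Streams this work would belong on a dedicated dispatcher for blocking operations rather than the default one, which is the kind of caveat that needs documenting.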
Hi, I'm wondering what the status of this issue is, and whether there is currently a workaround, a recommended self-implementation, and so on.
Short description
Implement Debezium connector
Details
Debezium is a Change Data Capture (CDC) Kafka connector that integrates with several databases. Although Debezium is a Kafka connector, it also works as a standalone tool: https://debezium.io/documentation/reference/1.0/operations/embedded.html
The idea here is to use Debezium as a CDC source. The only open question is how backpressure would work.
I also noticed there's already a PR with a CDC connector for PostgreSQL (#891), but being able to use CDC with other databases would be nice to have.
At the time of this request, the latest final version supports the following databases:
Apache Camel already has a component for it (actually, there are a few: one for each database). We could look at how they implement it and see whether it's a viable approach for Alpakka.