DarioBalinzo / kafka-connect-elasticsearch-source

Kafka Connect Elasticsearch Source
Apache License 2.0

Allow dups in es queries #110

Closed joncourt closed 8 months ago

joncourt commented 9 months ago

A reasonably hefty rework here: it uses Elasticsearch's newer search_after functionality with a point in time (PIT), and it allows documents with duplicate sort keys to be fully loaded (e.g. where a bulk load has left more than batch.max.rows documents with the same update_dt or similar, causing the tail of that set to get lost).
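To illustrate the tail-loss case described above, here is a minimal in-memory sketch. It is not the connector's code: it assumes the older cursor is a strict "greater than the last update_dt seen" filter, and it uses a made-up integer tiebreaker to stand in for the per-document tiebreaker that search_after sorts on. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (update_dt, tiebreaker). The tiebreaker is a stand-in for a unique
# per-document sort value; it is an assumption for this sketch.
Doc = Tuple[str, int]


def page_by_timestamp_only(docs: List[Doc], batch_size: int) -> List[Doc]:
    """Page with a cursor that only remembers the last update_dt (assumed old behaviour)."""
    ordered = sorted(docs)
    seen: List[Doc] = []
    cursor: Optional[str] = None
    while True:
        page = [d for d in ordered if cursor is None or d[0] > cursor][:batch_size]
        if not page:
            return seen
        seen.extend(page)
        cursor = page[-1][0]  # duplicates of this timestamp beyond the page are skipped


def page_by_search_after(docs: List[Doc], batch_size: int) -> List[Doc]:
    """Page with a cursor over the full sort tuple, as search_after does."""
    ordered = sorted(docs)
    seen: List[Doc] = []
    cursor: Optional[Doc] = None
    while True:
        page = [d for d in ordered if cursor is None or d > cursor][:batch_size]
        if not page:
            return seen
        seen.extend(page)
        cursor = page[-1]  # resume strictly after the last (update_dt, tiebreaker)


if __name__ == "__main__":
    # Five docs share one update_dt, which is more than the batch size of 2.
    docs = [("2024-02-18T00:00:00Z", i) for i in range(5)]
    docs.append(("2024-02-19T00:00:00Z", 5))
    print(len(page_by_timestamp_only(docs, 2)), "of", len(docs), "docs loaded")
    print(len(page_by_search_after(docs, 2)), "of", len(docs), "docs loaded")
```

With a batch size of 2, the timestamp-only cursor returns two of the five duplicates and then jumps to the next timestamp, while the tuple cursor walks through all of them.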

Currently in testing at my place of work. It would be good to get a review from someone more expert in the connector parts of this though, please, along with anything else you'd like changed to make it contributable.

Ahh... one major gotcha: it needs ES 7.17.15+ for the search_after feature it uses.
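For readers unfamiliar with the feature, a search_after request against a PIT has roughly the shape built below. This is a sketch of the Elasticsearch request body only, not what the PR sends: the helper name, the default keep_alive value and the choice of `_shard_doc` as tiebreaker are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


def build_search_body(
    pit_id: str,
    sort_field: str,
    batch_size: int,
    search_after: Optional[List[Any]] = None,
    keep_alive: str = "1m",
) -> Dict[str, Any]:
    """Build a search body that pages over a PIT (hypothetical helper)."""
    body: Dict[str, Any] = {
        "size": batch_size,
        # The PIT pins the view of the index; no index name goes in the URL.
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        # Sort on the incrementing field, then on a tiebreaker so that
        # documents sharing the same sort_field value have a stable order.
        "sort": [{sort_field: "asc"}, {"_shard_doc": "asc"}],
    }
    if search_after is not None:
        # The "sort" values of the last hit from the previous page.
        body["search_after"] = search_after
    return body


if __name__ == "__main__":
    first_page = build_search_body("pit-id-from-open-call", "update_dt", 100)
    next_page = build_search_body(
        "pit-id-from-open-call", "update_dt", 100, search_after=[1708214400000, 42]
    )
    print(first_page)
    print(next_page)
```

The first page omits search_after; each later page passes the sort values of the previous page's final hit.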

DarioBalinzo commented 9 months ago

@joncourt thanks for the submission! It's hard for me now to find some time for this project. I hope to review it in the near future.

joncourt commented 9 months ago

No stress. All good from me. We're testing with it and I'll fix any bugs we find in the meantime.

joncourt commented 8 months ago

Closing this for now. It's working well for me, but I don't have a way to test all the schema parsing because I've set it up to turn that off completely (i.e. dump the whole doc into a string and parse it at the other end of the topic). I was having to fight the parsing quite hard and I don't actually need it.

happy to discuss.
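As a rough picture of the "parse it at the other end of the topic" approach mentioned above, a consumer could decode each record value as JSON. This sketch assumes the record value is the Elasticsearch document serialized as a JSON string; the function name and the sample fields are hypothetical, and the Kafka consumer itself is left out.

```python
import json
from typing import Any, Dict


def parse_record_value(value: str) -> Dict[str, Any]:
    """Decode a record value that holds a whole document as a JSON string."""
    doc = json.loads(value)
    if not isinstance(doc, dict):
        # A source document is expected to be a JSON object, not a list or scalar.
        raise ValueError("expected a JSON object")
    return doc


if __name__ == "__main__":
    # Sample value; the field names are made up for this example.
    raw = '{"id": 1, "update_dt": "2024-02-18T00:00:00Z"}'
    print(parse_record_value(raw)["update_dt"])
```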