databricks / iceberg-kafka-connect

Apache License 2.0
220 stars 49 forks

DO NOT MERGE: Refactor plan #230

Closed fqtab closed 7 months ago

fqtab commented 7 months ago

This DRAFT is just to give interested stakeholders an idea of where/how I'm planning to refactor things to enable:

The plan is to break this up into smaller PRs with wayyyy better code (not to mention better test coverage). There are also breaking changes as well as unnecessary changes in this PR (because this is just a POC).

The main downside of this refactor is that we now create a TaskWriter per topic partition per table in a given Task. Previously, we only created a TaskWriter per table in a given Task. This is necessary to enable producer-generation-based fencing. However, it likely means higher memory usage (think particularly of the table fan-out use case).
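To make the memory concern concrete, here is a hypothetical back-of-the-envelope sketch (not connector code; the numbers and helper names are illustrative assumptions) comparing open-writer counts under the old table-keyed model and the proposed partition-and-table-keyed model:

```java
// Hypothetical illustration: open TaskWriter counts per task under
// the old model (one writer per table) vs. the proposed model
// (one writer per topic partition per table).
public class WriterCountSketch {
    // Old model: one TaskWriter per table in a given task.
    static long oldModel(long tables) {
        return tables;
    }

    // Proposed model: one TaskWriter per (topic partition, table) pair.
    static long newModel(long partitionsPerTask, long tables) {
        return partitionsPerTask * tables;
    }

    public static void main(String[] args) {
        long partitionsPerTask = 50; // assumed: partitions assigned to one task
        long tables = 10;            // assumed: fanning out to 10 tables
        System.out.println(oldModel(tables));                    // prints 10
        System.out.println(newModel(partitionsPerTask, tables)); // prints 500
    }
}
```

Under these assumed numbers the writer count per task grows by the partition count assigned to the task, which is where the fan-out memory worry comes from.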

PRs planned:

  1. Write some important tests to avoid breaking changes
  2. Refactor existing code to conform to new interfaces (Writer, Committer)
  3. Add zombie fencing to Committer implementation
  4. Add a new Committer that allows for separate control cluster
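As a rough sketch of the split named in step 2, the two roles could look something like the following. All names and signatures here are hypothetical stand-ins, not the actual interfaces from the refactor PRs:

```java
import java.util.ArrayList;
import java.util.List;

public class RefactorSketch {
    // Writer: owns open data files for its assigned scope and
    // hands back completed data-file references.
    interface Writer {
        void write(String record);  // a String stands in for a SinkRecord
        List<String> complete();    // close files, return data-file refs
    }

    // Committer: commits completed files to the table; the natural
    // place to add zombie fencing (plan step 3).
    interface Committer {
        void commit(List<String> dataFiles);
    }

    // Trivial in-memory stand-ins to show the hand-off between the roles.
    static class BufferingWriter implements Writer {
        private final List<String> buffered = new ArrayList<>();
        public void write(String record) { buffered.add("file:" + record); }
        public List<String> complete() { return new ArrayList<>(buffered); }
    }

    static class CountingCommitter implements Committer {
        int committed = 0;
        public void commit(List<String> dataFiles) { committed += dataFiles.size(); }
    }

    public static void main(String[] args) {
        BufferingWriter w = new BufferingWriter();
        w.write("a");
        w.write("b");
        CountingCommitter c = new CountingCommitter();
        c.commit(w.complete());
        System.out.println(c.committed); // prints 2
    }
}
```

The point of the split is that fencing and commit coordination live entirely behind the Committer, so Writer implementations stay oblivious to it.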

tabmatfournier commented 7 months ago

We are already memory hungry. This would be very memory hungry in cases where someone is fanning out to multiple tables at once and has a lot of partitions. I wouldn't be surprised if this 2x-5x'd the memory usage in some cases.

bryanck commented 7 months ago

> We are already memory hungry. This would be very memory hungry in the cases where someone is fanning out to multiple tables at once and have a lot of partitions. Wouldn't be surprised if you 2x-5xd the memory usage in some cases.

+1, but more concerning to me is moving to a model where we must create a file per partition per commit, which will result in many more files than needed in many cases. This can negatively impact query planning performance, scan performance, storage size, and so on. Aggressive compaction can mitigate that to some extent, but it can also drive up compute and storage costs.

At my current company we often partition a topic for future scalability using a highly composite number of partitions, often several thousand. This lets us scale the number of processing tasks without introducing skew. I feel it is too limiting not to be able to coalesce partitions when writing.
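To see the file-count concern at this partition scale, here is a hypothetical estimate (not connector code; the partition count and commit interval are illustrative assumptions) of data files created per day when every partition must produce at least one file per commit:

```java
// Hypothetical estimate: data files created per day when every
// topic partition produces at least one data file per commit.
public class FileCountSketch {
    static long filesPerDay(long partitions, long tables, long commitsPerDay) {
        return partitions * tables * commitsPerDay;
    }

    public static void main(String[] args) {
        // Assumed numbers: a topic with 3000 partitions writing to
        // 1 table, with a 5-minute commit interval (288 commits/day).
        System.out.println(filesPerDay(3000, 1, 288)); // prints 864000
    }
}
```

Under these assumed numbers a single topic could generate hundreds of thousands of small files per day per table, which is why compaction pressure (and its cost) scales with the partition count rather than the data volume.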

fqtab commented 7 months ago

Cool, I did call this out explicitly as a potential issue in the PR description, but admittedly I did not fully consider its impact. I appreciate the clear feedback and agree it doesn't make sense to move forward with the topic-partition-level processing model. I've taken the rest of the useful changes from this PR and created a new PR.