MaterializeInc / materialize

The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.
https://materialize.com

storage/sources/kafka: Materialize creates new librdkafka context per Timely worker and source #7838

Open antiguru opened 3 years ago

antiguru commented 3 years ago

What version of Materialize are you using?

Current main: https://github.com/MaterializeInc/materialize/commit/c793609b1944e33362c10a7f6b3756bcc643e242

How did you install Materialize?

What was the issue?

Each Timely worker creates a separate librdkafka instance for each source, and each librdkafka instance creates several threads: a main thread and roughly two broker threads. The total number of threads therefore grows with the number of workers multiplied by the number of sources.

librdkafka claims to provide a thread-safe API, so in theory we should only need a single librdkafka instance per process for each distinct configuration.
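To make the sharing idea concrete, here is a minimal sketch of a per-process registry that hands out one client per distinct configuration. It is not Materialize's code or the design from any linked document: `SharedClient` and `ClientRegistry` are hypothetical names, and `SharedClient` is only a stand-in for a real librdkafka handle, so no threads are actually spawned here.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a librdkafka client handle. A real handle would
/// own the main thread and broker threads described above.
#[derive(Debug)]
pub struct SharedClient {
    pub config: BTreeMap<String, String>,
}

/// Hypothetical per-process registry: one client per distinct configuration,
/// shared by every worker that asks for it.
#[derive(Default)]
pub struct ClientRegistry {
    clients: Mutex<HashMap<BTreeMap<String, String>, Arc<SharedClient>>>,
}

impl ClientRegistry {
    /// Returns the existing client for `config`, creating it on first use.
    pub fn get_or_create(&self, config: &BTreeMap<String, String>) -> Arc<SharedClient> {
        let mut clients = self.clients.lock().unwrap();
        let client = clients.entry(config.clone()).or_insert_with(|| {
            Arc::new(SharedClient {
                config: config.clone(),
            })
        });
        Arc::clone(client)
    }

    /// Number of distinct clients created so far.
    pub fn instance_count(&self) -> usize {
        self.clients.lock().unwrap().len()
    }
}
```

With a registry like this, 32 workers requesting the same configuration would receive clones of one `Arc` rather than 32 separate instances, so the instance count would track the number of distinct configurations instead of workers times sources.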

Is the issue reproducible? If so, please provide reproduction instructions.

Using testdrive, run an example that uses Kafka sources and observe the threads created, for example using GDB.

$ cargo run --release --bin materialized -- --workers 32
$ pgrep materialized
$ rust-gdb target/release/materialized
> attach $PID_OF_MZ
> c
$ cargo run --bin testdrive -- test/testdrive/*.td

Observe thread creation/exit events.
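If attaching GDB is inconvenient, the thread count can also be sampled directly. The sketch below is an assumption on my part, not part of the original reproduction: it relies on the Linux `/proc/<pid>/task` directory, which has one entry per thread, and `count_threads` is a hypothetical helper name.

```rust
use std::fs;
use std::io;

/// Counts the threads of process `pid` by listing /proc/<pid>/task.
/// Linux only; returns an error where that directory does not exist.
pub fn count_threads(pid: u32) -> io::Result<usize> {
    let entries = fs::read_dir(format!("/proc/{}/task", pid))?;
    // Each readable directory entry corresponds to one thread of the process.
    Ok(entries.filter_map(Result::ok).count())
}
```

Calling `count_threads` with the PID reported by `pgrep materialized` before and after the testdrive run should show how many threads the Kafka sources added.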

benesch commented 3 years ago

Yeah, it’s real gross at the moment. There’s a lot of historical context around this, roughly starting with https://github.com/MaterializeInc/materialize/issues/375.

sploiselle commented 1 year ago

Is this still an issue?

benesch commented 1 year ago

Indeed it is! I wrote a design document for this a few years back that never got implemented: https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/20210413_source_sink_resource_sharing.md.