ConduitIO / conduit

Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
https://conduit.io
Apache License 2.0

Performance: pipeline rate always lower than the source rate #571

Open hariso opened 2 years ago

hariso commented 2 years ago

Bug description

It appears that the pipeline rate is always lower than the generator source rate. For example, when the source generates records at 15k msg/s, the pipeline rate (i.e. the number of acknowledged records flowing through the pipeline per second) is around 10k msg/s, which gives the impression that 10k msg/s is the most Conduit can handle.

However, in the 10k msg/s test, the pipeline rate is around 6700 msg/s.
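
To make the comparison concrete, here is a minimal sketch that computes the relative drift between the source rate and the pipeline rate for the two runs above. The helper name `relative_drift` is hypothetical and not part of Conduit or the benchmark scripts, and the inputs are the approximate figures quoted in this description, not exact measurements.

```python
def relative_drift(source_rate: float, pipeline_rate: float) -> float:
    """Fraction of the source rate that does not show up in the pipeline rate."""
    if source_rate <= 0:
        raise ValueError("source_rate must be positive")
    return (source_rate - pipeline_rate) / source_rate


# (source msg/s, pipeline msg/s) pairs quoted in the bug description (approximate)
observations = [(15_000, 10_000), (10_000, 6_700)]

for source, pipeline in observations:
    drift = relative_drift(source, pipeline)
    print(f"source={source} msg/s pipeline={pipeline} msg/s drift={drift:.1%}")
```

With these rounded numbers both runs come out at roughly one third, so the gap looks proportional to the source rate rather than like a fixed ceiling at 10k msg/s. This is only an observation from two data points.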

Steps to reproduce

  1. Run https://github.com/ConduitIO/streaming-benchmarks/blob/haris/ec2-helpers/workloads/small-messages-15k-msg-per-sec.sh
  2. Run https://github.com/ConduitIO/streaming-benchmarks/blob/haris/ec2-helpers/workloads/small-messages-10k-msg-per-sec.sh
  3. Compare results

Version

{ "version": "v0.3.0-nightly.20220811", "os": "linux", "arch": "amd64" }

neovintage commented 2 years ago

At lower rates, the throughput reported on the source side matches what is happening inside Conduit. Only when we get to the higher rates (10k or 15k msg/s) do the source connector and Conduit start to drift apart. That drift is usually 30%.

hariso commented 2 years ago

> At lower rates, the throughput reported on the source side matches what is happening inside Conduit. Only when we get to the higher rates (10k or 15k msg/s) do the source connector and Conduit start to drift apart. That drift is usually 30%.

As you suggested, I added two more workloads (1k msg/s and 5k msg/s). For the 1k msg/s workload the drift is around 30%, but for the 5k msg/s workload it is much larger.