fede1024 / kafka-benchmark

A tool to run benchmarks on Kafka clusters
MIT License

Enable compression on producer #4

Closed: jobetdelima closed this issue 6 years ago

jobetdelima commented 6 years ago

Hello,

I wanted to try enabling compression on the producer, so I added "compression.type: snappy" in the config file, like below:

msg_bursts_base_snappy:
    repeat_times: 5
    repeat_pause: 10
    topic: test_snappy_topic
    message_size: 4000
    message_count: 100000
    threads: 6
    producer: BaseProducer
    producer_config:
      bootstrap.servers: localhost:9092
      queue.buffering.max.messages: 1000000
      queue.buffering.max.ms: 100
      compression.type: snappy

Unfortunately, I ended up with the below error (the same panic is printed by several threads, interleaved):

kafka-benchmark --config producer_benchmark_config.yaml --scenario msg_bursts_base_snappy
Scenario: msg_bursts_base_snappy, repeat 5 times, 10s pause after each
thread '' panicked at 'Producer creation failed: KafkaError (Client config error: No such configuration property: "compression.type" compression.type snappy)', src/libcore/result.rs:906:4

Is there a way for me to enable compression?

Thanks!

fede1024 commented 6 years ago

The correct name is compression.codec (see the rdkafka docs or the librdkafka configuration parameters).

I'll add a link to those pages in the README.
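For reference, a sketch of the scenario from the first comment with only the property name changed. Everything except the last setting is copied from that config, and the scenario is assumed to stay where it already sits in the file.

```yaml
msg_bursts_base_snappy:
    repeat_times: 5
    repeat_pause: 10
    topic: test_snappy_topic
    message_size: 4000
    message_count: 100000
    threads: 6
    producer: BaseProducer
    producer_config:
      bootstrap.servers: localhost:9092
      queue.buffering.max.messages: 1000000
      queue.buffering.max.ms: 100
      # librdkafka property name; the Java client's compression.type was rejected in the error above
      compression.codec: snappy
```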

jobetdelima commented 6 years ago

Thanks! Let me try that. I just got it from https://kafka.apache.org/documentation/ under the "producer Config" section. I guess that's outdated?

fede1024 commented 6 years ago

That's the Java producer documentation. The Rust producer, which is based on librdkafka, has different configuration parameters.

jobetdelima commented 6 years ago

Thanks. BTW, is there a scenario set-up that would allow me to submit messages in batches? It doesn't look like enabling compression is affecting the metrics but I'm thinking perhaps it's because the messages are being sent one-by-one right away instead of in batches.

fede1024 commented 6 years ago

Messages are sent in batches by default. See: https://fede1024.github.io/rust-rdkafka/rdkafka/producer/index.html#buffering

What configuration are you using? For higher throughput I recommend something like:

producer_config:
  queue.buffering.max.messages: 1000000
  queue.buffering.max.ms: 100

jobetdelima commented 6 years ago

I see. I'm using the below config:

msg_bursts_base:
    repeat_times: 5
    repeat_pause: 10
    topic: test_topic
    message_size: 4000
    message_count: 100000
    threads: 6
    producer: BaseProducer
    producer_config:
      bootstrap.servers: localhost:9092
      queue.buffering.max.messages: 1000000
      queue.buffering.max.ms: 100

And saw almost the same metrics when I added compression.codec: snappy.

I just changed queue.buffering.max.ms from 100 to 0, and now I do see that enabling compression affects the metrics.

It's interesting that setting it to 100 seems to produce the best throughput.

Thanks for your help!

fede1024 commented 6 years ago

I suspect that compression doesn't have an effect because you are producing to localhost, so data transmission is not a bottleneck. In that case, compressing data might actually slow down data production. I suggest testing it on a server running on a separate machine.
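A minimal sketch of that comparison, assuming a broker reachable on another machine. The host name `kafka-remote.example.com` and both scenario names are placeholders, not values from this thread; the remaining settings are copied from the config posted above.

```yaml
# Baseline: no compression, broker on a separate machine (placeholder host).
msg_bursts_remote_plain:
    repeat_times: 5
    repeat_pause: 10
    topic: test_topic
    message_size: 4000
    message_count: 100000
    threads: 6
    producer: BaseProducer
    producer_config:
      bootstrap.servers: kafka-remote.example.com:9092
      queue.buffering.max.messages: 1000000
      queue.buffering.max.ms: 100

# Same scenario with snappy enabled; only the topic and the last setting differ.
msg_bursts_remote_snappy:
    repeat_times: 5
    repeat_pause: 10
    topic: test_snappy_topic
    message_size: 4000
    message_count: 100000
    threads: 6
    producer: BaseProducer
    producer_config:
      bootstrap.servers: kafka-remote.example.com:9092
      queue.buffering.max.messages: 1000000
      queue.buffering.max.ms: 100
      compression.codec: snappy
```

Running each one with `--scenario`, as in the first comment, and comparing the reported throughput should show whether compression helps once the network is part of the path.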