HarshadRanganathan opened 2 years ago
max.poll.records
max.partition.fetch.bytes
exactly once semantics - Kafka Streams, or idempotence
ordering - topic with one partition, or use the same key, or a custom partitioner
data loss - OOM, auto-commit
transactional.id
isolation.level
offsets.retention.minutes
group.id
group.instance.id
fetch.max.wait.ms
fetch.min.bytes
fetch.max.bytes (used by broker)
max.partition.fetch.bytes
auto.commit.interval.ms
enable.auto.commit
acks
transactional.id
isolation.level
heartbeat.interval.ms
session.timeout.ms
auto.offset.reset
max.poll.interval.ms
max.poll.records (used by consumer - buffer)
buffer.memory
log.cleaner.min.cleanable.ratio
enable.idempotence max.in.flight.requests.per.connection
compression.type - recommend lz4; strongly recommend not to use gzip since it's compute intensive
Compression is applied on full batches of data, so better batching results in a better compression ratio. Keep the compression codec between the producer and the destination topic the same where possible, so the broker doesn't need to recompress data.
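As a sketch, producer settings that pair batching with compression might look like this (values are illustrative, not prescriptive):

```properties
# Wait briefly so compression sees full batches, not single records
linger.ms=20
batch.size=131072
# lz4 - good ratio at low CPU cost; avoid gzip
compression.type=lz4
```

On the topic side, leaving the topic's compression.type at its default of `producer` keeps the producer's codec as-is, so the broker never recompresses the batches.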
linger.ms batch.size
Larger batches result in fewer requests - reduce load on producers as well as broker CPU overhead to process each request
tradeoff - tolerate higher latency
max.poll.records max.partition.fetch.bytes
max.poll.interval.ms (upper bound on the time a consumer can be idle before fetching more records) session.timeout.ms
When a batch takes too long to process, or when a GC pause runs long, either increase the upper bound on the amount of time a consumer can be idle before fetching more records with max.poll.interval.ms, or reduce the maximum size of the batches returned with max.poll.records.
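For a consumer whose batches occasionally take a long time to process, the two knobs above might be set like this (illustrative values):

```properties
# Allow up to 10 minutes between poll() calls before the consumer is evicted from the group
max.poll.interval.ms=600000
# Return smaller batches so each poll() cycle finishes well within that window
max.poll.records=100
```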
log.retention.ms -> retention by time
log.retention.bytes -> retention by size
min.insync.replicas
log.cleanup.policy
log.cleaner.delete.retention.ms
log.cleaner.min.compaction.lag.ms
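A compacted topic combining these settings could look like the following sketch, using the topic-level equivalents of the broker-level `log.cleaner.*` names above (values illustrative):

```properties
cleanup.policy=compact
# Keep tombstones (null-value deletes) around for 24h so slow consumers still see them
delete.retention.ms=86400000
# Don't compact records younger than 1h
min.compaction.lag.ms=3600000
# Compact a segment once 50% of it is dirty
min.cleanable.dirty.ratio=0.5
min.insync.replicas=2
```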
max.message.bytes (topic level)
replica.fetch.max.bytes (broker)
max.partition.fetch.bytes (consumer)
max.request.size (producer)
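To raise the maximum message size end to end, all four settings have to move together; a sketch for roughly 5 MB messages (illustrative value):

```properties
# topic level
max.message.bytes=5242880
# broker - must be >= max.message.bytes so followers can replicate large messages
replica.fetch.max.bytes=5242880
# consumer
max.partition.fetch.bytes=5242880
# producer
max.request.size=5242880
```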
partitioner.class (producer) partition.assignment.strategy (consumer)
Replication factor - 3 for durability
Kafka Streams - changelog + repartition topics for state stores - increase RF to 3
client.dns.lookup=use_all_dns_ips (try all resolved IPs before failing the connection)
Trade-offs of increasing the number of partitions:
[1] choose the partition count based on producer and consumer throughput
[2] ensure messages are distributed evenly across topic partitions
If there are a lot of partitions, tune buffer.memory - also taking into account message size, linger time, and partition count - to maintain pipelines across more partitions and better utilize bandwidth.
An increased number of partitions can also increase latency: it takes longer to replicate a lot of partitions, so it takes longer for messages to be committed, and a message can't be consumed until it's committed.
Throughput tuning:

Producer:
batch.size - 100000 to 200000
linger.ms - 10-100
compression.type - lz4
acks - 1
buffer.memory - increase if there are a lot of partitions
Consumer:
fetch.min.bytes - ~100000
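The throughput-oriented values above, written out as client configs (a sketch; tune against your own workload):

```properties
# Producer - favor large, compressed batches over immediate sends
batch.size=150000
linger.ms=50
compression.type=lz4
acks=1
# Consumer - let the broker wait for ~100 KB before answering a fetch
fetch.min.bytes=100000
```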
Latency tuning:

Producer:
linger.ms - 0
compression.type - none
acks - 1
Consumer:
fetch.min.bytes - 1
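The same latency-oriented values as client configs (illustrative sketch):

```properties
# Producer - send immediately: no batching delay, no compression CPU
linger.ms=0
compression.type=none
acks=1
# Consumer - broker answers a fetch as soon as a single byte is available
fetch.min.bytes=1
```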
Streams:
StreamsConfig.TOPOLOGY_OPTIMIZATION: StreamsConfig.OPTIMIZE
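In properties form this corresponds to the following, assuming the Java constants resolve as in recent Kafka releases (StreamsConfig.OPTIMIZE is the string "all"):

```properties
topology.optimization=all
```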
Durability tuning:

Producer:
RF - 3
acks - all
enable.idempotence - true (prevents duplication and out-of-order msgs)
max.in.flight.requests.per.connection - 1 (prevents out-of-order delivery when not using the idempotent producer)
Consumer:
enable.auto.commit - false
isolation.level - read_committed (EOS transactions)
Streams:
RF - 3
PROCESSING_GUARANTEE_CONFIG - EXACTLY_ONCE
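Pulled together, a durability-oriented setup might look like this sketch (property names per the Kafka client and Streams configs; values illustrative):

```properties
# Producer
acks=all
enable.idempotence=true
max.in.flight.requests.per.connection=1
# Consumer
enable.auto.commit=false
isolation.level=read_committed
# Streams - exactly-once processing, internal topics replicated 3x
processing.guarantee=exactly_once
replication.factor=3
```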
Availability tuning:

Consumer:
session.timeout.ms - increase (to account for potential network delays and avoid soft failures)
Streams:
NUM_STANDBY_REPLICAS_CONFIG - 1 or more
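As a sketch of these availability settings (the session timeout value is an illustrative assumption):

```properties
# Consumer - a longer session timeout tolerates network hiccups and GC pauses
session.timeout.ms=45000
# Streams - keep a warm standby copy of each state store on another instance
num.standby.replicas=1
```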
Patterns:
Use Kafka Connect to make remote databases locally available to Kafka. Then leverage Kafka Streams or ksql to perform fast and efficient joins of tables and streams.
Msg Size Increase - https://www.datadoghq.com/blog/kafka-at-datadog/
offsets.retention.minutes
https://dzone.com/articles/apache-kafka-consumer-group-offset-retention
Log compaction settings
https://developer.aiven.io/docs/products/kafka/howto/configure-log-cleaner
https://www.conduktor.io/kafka/kafka-topic-configuration-log-compaction
https://medium.com/@sunny_81705/kafka-log-retention-and-cleanup-policies-c8d9cb7e09f8
spring.kafka.consumer.heartbeat.interval.ms=20000
spring.kafka.consumer.session.timeout.ms=30000
spring.kafka.consumer.properties.max.poll.interval.ms=600000
spring.kafka.consumer.max-poll-records=35
max_poll_interval_ms => "2147483647"
max_poll_records => "100"
request_timeout_ms => "50000"
session_timeout_ms => "40000"
Increase heartbeat.interval.ms and session.timeout.ms, and/or increase max.poll.interval.ms and decrease max.poll.records.
https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/CooperativeStickyAssignor.html