HarshadRanganathan opened 2 years ago
max.poll.records
max.partition.fetch.bytes
exactly once semantics - Kafka Streams, or idempotence
ordering - topic with one partition, or use the same key, or a custom partitioner
data loss - OOM, auto-commit
transactional.id
isolation.level
offsets.retention.minutes
group.id
group.instance.id
fetch.max.wait.ms
fetch.min.bytes
fetch.max.bytes (used by broker)
max.partition.fetch.bytes
auto.commit.interval.ms
enable.auto.commit
acks
transactional.id
isolation.level
heartbeat.interval.ms
session.timeout.ms
auto.offset.reset
max.poll.interval.ms
max.poll.records (used by consumer - buffer)
buffer.memory
log.cleaner.min.cleanable.ratio
enable.idempotence max.in.flight.requests.per.connection
compression.type - recommend lz4; strongly recommend not to use gzip since it's compute intensive
Compression is applied on full batches of data, so better batching results in a better compression ratio. Keep the compression codec between the producer and the destination topic the same where possible, so the broker doesn't need to recompress data.
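As a sketch, producer settings that pair batching with compression might look like this (values are illustrative, not prescriptive):

```properties
# Wait briefly so compression sees full batches, not single records
linger.ms=20
batch.size=131072
# lz4 - good ratio at low CPU cost; avoid gzip
compression.type=lz4
```

On the topic side, leaving the topic's compression.type at its default of `producer` keeps the producer's codec as-is, so the broker never recompresses the batches.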
linger.ms batch.size
Larger batches result in fewer requests - reduce load on producers as well as broker CPU overhead to process each request
tradeoff - tolerate higher latency
max.poll.records max.partition.fetch.bytes
max.poll.interval.ms (upper bound on the time a consumer can be idle before fetching more records) session.timeout.ms
When a batch takes too long to process, or when a GC pause runs long, either increase the upper bound on the amount of time a consumer can be idle before fetching more records with max.poll.interval.ms, or reduce the maximum size of the batches returned with max.poll.records.
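For a consumer whose batches occasionally take a long time to process, the two knobs above might be set like this (illustrative values):

```properties
# Allow up to 10 minutes between poll() calls before the consumer is evicted from the group
max.poll.interval.ms=600000
# Return smaller batches so each poll() cycle finishes well within that window
max.poll.records=100
```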
log.retention.ms -> retention by time
log.retention.bytes -> retention by size
min.insync.replicas
log.cleanup.policy
log.cleaner.delete.retention.ms
log.cleaner.min.compaction.lag.ms
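A compacted topic combining these settings could look like the following sketch, using the topic-level equivalents of the broker-level `log.cleaner.*` names above (values illustrative):

```properties
cleanup.policy=compact
# Keep tombstones (null-value deletes) around for 24h so slow consumers still see them
delete.retention.ms=86400000
# Don't compact records younger than 1h
min.compaction.lag.ms=3600000
# Compact a segment once 50% of it is dirty
min.cleanable.dirty.ratio=0.5
min.insync.replicas=2
```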
max.message.bytes (topic level)
replica.fetch.max.bytes (broker)
max.partition.fetch.bytes (consumer)
max.request.size (producer)
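To raise the maximum message size end to end, all four settings have to move together; a sketch for roughly 5 MB messages (illustrative value):

```properties
# topic level
max.message.bytes=5242880
# broker - must be >= max.message.bytes so followers can replicate large messages
replica.fetch.max.bytes=5242880
# consumer
max.partition.fetch.bytes=5242880
# producer
max.request.size=5242880
```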
partitioner.class (producer) partition.assignment.strategy (consumer)
Replication factor - 3 for durability
Kafka Streams - changelog + repartition topics for state stores - increase RF to 3
client.dns.lookup=use_all_dns_ips (try all resolved IPs before failing the connection)
Trade-offs of increasing the number of partitions:
[1] choose the partition count based on producer and consumer throughput
[2] ensure messages are distributed evenly across topic partitions
If there are a lot of partitions, tune buffer.memory - also taking into account message size, linger time, and partition count - to maintain pipelines across more partitions and better utilize bandwidth.
An increased number of partitions can also increase latency: it takes longer to replicate a lot of partitions, so it takes longer for messages to be committed, and a message can't be consumed until it's committed.
Throughput tuning:

Producer:
batch.size - 100000 to 200000
linger.ms - 10-100
compression.type - lz4
acks - 1
buffer.memory - increase if there are a lot of partitions
Consumer:
fetch.min.bytes - ~100000
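The throughput-oriented values above, written out as client configs (a sketch; tune against your own workload):

```properties
# Producer - favor large, compressed batches over immediate sends
batch.size=150000
linger.ms=50
compression.type=lz4
acks=1
# Consumer - let the broker wait for ~100 KB before answering a fetch
fetch.min.bytes=100000
```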
Latency tuning:

Producer:
linger.ms - 0
compression.type - none
acks - 1
Consumer:
fetch.min.bytes - 1
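The same latency-oriented values as client configs (illustrative sketch):

```properties
# Producer - send immediately: no batching delay, no compression CPU
linger.ms=0
compression.type=none
acks=1
# Consumer - broker answers a fetch as soon as a single byte is available
fetch.min.bytes=1
```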
Streams:
StreamsConfig.TOPOLOGY_OPTIMIZATION: StreamsConfig.OPTIMIZE
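In properties form this corresponds to the following, assuming the Java constants resolve as in recent Kafka releases (StreamsConfig.OPTIMIZE is the string "all"):

```properties
topology.optimization=all
```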
Durability tuning:

Producer:
RF - 3
acks - all
enable.idempotence - true (prevents duplication and out-of-order msgs)
max.in.flight.requests.per.connection - 1 (prevents out-of-order delivery when not using the idempotent producer)
Consumer:
enable.auto.commit - false
isolation.level - read_committed (EOS transactions)
Streams:
RF - 3
PROCESSING_GUARANTEE_CONFIG - EXACTLY_ONCE
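Pulled together, a durability-oriented setup might look like this sketch (property names per the Kafka client and Streams configs; values illustrative):

```properties
# Producer
acks=all
enable.idempotence=true
max.in.flight.requests.per.connection=1
# Consumer
enable.auto.commit=false
isolation.level=read_committed
# Streams - exactly-once processing, internal topics replicated 3x
processing.guarantee=exactly_once
replication.factor=3
```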
Availability tuning:

Consumer:
session.timeout.ms - increase (to account for potential network delays and avoid soft failures)
Streams:
NUM_STANDBY_REPLICAS_CONFIG - 1 or more
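As a sketch of these availability settings (the session timeout value is an illustrative assumption):

```properties
# Consumer - a longer session timeout tolerates network hiccups and GC pauses
session.timeout.ms=45000
# Streams - keep a warm standby copy of each state store on another instance
num.standby.replicas=1
```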
Patterns:
Use Kafka Connect to make remote databases locally available to Kafka. Then leverage Kafka Streams or ksql to perform fast and efficient joins of tables and streams.
Msg Size Increase - https://www.datadoghq.com/blog/kafka-at-datadog/
offsets.retention.minutes
https://dzone.com/articles/apache-kafka-consumer-group-offset-retention
Log compaction settings
https://developer.aiven.io/docs/products/kafka/howto/configure-log-cleaner
https://www.conduktor.io/kafka/kafka-topic-configuration-log-compaction
https://medium.com/@sunny_81705/kafka-log-retention-and-cleanup-policies-c8d9cb7e09f8
spring.kafka.consumer.heartbeat.interval.ms=20000
spring.kafka.consumer.session.timeout.ms=30000
spring.kafka.consumer.properties.max.poll.interval.ms=600000
spring.kafka.consumer.max-poll-records=35
max_poll_interval_ms => "2147483647"
max_poll_records => "100"
request_timeout_ms => "50000"
session_timeout_ms => "40000"
Increase heartbeat.interval.ms and session.timeout.ms, and/or increase max.poll.interval.ms and decrease max.poll.records.
https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/CooperativeStickyAssignor.html