HarshadRanganathan / harshadranganathan.github.io

Personal Website
MIT License

Kafka #59

Open HarshadRanganathan opened 2 years ago

HarshadRanganathan commented 2 years ago

Kafka:

https://strimzi.io/blog/2021/12/17/kafka-segment-retention/

https://www.confluent.io/blog/5-common-pitfalls-when-using-apache-kafka/

Streams:

https://medium.com/lydtech-consulting/kafka-streams-introduction-d7e5421feb1b

https://blog.rockthejvm.com/kafka-streams/

Optimization:

https://developers.redhat.com/articles/2022/05/03/fine-tune-kafka-performance-kafka-optimization-theorem#optimization_goals_for_kafka

https://strimzi.io/blog/2021/06/08/broker-tuning/

https://strimzi.io/blog/2021/01/07/consumer-tuning/

https://strimzi.io/blog/2020/10/15/producer-tuning/

https://medium.com/paypal-tech/kafka-consumer-benchmarking-c726fbe4000

https://medium.com/bigpanda-engineering/sleeping-good-at-night-kafka-configurations-tweaks-6dd4d3aaf4e5

https://www.conduktor.io/kafka/kafka-advanced-concepts

Operations:

https://strimzi.io/blog/2020/06/15/cruise-control/

Logstash:

https://discuss.elastic.co/t/multiple-logstash-reading-from-a-single-kafka-topic/27727

HarshadRanganathan commented 2 years ago

https://www.instaclustr.com/blog/two-considerations-for-kafka-topic-replication/

HarshadRanganathan commented 2 years ago

https://docs.cloudera.com/runtime/7.2.10/kafka-managing/topics/kafka-manage-cli-perf-test.html

HarshadRanganathan commented 1 year ago

https://www.confluent.io/blog/kafka-streams-tables-part-1-event-streaming/

HarshadRanganathan commented 1 year ago

https://mikemybytes.com/2022/07/11/json-kafka-and-the-need-for-schema/

HarshadRanganathan commented 1 year ago

The producer batch.size config is not the number of records within the batch but its size in bytes.

The definition of batch size differs across connectors. E.g., for the Azure Functions sink connector, batch.size is the number of records.

batch.size in the MongoDB connector defines the cursor batch size (the number of events to return from MongoDB).
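The byte-based nature of producer batching can be sketched with some simple arithmetic; the record sizes below are illustrative assumptions, not measurements.

```python
# Sketch: producer batch.size is a byte limit, not a record count.
# Record sizes used here are hypothetical, for illustration only.

def records_per_batch(batch_size_bytes, avg_record_bytes):
    """How many records of a given average size fit in one producer batch."""
    return batch_size_bytes // avg_record_bytes

# With Kafka's default batch.size of 16384 bytes:
print(records_per_batch(16384, 100))    # 163 small records per batch
print(records_per_batch(16384, 20000))  # 0 -> record exceeds batch.size, sent unbatched
```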

HarshadRanganathan commented 1 year ago

Potential data loss for all consumer groups when increasing partitions while using auto.offset.reset=latest: records produced to the new partitions before a consumer's first fetch can be skipped.

HarshadRanganathan commented 1 year ago

Kafka brokers have a practical partition count limit, even if those partitions have no active traffic. We had assumed inactive partitions were free, but that's not the case: each partition has a CPU cost on the broker.

We spent a long time trying to run a cluster with a high partition count but low throughput on as few brokers as possible. It was super unstable until we finally scaled the cluster horizontally.

HarshadRanganathan commented 1 year ago

Use constants (for config, but also app name, group id, …), do not allow auto topic creation in prod, use TopologyTestDriver for testing, and don't increase the number of partitions.

HarshadRanganathan commented 1 year ago

Default batch.size and linger.ms values for producers are probably too low. Increasing them could save tens of thousands of dollars in Kafka infra cost.
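A minimal sketch of what such a tuned producer config might look like; the exact values here are assumptions to benchmark per workload, not recommendations.

```python
# Illustrative producer settings (librdkafka/confluent-kafka style string
# configs); the values are assumptions to tune and benchmark per workload.
producer_config = {
    "batch.size": 131072,        # 128 KiB, up from the 16 KiB default
    "linger.ms": 20,             # wait up to 20 ms to fill batches (default 0)
    "compression.type": "lz4",   # larger batches also compress better
}
print(producer_config["batch.size"] // 16384)  # 8x the default batch size
```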

HarshadRanganathan commented 1 year ago
  1. Have a topic naming and creation strategy
  2. Use well-defined group IDs that correlate to the apps
  3. Avoid dual writes (DB & Kafka)
  4. Adopt SRP (single responsibility principle)
  5. Use a Schema Registry
  6. Focus on monitoring & alerting
  7. Build in app robustness early: upgrades, repartitioning, killed brokers
HarshadRanganathan commented 1 year ago

Data skew caused by unbalanced message keys is really hard to fix. Think twice about the keys you choose (it's worth profiling them and gathering statistics).
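Profiling key distribution up front can be sketched as below. Kafka's default partitioner actually uses murmur2 over the serialized key; crc32 is used here as a dependency-free stand-in, and the key set is a made-up example of one "hot" key dominating traffic.

```python
# Profile key -> partition distribution before committing to a message key.
# NOTE: Kafka's default partitioner uses murmur2; crc32 below is a stand-in
# so this sketch stays dependency-free.
from collections import Counter
from zlib import crc32

def partition_for(key: bytes, num_partitions: int) -> int:
    return crc32(key) % num_partitions

def profile(keys, num_partitions):
    return Counter(partition_for(k, num_partitions) for k in keys)

# A skewed key set: one hot customer produces 90% of the records.
keys = [b"customer-1"] * 900 + [b"customer-%d" % i for i in range(2, 102)]
counts = profile(keys, 6)
print(counts.most_common(1))  # one partition holds at least 900 of 1000 records
```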

HarshadRanganathan commented 1 year ago
  1. When you are starting out, use the Producer and Consumer APIs. Learn those well and then use a framework (Kafka Streams, Spring Kafka, etc) at a later stage.
  2. The partition key matters a lot, so thinking about your key's distribution can avoid performance problems.
HarshadRanganathan commented 1 year ago

Partition rebalancing will bite you hard. Plan for it before it's too late.

HarshadRanganathan commented 1 year ago

https://forum.confluent.io/t/partitioning-gotchas-dont-use-avro-json-or-protobuf-for-keys-and-be-aware-of-client-hashing-differences/2718

HarshadRanganathan commented 1 year ago

https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/

HarshadRanganathan commented 1 year ago

https://www.confluent.io/blog/apache-kafka-ci-cd-with-github/

HarshadRanganathan commented 1 year ago

https://cwiki.apache.org/confluence/display/KAFKA/KIP-578%3A+Add+configuration+to+limit+number+of+partitions

HarshadRanganathan commented 1 year ago

https://www.confluent.io/blog/kafka-streams-vs-ksqldb-compared/

HarshadRanganathan commented 1 year ago

https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/

HarshadRanganathan commented 1 year ago

https://newrelic.com/blog/how-to-relic/distributed-tracing-with-kafka

HarshadRanganathan commented 1 year ago

https://jaceklaskowski.gitbooks.io/apache-kafka/content/

HarshadRanganathan commented 1 year ago

https://felipevolpone.medium.com/consuming-over-1-billion-kafka-messages-per-day-at-ifood-2465e1ffa795

HarshadRanganathan commented 1 year ago

https://sixfold.medium.com/bringing-kafka-based-architecture-to-the-next-level-using-simple-postgresql-tables-415f1ff6076d

HarshadRanganathan commented 1 year ago

https://rockset.com/blog/kafka-vs-kinesis-choosing-the-best-data-streaming-solution/

https://www.kai-waehner.de/blog/2022/08/18/why-doordash-migrated-from-cloud-native-amazon-sqs-and-kinesis-to-apache-kafka-and-flink/

HarshadRanganathan commented 1 year ago

https://medium.com/@hardiktaneja_99752/lessons-after-running-kafka-in-production-626974ffd700

HarshadRanganathan commented 1 year ago

Event Sourcing:

https://itnext.io/event-sourcing-why-kafka-is-not-suitable-as-an-event-store-796e5d9ab63c

https://medium.com/dna-technology/why-we-dropped-event-sourcing-with-kafka-streams-when-given-a-second-chance-b904a80bc4be

https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/

HarshadRanganathan commented 1 year ago

Kafka Consumer/Producer Failures:

  1. Deserialization errors
  2. Rebalance issues
  3. NPEs
  4. Dead letter topics
  5. Poison pills
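The poison-pill and dead-letter-topic items above can be sketched together: catch deserialization failures and park the bad record instead of crashing the consume loop. Topics are modelled as plain Python lists here; in a real app they would be Kafka topics.

```python
# Sketch of poison-pill handling: records that fail deserialization are
# routed to a dead letter topic (a plain list in this sketch) so one bad
# record cannot stall the whole partition.
import json

def consume(records, dead_letter_topic, handler):
    for raw in records:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            dead_letter_topic.append(raw)  # park the poison pill, keep consuming
            continue
        handler(event)

processed, dlq = [], []
consume([b'{"id": 1}', b'not-json', b'{"id": 2}'], dlq, processed.append)
print(processed)  # [{'id': 1}, {'id': 2}]
print(dlq)        # [b'not-json']
```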

HarshadRanganathan commented 1 year ago

https://irori.se/blog/dealing-with-large-messages-in-kafka/

HarshadRanganathan commented 1 year ago

https://aiven.io/blog/balance-data-across-kafka-partitions

HarshadRanganathan commented 1 year ago

https://medium.com/@Irori/dangerous-default-kafka-settings-part-1-2ee99ee7dfe5

HarshadRanganathan commented 1 year ago

Kafka issues:

[1] If events for the same key are published within the same few milliseconds, the order is not predictable.

[2] Race condition: what if two processes read from the same topic and update the same row in a relational table? Which one finishes first?

HarshadRanganathan commented 1 year ago

https://www.confluent.io/blog/debug-apache-kafka-pt-3/

HarshadRanganathan commented 1 year ago

https://dzone.com/articles/kafka-streams-tips-on-how-to-decrease-rebalancing

HarshadRanganathan commented 1 year ago

https://www.linkedin.com/pulse/avoiding-message-losses-duplication-lost-multiple-kafka-mahesh-abnave/

HarshadRanganathan commented 1 year ago

https://betterprogramming.pub/kafka-acks-explained-c0515b3b707e

HarshadRanganathan commented 1 year ago

https://medium.com/lydtech-consulting/kafka-consumer-auto-offset-reset-d3962bad2665

HarshadRanganathan commented 1 year ago

Kafka Streams Behavior:

Semantics

| KStream | KTable | GlobalKTable |
| --- | --- | --- |
| Insert/append-only | Update | Populates data from all partitions of the topic |
| Enabling log compaction will affect the semantics of the data | Enable log compaction to save space | |

Timestamps

Whenever a Kafka Streams application writes records to Kafka, it also assigns timestamps to these new records.

Ordering

If two producers write to the same topic partition, there is no guarantee on the event append order.

Processing Guarantees

At least once by default

Exactly Once

When publishing a record with exactly-once semantics enabled, a write is not considered successful until it is acknowledged, and a commit is made to “finalize” the write

With exactly-once, multiple records are grouped into a single transaction, and so either all or none of the records are committed.

In the “read_committed” isolation level, the consumer will only return records from transactions that were committed, and any records that were not part of a transaction.
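The read_committed filtering rule can be sketched as a simple predicate over records; the record shape (value plus an optional transaction id) is an assumption made for illustration, not the real consumer API.

```python
# Sketch of read_committed semantics: the consumer sees records from
# committed transactions plus non-transactional records, and never records
# from aborted or still-open transactions. Record shape is an assumption:
# (value, txn_id or None), where None means non-transactional.

def read_committed(records, committed_txns):
    return [value for value, txn in records
            if txn is None or txn in committed_txns]

records = [("a", "tx1"), ("b", None), ("c", "tx2"), ("d", "tx1")]
# tx1 committed, tx2 aborted/open -> "c" is filtered out.
print(read_committed(records, committed_txns={"tx1"}))  # ['a', 'b', 'd']
```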

HarshadRanganathan commented 1 year ago

Kafka Consumer/Producer Behavior:

Batching

Records are batched at each partition level

Records larger than batch size won't be batched

Batch size -

Compression

ACK/Min-ISR

In summary, when acks=all with a replication.factor=N and min.insync.replicas=M we can tolerate N-M brokers going down for topic availability purposes

acks=all and min.insync.replicas=2 is the most popular option for data durability and availability and allows you to withstand at most the loss of one Kafka broker

However, if two out of three replicas are not available, the brokers will no longer accept produce requests. Instead, producers that attempt to send data will receive 'NotEnoughReplicasException'.
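The N-M availability rule above is just arithmetic, sketched here for the common configurations:

```python
# With acks=all, replication.factor=N and min.insync.replicas=M, the topic
# stays writable while at most N-M brokers holding replicas are down.

def tolerable_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    return replication_factor - min_insync_replicas

# The popular durable setup: RF=3, min ISR=2 -> survives losing one broker.
print(tolerable_broker_failures(3, 2))  # 1
# RF=3, min ISR=1 trades durability for availability: two brokers can fail.
print(tolerable_broker_failures(3, 1))  # 2
```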

Replication

Auto Commit

Retries

Idempotence (Per Partition)

The producer send operation is now idempotent. In the event of an error that causes a producer retry, the same message—which is still sent by the producer multiple times—will only be written to the Kafka log on the broker once.

Each batch of messages sent to Kafka will contain a sequence number that the broker will use to dedupe any duplicate send.
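The sequence-number dedupe can be sketched as follows; the broker-side state is simplified to one last-seen sequence per producer id, which is an assumption made to keep the sketch short.

```python
# Sketch of broker-side idempotent-producer dedupe: the broker tracks the
# last sequence number seen per producer id, so a retried batch carrying an
# already-seen sequence is dropped instead of being appended twice.

def append(log, last_seq, producer_id, seq, batch):
    if last_seq.get(producer_id) is not None and seq <= last_seq[producer_id]:
        return False  # duplicate retry, deduped by sequence number
    log.extend(batch)
    last_seq[producer_id] = seq
    return True

log, last_seq = [], {}
append(log, last_seq, "p1", 0, ["a", "b"])
append(log, last_seq, "p1", 0, ["a", "b"])  # network retry of the same batch
append(log, last_seq, "p1", 1, ["c"])
print(log)  # ['a', 'b', 'c'] -> the retried batch was written only once
```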

Log Compaction

Auto Offset Reset

Whether to consume from the beginning of a topic partition or only new messages, when there is no initial offset for the consumer group, is controlled by the auto.offset.reset configuration.

auto.offset.reset=earliest

auto.offset.reset=latest

auto.offset.reset=none
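The three reset modes can be sketched by modelling the partition as a list of offsets; the error raised for "none" stands in for the consumer's NoOffsetForPartitionException.

```python
# Sketch of auto.offset.reset behaviour for a group with no committed offset.
# The partition is modelled as a list of length `partition_len`; LookupError
# stands in for the real consumer's NoOffsetForPartitionException.

def starting_offset(partition_len, committed_offset, auto_offset_reset):
    if committed_offset is not None:
        return committed_offset   # a committed offset always wins
    if auto_offset_reset == "earliest":
        return 0                  # replay the whole partition
    if auto_offset_reset == "latest":
        return partition_len      # only records produced from now on
    raise LookupError("no committed offset and auto.offset.reset=none")

print(starting_offset(100, None, "earliest"))  # 0
print(starting_offset(100, None, "latest"))    # 100
print(starting_offset(100, 42, "latest"))      # 42 (committed offset wins)
```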

Message Size

Retention

To specify retention by time, we have to set

Expiring messages by size is based on the total number of bytes of messages retained.
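Size-based retention can be sketched as repeatedly dropping the oldest closed log segment until the partition fits under the byte limit; the segment sizes below are made-up numbers.

```python
# Sketch of size-based retention: the oldest closed segments are deleted
# until the partition's total size fits under retention.bytes. The active
# (newest) segment is never deleted. Segment sizes are illustrative.

def apply_retention(segment_sizes, retention_bytes):
    segments = list(segment_sizes)  # oldest first
    while len(segments) > 1 and sum(segments) > retention_bytes:
        segments.pop(0)             # drop the oldest closed segment
    return segments

print(apply_retention([100, 100, 100, 40], retention_bytes=250))  # [100, 100, 40]
```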

HarshadRanganathan commented 1 year ago

https://www.confluent.io/blog/enabling-exactly-once-kafka-streams/

https://www.confluent.io/blog/transactions-apache-kafka/

HarshadRanganathan commented 1 year ago

https://medium.com/fintechexplained/12-best-practices-for-using-kafka-in-your-architecture-a9d215e222e3

https://medium.com/swlh/choosing-right-partition-count-replication-factor-apache-kafka-cf50b1bc75cf

HarshadRanganathan commented 1 year ago

https://medium.com/trendyol-tech/how-to-implement-retry-logic-with-spring-kafka-710b51501ce2

HarshadRanganathan commented 1 year ago

https://medium.com/@bb8s/kafka-producer-deep-dive-partition-assignment-846dcc366689

https://medium.com/@bb8s/kafka-producer-deep-dive-batching-messages-in-recordaccumulator-aeaf5905fee

HarshadRanganathan commented 1 year ago

https://medium.com/bakdata/processing-large-messages-with-kafka-streams-167a166ca38b

HarshadRanganathan commented 1 year ago

https://codeburst.io/combining-strict-order-with-massive-parallelism-using-kafka-83dc1ec9be03

HarshadRanganathan commented 1 year ago

https://itnext.io/kafka-gotchas-24b51cc8d44e

HarshadRanganathan commented 1 year ago

https://medium.com/coralogix-engineering/kafka-consumer-issues-fixing-jvm-garbage-collection-problems-a2655efe8328

HarshadRanganathan commented 1 year ago

https://medium.com/bakdata/optimizing-kafka-streams-apps-on-kubernetes-by-splitting-topologies-ac6b4c90516e

HarshadRanganathan commented 1 year ago

https://medium.com/event-driven-utopia/operational-use-case-patterns-for-apache-kafka-and-flink-part-1-5a0f8742df90

HarshadRanganathan commented 1 year ago

https://udayabharathi.medium.com/message-prioritization-in-kafka-105f712dcf8a