Open bingoohuang opened 3 years ago
We're about to create topics, so let's find a naming convention to use as a reference.
<data-center>.<domain>.<classification>.<description>.<version>
The components:
Data center: The data center in which the data resides. This is not required, but it is helpful once an organization reaches the size where it wants an active/active setup or replicates data between data centers. For example, if you have one cluster in AWS and one in Azure, your topics may be prefixed with aws and azure.
Domain: A well-understood, permanent name for the area of the system the data relates to. Domains should not include any product names, team names, or service names.
Examples of this vary wildly between industries. For example, in a transportation organization, some domains might be:
Classification: The classification of data within a Kafka topic tells an end user how it should be interpreted or used. It should not describe the data format or contents. I typically use the following classifications:
Description: Arguably the most important part of the name, this is the event name that describes the type of data the topic holds. It is the subject of the data, such as customers, invoices, users, payments, etc.
Version: Often the most forgotten section of a proper topic name. As data evolves within a topic, there may be breaking schema changes or a complete change in the format of the data. By versioning topics, you allow a transition period in which consumers can switch to the new data without impacting old consumers.
By convention, it is preferred to version all topics and to start them at 0.
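As a sketch, the convention can be encoded in a small helper that builds and validates names. The segment-character rule and the `fct` classification used below are illustrative assumptions, not anything mandated by Kafka:

```python
import re

# One topic-name segment: lowercase letters, digits, and hyphens (an assumed rule)
SEGMENT = re.compile(r"^[a-z0-9-]+$")

def topic_name(data_center, domain, classification, description, version=0):
    """Build <data-center>.<domain>.<classification>.<description>.<version>."""
    segments = [data_center, domain, classification, description]
    for segment in segments:
        if not SEGMENT.match(segment):
            raise ValueError(f"invalid segment: {segment!r}")
    return ".".join(segments + [str(version)])

# e.g. fact ("fct") data about shipments, hosted in the AWS cluster, version 0
print(topic_name("aws", "transportation", "fct", "shipments"))
# aws.transportation.fct.shipments.0
```

Starting the version at 0, as the convention suggests, means a breaking schema change only requires bumping the final segment.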
Some other diagrams:
The round-robin (polling) strategy is the default partitioning strategy of the Kafka Java client producer. Its load balancing is very good: it always distributes messages across all partitions as evenly as possible, which makes it the most reasonable default partitioning strategy. Its message distribution is shown in the following figure:
The random strategy picks one partition at random from the partition list; its message distribution is roughly as shown in the figure below:
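The two strategies can be sketched as tiny partition-choosing functions (a simplified model, not the Java client's actual partitioner classes):

```python
import itertools
import random

def round_robin_partitioner(num_partitions):
    """Round-robin: cycle through partitions 0..n-1 in order."""
    counter = itertools.cycle(range(num_partitions))
    return lambda: next(counter)

def random_partitioner(num_partitions):
    """Random: pick a partition uniformly at random."""
    return lambda: random.randrange(num_partitions)

rr = round_robin_partitioner(3)
rr_out = [rr() for _ in range(6)]
print(rr_out)  # [0, 1, 2, 0, 1, 2]
```

The round-robin output shows why its balance is perfect over any window of sends, while the random strategy is only balanced in expectation.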
This article assumes we have a topic named T1 with 5 partitions, and two consumers (C1, C2) consuming those partitions, where C1 has num.streams = 2 and C2 has num.streams = 1 (num.streams is the number of consumer threads). The Range strategy works per topic. First, the partitions of the topic are sorted by number and the consumer threads are sorted by name. In our example, the sorted partitions are 0, 1, 2, 3, 4 and the sorted consumer threads are C1-0, C1-1, C2-0. Then the number of partitions is divided by the total number of consumer threads to determine how many partitions each thread consumes; if it does not divide evenly, the first few threads each consume one extra partition. Here we have 5 partitions and 3 consumer threads: 5 / 3 = 1 with a remainder of 2, so the first two threads, C1-0 and C1-1, each consume one extra partition. The final assignment is: C1-0 consumes partitions 0 and 1, C1-1 consumes partitions 2 and 3, and C2-0 consumes partition 4. The consumption diagram is as follows:
If we add a partition, going from the previous 5 partitions to 6, the final assignment becomes: C1-0 consumes partitions 0 and 1, C1-1 consumes partitions 2 and 3, and C2-0 consumes partitions 4 and 5. The final consumption is as follows:
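The Range arithmetic can be sketched as follows. This is a simplified model of the assignor: sorted partitions are split into contiguous blocks, and the first (partitions mod threads) threads get one extra partition each:

```python
def range_assign(partitions, threads):
    """Range strategy: contiguous blocks of sorted partitions per thread;
    the first (len(partitions) % len(threads)) threads get one extra."""
    per_thread, extra = divmod(len(partitions), len(threads))
    assignment, start = {}, 0
    for i, thread in enumerate(threads):
        count = per_thread + (1 if i < extra else 0)
        assignment[thread] = partitions[start:start + count]
        start += count
    return assignment

a5 = range_assign(list(range(5)), ["C1-0", "C1-1", "C2-0"])
print(a5)  # {'C1-0': [0, 1], 'C1-1': [2, 3], 'C2-0': [4]}
a6 = range_assign(list(range(6)), ["C1-0", "C1-1", "C2-0"])
print(a6)  # {'C1-0': [0, 1], 'C1-1': [2, 3], 'C2-0': [4, 5]}
```

Note how adding one partition shifts the tail assignments, which is one reason Range can reassign partitions on topic growth.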
To use the RoundRobin strategy, there are two prerequisites that must be met:
1. All consumers within the same Consumer Group must have the same num.streams;
2. Every consumer in the group must subscribe to the same set of topics.
So assume here that both consumers mentioned earlier have num.streams = 2. The RoundRobin strategy works as follows: compose all topic partitions into a TopicAndPartition list, then sort that list by hashCode. In our example, suppose the topic-partitions sorted by hashCode are T1-5, T1-3, T1-0, T1-2, T1-1, T1-4, and the consumer threads sorted by name are C1-0, C1-1, C2-0, C2-1. The final assignment is: C1-0 consumes partitions T1-5 and T1-1; C1-1 consumes T1-3 and T1-4; C2-0 consumes T1-0; C2-1 consumes T1-2. The consumption diagram is as follows:
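The RoundRobin step can be sketched by dealing the hashCode-sorted partition list out to the sorted threads one at a time (again a simplified model of the old assignor, using the example's ordering as input):

```python
import itertools

def round_robin_assign(sorted_partitions, threads):
    """Deal the sorted partition list to the sorted threads, one at a time."""
    assignment = {t: [] for t in threads}
    for partition, thread in zip(sorted_partitions, itertools.cycle(threads)):
        assignment[thread].append(partition)
    return assignment

parts = ["T1-5", "T1-3", "T1-0", "T1-2", "T1-1", "T1-4"]  # sorted by hashCode in the example
threads = ["C1-0", "C1-1", "C2-0", "C2-1"]
result = round_robin_assign(parts, threads)
print(result)
# {'C1-0': ['T1-5', 'T1-1'], 'C1-1': ['T1-3', 'T1-4'], 'C2-0': ['T1-0'], 'C2-1': ['T1-2']}
```

Unlike Range, the partition counts per thread differ by at most one, which is why RoundRobin balances better when its two prerequisites hold.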
Kafka's high reliability comes from its robust replication strategy, that is, the partition replication mechanism. Kafka provides multiple replicas for each partition and distributes them across the brokers in the cluster; the replica count can be set by a parameter. Among the replicas, one node is elected leader and the others are followers. All messages are sent to the leader and then synchronized to the follower nodes. When a replica stops working, a new election is held, so even if some brokers go down, the cluster as a whole remains highly available and messages are not lost.
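A toy model makes the failover behavior concrete. This is not Kafka's actual controller logic, only a sketch of the idea that the leader is always drawn from the in-sync replica set:

```python
class Partition:
    """Toy model: the first broker in the in-sync replica (ISR) list is the
    leader; when a broker fails, the next in-sync replica takes over."""

    def __init__(self, replicas):
        self.isr = list(replicas)  # in-sync replicas, leader first

    @property
    def leader(self):
        return self.isr[0] if self.isr else None

    def broker_down(self, broker):
        if broker in self.isr:
            self.isr.remove(broker)  # remaining replicas "elect" a new leader

p = Partition(["broker-1", "broker-2", "broker-3"])  # replication factor 3
p.broker_down("broker-1")  # the leader's broker goes down
print(p.leader)  # broker-2
```

With a replication factor of 3, the partition stays available through two broker failures, which is the availability guarantee the paragraph above describes.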
Client and broker compatibility across Kafka versions
An overview on client and broker version compatibility.
Maintaining compatibility across different Kafka clients and brokers is a common concern. Client and broker version mismatches can occur in any of the following scenarios:
In these cases, it is important to understand client/broker compatibility across Kafka versions. Here are general rules that apply:
server.properties to refer to version A.
Determine the Kafka-Client compatibility with kafka-broker
There's a link to the confluent matrix on the Spring for Apache Kafka project page (along with spring-kafka/kafka-clients compatibility).
0.9 is very, very old.
Typically, clients/brokers newer than 0.10.2.0 can talk to each other, but if records have headers, you will need a client >= 0.11.0.0.
Bidirectional client compatibility is now supported, so you no longer need to worry about the compatibility matrix. For KIP-35-enabled clients, any version works; KIP-35 support was released in the broker protocol in 0.10.0 and in the Java clients in 0.10.2.
refer:
# Start the broker in the foreground
bin/kafka-server-start.sh <path>/server.properties
# Start the broker as a background daemon
bin/kafka-server-start.sh -daemon <path>/server.properties
# Stop the broker
bin/kafka-server-stop.sh
# Create a topic (older versions address ZooKeeper; newer ones use --bootstrap-server)
bin/kafka-topics.sh --create --zookeeper localhost:2181 --partitions 3 --replication-factor 3 --topic topicname
# Delete a topic
bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic topicname
# List all topics
bin/kafka-topics.sh --zookeeper localhost:2181 --list
# Describe a topic
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic topicname
# Increase the partition count to 6 (partitions can only be increased, never decreased)
bin/kafka-topics.sh --alter --zookeeper localhost:2181 --partitions 6 --topic topicname
# List consumer groups
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
# Describe a consumer group (members, offsets, lag)
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group groupname
# Reset the group's offsets for all topics to the earliest offsets
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group groupname --reset-offsets --all-topics --to-earliest --execute
# Reset to the latest offsets
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group groupname --reset-offsets --all-topics --to-latest --execute
# Reset to a specific offset
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group groupname --reset-offsets --all-topics --to-offset 2000 --execute
# Reset to the offsets as of a datetime (without --execute this is only a dry run)
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group groupname --reset-offsets --all-topics --to-datetime 2019-09-15T00:00:00.000
# Delete a consumer group (old ZooKeeper-based consumers)
bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --delete --group groupname
# Start a console producer
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topicname
Parameter meanings:
--compression-codec lz4          compression type
--request-required-acks all      value of acks
--timeout 3000                   value of linger.ms
--message-send-max-retries 10    value of retries
--max-partition-memory-bytes     value of batch.size

# Start a console consumer, reading from the beginning of the topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topicname --from-beginning
# Consume with an explicit consumer group id
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topicname --from-beginning --consumer-property group.id=old-consumer-group
# Consume a single partition only
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topicname --from-beginning --partition 0
kafka-run-class.sh kafka.tools.ConsoleConsumer
is the same as kafka-console-consumer.sh
kafka-run-class.sh kafka.tools.ConsoleProducer
is the same as kafka-console-producer.sh
# Query a topic's offsets; --time -1 means the latest offset, --time -2 the earliest
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic topicname --time -1
# Read the internal offsets topic
bin/kafka-simple-consumer-shell.sh --topic __consumer_offsets --partition 12 --broker-list localhost:9092 --formatter "kafka.coordinator.GroupMetadataManager\$OffsetsMessageFormatter"
# Mirror topics between clusters (quote the whitelist so the shell does not interpret "|")
bin/kafka-mirror-maker.sh --consumer.config consumer.properties --producer.config producer.properties --whitelist "topicA|topicB"
$ kafka-consumer-groups.sh --bootstrap-server broker01.example.com:9092 --describe --group flume
GROUP  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  OWNER
flume  t1     0          1               3               2    test-consumer-group_postamac.local-1456198719410-29ccd54f-0
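The LAG column is derived from the two offset columns; a one-line sketch of the relationship, using the row above as input:

```python
def consumer_lag(current_offset, log_end_offset):
    """LAG = LOG-END-OFFSET - CURRENT-OFFSET: messages produced but not yet consumed."""
    return log_end_offset - current_offset

# The row above: CURRENT-OFFSET = 1, LOG-END-OFFSET = 3
print(consumer_lag(1, 3))  # 2
```

A steadily growing lag on a partition usually means the consumer cannot keep up with the producer.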
Why is Kafka fast?
Kafka achieves low latency message delivery through Sequential I/O and Zero Copy Principle. The same techniques are commonly used in many other messaging/streaming platforms.
The diagram below illustrates how the data is transmitted between producer and consumer, and what zero-copy means.
🔹 Step 1.1 - 1.3: Producer writes data to the disk
🔹 Step 2: Consumer reads data without zero-copy
2.1 The data is loaded from disk to the OS cache
2.2 The data is copied from the OS cache to the Kafka application
2.3 The Kafka application copies the data into the socket buffer
2.4 The data is copied from the socket buffer to the network card
2.5 The network card sends the data out to the consumer
🔹 Step 3: Consumer reads data with zero-copy
3.1 The data is loaded from disk to the OS cache
3.2 The OS cache copies the data directly to the network card via the sendfile() system call
3.3 The network card sends the data out to the consumer
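The zero-copy path in step 3 can be observed from Python, which exposes the same sendfile() system call. A minimal sketch on Linux, with a Unix socket pair standing in for the broker-to-consumer connection and a temp file standing in for a partition segment:

```python
import os
import socket
import tempfile

# Write some "log" data to a file, standing in for a partition segment on disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"kafka message batch")
    path = f.name

# A connected socket pair stands in for the broker-to-consumer connection.
server, client = socket.socketpair()

# Zero-copy send: sendfile() moves bytes from the page cache straight to the
# socket without first copying them into user-space buffers (steps 2.2-2.3 vanish).
with open(path, "rb") as src:
    sent = os.sendfile(server.fileno(), src.fileno(), 0, os.path.getsize(path))
server.close()

received = client.recv(1024)
print(received)  # b'kafka message batch'
client.close()
os.unlink(path)
```

On the JVM, Kafka gets the same effect through FileChannel.transferTo(), which delegates to sendfile() on Linux.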
Kafka version selection analysis
Principles:
As of this writing (2021-03-07), the latest version is 2.7.0, released on 2020-12-21.
Release dates of each Kafka version and their ZooKeeper and Java dependencies
Wikipedia: In November 2014, several engineers who had worked on Kafka at LinkedIn created a new company named Confluent, with a focus on Kafka.
Public cloud support
Alibaba Cloud: Message Queue for Apache Kafka
The product introduction mentions
The instance-upgrade help documentation mentions
Tencent Cloud
Message Queue CKafka: a distributed, high-throughput, highly scalable message service, 100% compatible with open-source Apache Kafka 0.9 and 0.10
In the documentation: Message Queue CKafka > Product Introduction > Usage Limits
AWS: Amazon Managed Streaming for Apache Kafka (Amazon MSK), a fully managed, highly available, and secure Apache Kafka service. Supported Kafka versions:
Notes
Changes to the Kafka message format
From 0.8.x to the current 2.x, the Kafka message format has gone through three versions: v0, v1, and v2.
Kafka's use of ZooKeeper
See Home > Message Queue for Apache Kafka > Product Introduction > Usage Limits
In terms of usage and design, Apache Kafka has hidden ZooKeeper since 0.9.0; clients no longer need to access ZooKeeper.
https://www.infoq.com/articles/apache-kafka-best-practices-to-optimize-your-deployment/
Kafka Docker support
lensesio/box
Lenses Box is a docker image that provides a full installation of Apache Kafka with all relevant components.
https://github.com/lensesio/fast-data-dev