confluentinc / librdkafka

The Apache Kafka C/C++ library

Topic {X} partition count changed from {Y} to {Z} - for already deleted topic #4823

Open plachor opened 3 months ago

plachor commented 3 months ago

Description

Within my integration test I have encountered a situation in which deleted topics were being recreated for no apparent reason by a producer instance (no messages were being produced to those already deleted topics anymore). I'm relying on the latest confluent-kafka-dotnet v2.5.2; I'm writing here since the log appears to originate from librdkafka.

How to reproduce

1) I create a long-lived producer that is used throughout the test (with settings: Acks.All, MaxInFlight = 1 and EnableDeliveryReports = true).
2) I create a batch of topics with a large number of partitions (for instance 4 topics, each with ~200 partitions).
3) In parallel I produce many messages to these topics (my goal is to cover all partitions) and await every produce.
4) Once I complete my assertions, I delete those topics and prepare a new batch of topics (with a smaller partition count) to move forward.
5) I reuse the same producer instance against the new batch of topics and repeat until my last topic has only 1 partition.
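For reference, a minimal sketch of the loop (the bootstrap address, topic names, partition counts and message counts are placeholders, not my exact test code):

using System;
using Confluent.Kafka;
using Confluent.Kafka.Admin;

var bootstrap = "localhost:9092";

var producerConfig = new ProducerConfig
{
    BootstrapServers = bootstrap,
    Acks = Acks.All,
    MaxInFlight = 1,
    EnableDeliveryReports = true
};

// Single long-lived producer, reused across all batches.
using var producer = new ProducerBuilder<Null, string>(producerConfig).Build();
using var admin = new AdminClientBuilder(new AdminClientConfig { BootstrapServers = bootstrap }).Build();

foreach (var partitions in new[] { 200, 100, 50, 1 })   // each new batch has fewer partitions
{
    var topic = $"{Guid.NewGuid():N}_Partitions_{partitions}";

    await admin.CreateTopicsAsync(new[]
    {
        new TopicSpecification { Name = topic, NumPartitions = partitions, ReplicationFactor = 1 }
    });

    // Produce enough messages to cover all partitions, awaiting every produce.
    for (var i = 0; i < partitions * 5; i++)
        await producer.ProduceAsync(topic, new Message<Null, string> { Value = $"msg-{i}" });

    // ... assertions ...

    // The topic is deleted and never produced to again.
    await admin.DeleteTopicsAsync(new[] { topic });
}

The producer instance is the only thing shared between batches.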

On stderr I observe logs like:

%5|1724155743.093|PARTCNT|A.ff694163-3c88-42d0-885d-95129e189b26#producer-947| [thrd:main]: Topic beb73ee463ae43edb25576bf83f48c59_Partitions_200 partition count changed from 200 to 9

If I dispose of the producer instance and create a new one for each batch, this does not occur.

Since I await each produce request, I believe this happens when the producer requests metadata for the already deleted topics: because the broker has auto.create.topics.enable = true, the topics are recreated with broker defaults, in my case 9 partitions.
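To double-check that broker setting, something like the following can be used (a rough sketch against Confluent.Kafka's AdminClient; the broker id "1" and the bootstrap address are assumptions for my single-broker setup):

using System;
using Confluent.Kafka;
using Confluent.Kafka.Admin;

using var admin = new AdminClientBuilder(new AdminClientConfig { BootstrapServers = "localhost:9092" }).Build();

// Describe the broker-level configuration of broker id "1".
var configs = await admin.DescribeConfigsAsync(new[]
{
    new ConfigResource { Type = ResourceType.Broker, Name = "1" }
});

Console.WriteLine(configs[0].Entries["auto.create.topics.enable"].Value);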

Is this correct, and is it a bug or desired behavior?

Locally it almost always occurs when topics have more than 80 partitions; with lower partition counts it does not always occur. Below 50 partitions I have not spotted it.

Additionally, I tested against librdkafka 2.3.0 and the behavior is the same.

./kafka-topics.sh --describe --topic 5d1e281c686641e2bfba8950d56285dd_Partitions_198 --bootstrap-server localhost:9092

Topic: 5d1e281c686641e2bfba8950d56285dd_Partitions_198   TopicId: fLaEeO4MRyy36Yt7UZ-EJA   PartitionCount: 9   ReplicationFactor: 1   Configs: min.insync.replicas=1,segment.bytes=1073741824,index.interval.bytes=4096,segment.index.bytes=262144
        Topic: 5d1e281c686641e2bfba8950d56285dd_Partitions_198   Partition: 0   Leader: 1   Replicas: 1   Isr: 1
        Topic: 5d1e281c686641e2bfba8950d56285dd_Partitions_198   Partition: 1   Leader: 1   Replicas: 1   Isr: 1
        Topic: 5d1e281c686641e2bfba8950d56285dd_Partitions_198   Partition: 2   Leader: 1   Replicas: 1   Isr: 1
        Topic: 5d1e281c686641e2bfba8950d56285dd_Partitions_198   Partition: 3   Leader: 1   Replicas: 1   Isr: 1
        Topic: 5d1e281c686641e2bfba8950d56285dd_Partitions_198   Partition: 4   Leader: 1   Replicas: 1   Isr: 1
        Topic: 5d1e281c686641e2bfba8950d56285dd_Partitions_198   Partition: 5   Leader: 1   Replicas: 1   Isr: 1
        Topic: 5d1e281c686641e2bfba8950d56285dd_Partitions_198   Partition: 6   Leader: 1   Replicas: 1   Isr: 1
        Topic: 5d1e281c686641e2bfba8950d56285dd_Partitions_198   Partition: 7   Leader: 1   Replicas: 1   Isr: 1
        Topic: 5d1e281c686641e2bfba8950d56285dd_Partitions_198   Partition: 8   Leader: 1   Replicas: 1   Isr: 1

kafka-console-consumer.sh --from-beginning \
  --bootstrap-server localhost:9092 --property print.key=true \
  --property print.value=false --property print.partition \
  --topic 5d1e281c686641e2bfba8950d56285dd_Partitions_198 --timeout-ms 5000 | tail -n 10 | grep "Processed a total of"

[2024-08-21 12:21:32,164] ERROR Error processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 0 messages

So these recreated topics are empty.

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

emasab commented 1 month ago

Once I complete my assertions, I delete those topics and prepare a new batch of topics (with a smaller partition count) to move forward.

Do all the topics in the new batch have different names than those in the old batch, or do they have the same names?

If they have the same names, this is normal: even after deletion the topic stays in the metadata cache until its deletion is detected from a metadata response, and it's possible that this is detected directly as a partition count change.

plachor commented 1 month ago

Each batch has different names, and this is not an issue with the metadata cache. As shown in the example, this topic:
1) 5d1e281c686641e2bfba8950d56285dd_Partitions_198 was initially created with 198 partitions
2) then, after it was used and tested, it was deleted
3) eventually I see it was recreated with the default number of partitions (9), as seen in the librdkafka log