Closed: olis1996 closed this issue 1 year ago
This might be related to the implementation of the KafkaStreamsAdmin.inspect method and is certainly confusing for users.
Are you using a Kafka topic with multiple partitions, where some partitions contain no or only a few records?
There are 3 partitions, which may contain very different numbers of records.
When we retrieve `n` sample records from a topic with `m` partitions, we currently try to consume `n/m` records from each partition. If any of the partitions holds fewer than `n/m` records, this can lead to the unexpected situation where the `/streams/:uuid/inspect` endpoint returns fewer records than requested, even though the topic holds >= `n` records across all partitions.
I propose to change the sampling approach so that we always return `n` records whenever the topic holds >= `n` records across all partitions, regardless of any skew.
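To make the failure mode concrete, here is a minimal sketch of the even-split behaviour described above (the function name and the per-partition record counts are hypothetical, chosen only to illustrate the arithmetic):

```python
import math

def sample_even(partition_counts, n):
    """Sketch of the current behaviour: ask each of the m partitions
    for roughly n/m records. partition_counts lists how many records
    each partition actually holds; returns how many records we get back."""
    per_partition = math.ceil(n / len(partition_counts))
    # A partition with fewer than n/m records contributes only what it has.
    return sum(min(count, per_partition) for count in partition_counts)

# The topic holds 300 records across 3 skewed partitions, yet a request
# for n = 150 comes back short: each partition is asked for 50 records,
# but one of them only holds 30.
print(sample_even([200, 70, 30], 150))  # -> 130, not 150
```

The shortfall (130 instead of 150) happens even though the topic as a whole holds twice the requested number of records.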
@olis1996 For now, it might be useful to create a topic with a single partition to manually test your data-generator.
We will change the Stream/Inspect to offer two modes of operation, which can be selected via a flag.
The first mode will be the new default: retrieve the messages top-down. We will try to get all `n` requested messages from the first partition; if that partition does not hold enough records, we continue with the next partition, and so on.
The second mode will be the current implementation, where we try to spread the retrieved records evenly across all partitions.
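The two modes can be sketched side by side as follows (a simplified illustration, not the project's actual implementation; function names and partition counts are hypothetical):

```python
import math

def sample_top_down(partition_counts, n):
    """Proposed default mode: drain partitions in order until n records
    are collected (or the topic is exhausted)."""
    collected = 0
    for count in partition_counts:
        collected += min(count, n - collected)
        if collected == n:
            break
    return collected

def sample_spread(partition_counts, n):
    """Current mode: split the request evenly across all partitions."""
    per_partition = math.ceil(n / len(partition_counts))
    return sum(min(count, per_partition) for count in partition_counts)

counts = [200, 70, 30]  # hypothetical skewed topic with 300 records
print(sample_top_down(counts, 150))  # -> 150 (full sample)
print(sample_spread(counts, 150))    # -> 130 (short sample)
```

With top-down sampling the full `n` records are returned whenever the topic holds at least `n` records in total, at the cost of no longer sampling evenly across partitions.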
This has been fixed with PR #158.
Description: Inside the pipeline designer, the wrong number of records is displayed.
Steps to reproduce:
1. Load 200 records into the Kafka topic.
2. Go to the pipeline designer.
3. By default, 100 sample records are displayed.
4. Open the sidebar and set the sample size to 200.
5. Save the settings and close the sidebar. => Only 186 records are displayed.
6. Set the sample size to anything >= 241. => All 200 records are displayed correctly.
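The observed numbers are consistent with the even-split sampling described above. The per-partition record counts below are purely hypothetical, chosen only so that the arithmetic matches the reported behaviour:

```python
import math

def sample_even(partition_counts, n):
    """Sketch of even-split sampling: ask each partition for ceil(n/m)
    records; partitions with fewer records contribute only what they have."""
    per_partition = math.ceil(n / len(partition_counts))
    return sum(min(count, per_partition) for count in partition_counts)

# Hypothetical skew: 200 records spread as 81 + 67 + 52 over 3 partitions.
counts = [81, 67, 52]
print(sample_even(counts, 200))  # asks ceil(200/3) = 67 per partition -> 186
print(sample_even(counts, 241))  # asks ceil(241/3) = 81 per partition -> 200
```

With this (assumed) distribution, a sample size of 200 yields exactly 186 records, and 241 is the smallest request for which the per-partition quota covers the largest partition, so all 200 records come back.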