abraham-leal / kafka-idle-topics

A tool to detect idle topics in your cluster
Apache License 2.0

Connectivity Issue and troubleshooting #14

Open satpalgrewal opened 6 months ago

satpalgrewal commented 6 months ago

Hi Abraham

I am testing the tool and having issues that may be network related, but I need some way of confirming this.

We have a private cluster that is connected to Confluent via AWS PrivateLink and an on-prem AWS Transit Gateway, and to the laptop via VPN.

Command: ~/go/bin/kafka-idle-topics -bootstrap-servers ????.eu-west-1.aws.glb.confluent.cloud:9092 -username ??? -password ?? -kafkaSecurity plain_tls -idleMinutes 43800

Output - Attempt 1:
2024/01/26 14:25:37 Loading Topics...
2024/01/26 14:25:37 Could not reach cluster within the last 30 seconds. Is the configuration correct? read tcp 172.31.193.152:59524->10.14.163.147:9092: read: connection reset by peer

Output - Attempt 2:
2024/01/26 15:47:31 Loading Topics...
2024/01/26 15:47:34 Evaluating Topics that haven't been produced to since... 2023-12-27 05:47:34.83183 +0000 GMT m=-2627996.097033499
2024/01/26 15:47:35 Could not consume from topic: read tcp 172.31.193.152:49521->10.14.163.179:9092: read: connection reset by peer

From the same laptop, I can confirm that I can consume messages from a topic using the command: kafka-console-consumer --bootstrap-server ??.eu-west-1.aws.glb.confluent.cloud:9092 --consumer.config command-nonprod.properties --topic ?????? --from-beginning

Any suggestions?

abraham-leal commented 6 months ago

Unfortunately, this does sound like a networking issue, and there is not much I can do to help with that. I'd recommend running the tool as close to the cluster as possible to avoid unnecessary network hops.

SandeepSehra commented 6 months ago

Hi Abraham,

We have found a temporary workaround for the network issues that Satpal posted. We are now testing the tool with the new -allowList and -disallowList parameters that you introduced. When we run it against the prefixes of the public and private topics we want to test, it shows nothing and reports 0 topics to be removed.

Our naming convention for public topics is sainsburys.data. and for private topics sainsburys.applications. / sainsburys.teams*, so we want to capture a separate list for each.

The command I have been using: ~/go/bin/kafka-idle-topics -bootstrap-servers lkc-5vq7q-41jq3.eu-west-1.aws.glb.confluent.cloud:9092 -username ????? -password ???? -kafkaSecurity plain_tls -idleMinutes 43800 -allowList 'sainsburys.teams.'

The output I get:
2024/02/08 15:28:30 Loading Topics...
2024/02/08 15:28:35 Evaluating Topics that haven't been produced to since... 2024-01-09 05:28:35.745909 +0000 GMT m=-2627992.892874499
2024/02/08 15:28:36 Evaluating Topics without active Consumer Groups...
2024/02/08 15:31:05 Evaluating Topics without anything in them...
2024/02/08 15:31:05 Done! You can delete 0 topics and 0 partitions! A list of found idle topics is available at: /~idleTopics.txt

Any suggestions on what we can do to run the test against both public and private topics?

abraham-leal commented 6 months ago

Hi @SandeepSehra, -allowList and -disallowList are literal lists and do not accept prefixes. If you are hoping to filter on those topics, you have to list out their literal names.
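Since the lists are literal, one way to get from a naming prefix to literal names is to expand the prefix against the cluster's topic list before running the tool. The sketch below is only an illustration of that idea: the topic names are made up, the helper `literal_allow_list` is hypothetical, and obtaining the real topic list (for example from your usual Kafka tooling) is left out.

```python
def literal_allow_list(topic_names, prefixes):
    """Return the sorted topic names that start with any of the given prefixes."""
    wanted = tuple(prefixes)
    # str.startswith accepts a tuple and matches if any element is a prefix.
    return sorted(name for name in topic_names if name.startswith(wanted))


# Hypothetical topic names, for illustration only.
topics = [
    "sainsburys.teams.orders",
    "sainsburys.teams.payments",
    "sainsburys.data.stores",
    "sainsburys.applications.checkout",
]

# Private and public topics are expanded separately, one list per run.
private = literal_allow_list(topics, ["sainsburys.applications.", "sainsburys.teams."])
public = literal_allow_list(topics, ["sainsburys.data."])
print(private)
print(public)
```

How the resulting names are handed to -allowList (separator, or a file) is not covered in this thread, so check the tool's README before wiring this up.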

SandeepSehra commented 6 months ago

@abraham-leal thanks for that. We managed to use the -hideTopicsPrefixes parameter instead for the above scenario. We have another question: how far back can we track topics being idle? We currently set -idleMinutes to around 1 month. Can we set it to 3 months, for example, or further?

abraham-leal commented 6 months ago

@SandeepSehra Glad -hideTopicsPrefixes helped your use case. I built it for something different, but it's nice to see it evolve :) It should not be a problem to use -idleMinutes for that length of time. The parameter uses the minutes to calculate how much time must have passed since the last message stored on the topic, so it isn't any more or less efficient to use dates further back.
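For picking a value, the arithmetic can be sketched as follows. This is an illustration, not the tool's own code: it assumes the cutoff is simply "run time minus idleMinutes", which is consistent with the timestamps in the logs earlier in this thread, and it assumes the log's run time was in GMT. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# 43800 minutes is one twelfth of a 365-day year, i.e. an "average month".
MINUTES_PER_AVERAGE_MONTH = 365 * 24 * 60 // 12


def months_to_minutes(months):
    """Convert a number of average months into a value for -idleMinutes."""
    return months * MINUTES_PER_AVERAGE_MONTH


def idle_cutoff(now, idle_minutes):
    """Return the instant before which the last message must have been stored."""
    return now - timedelta(minutes=idle_minutes)


# Run time taken from the log above; the GMT zone is an assumption.
run_time = datetime(2024, 2, 8, 15, 28, 35, tzinfo=timezone.utc)
print(idle_cutoff(run_time, 43800))   # 2024-01-09 05:28:35+00:00
print(months_to_minutes(3))           # 131400
```

So roughly 3 months would be -idleMinutes 131400, and 6 months would be 262800.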