Aiven-Open / klaw

Klaw, the latest OS tool by Aiven, helps enterprises cope with Apache Kafka® topics, schema registry and connectors governance by introducing roles/authorizations to users of various teams of an org.
https://www.klaw-project.io/
Apache License 2.0

NullPointerException when trying to read topic events from coral UI #2602

Closed yvessavoy closed 3 weeks ago

yvessavoy commented 3 weeks ago

What happened?

When trying to retrieve topic events (Fetching mode custom, partition 0, number of messages 1) I get the following exception in the klaw-core log:

2024-09-04T07:04:15.846Z ERROR 7 --- [nio-9097-exec-6] i.a.klaw.service.TopicControllerService  : Ignoring error while retrieving topic events
java.lang.NullPointerException: Cannot invoke "io.aiven.klaw.dao.Env.getClusterId()" because the return value of "io.aiven.klaw.service.TopicControllerService.getEnvDetails(String)" is null
    at io.aiven.klaw.service.TopicControllerService.getTopicEvents(TopicControllerService.java:1320) ~[!/:2.9.0]
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[na:na]
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[na:na]
    at java.base/java.lang.reflect.Method.invoke(Unknown Source) ~[na:na]
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:354) ~[spring-aop-6.1.8.jar!/:6.1.8]

This happens in the core itself; the cluster-api never receives a call because the request fails before it gets that far. I set the consumer group in the cluster-api config. There is also a cluster, a tenant and an environment defined, so I don't really understand where the error is coming from.
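For readers following the trace, the failure mode it describes can be sketched as below. This is a minimal illustration, not Klaw's actual code and not the change made in the fix: only the names `Env`, `getClusterId` and `getEnvDetails` come from the stack trace, while the class `EnvLookupSketch`, the map-backed lookup and both `clusterId...` helpers are hypothetical. It assumes the lookup returns null when no environment matches the given id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class EnvLookupSketch {

    // Hypothetical stand-in for io.aiven.klaw.dao.Env; only the getter named in the trace.
    public static class Env {
        private final Integer clusterId;

        public Env(Integer clusterId) {
            this.clusterId = clusterId;
        }

        public Integer getClusterId() {
            return clusterId;
        }
    }

    // Hypothetical in-memory store; the real service resolves environments differently.
    private final Map<String, Env> envsById = new HashMap<>();

    public void addEnv(String envId, Env env) {
        envsById.put(envId, env);
    }

    // Assumption: returns null when the id is unknown, as the trace suggests.
    public Env getEnvDetails(String envId) {
        return envsById.get(envId);
    }

    // Unguarded chain: throws NullPointerException when the lookup returns null.
    public Integer clusterIdUnguarded(String envId) {
        return getEnvDetails(envId).getClusterId();
    }

    // Guarded variant: a missing environment becomes an empty Optional
    // that the caller has to handle explicitly.
    public Optional<Integer> clusterIdGuarded(String envId) {
        return Optional.ofNullable(getEnvDetails(envId)).map(Env::getClusterId);
    }

    public static void main(String[] args) {
        EnvLookupSketch sketch = new EnvLookupSketch();
        sketch.addEnv("1", new Env(7));

        System.out.println(sketch.clusterIdGuarded("1"));
        System.out.println(sketch.clusterIdGuarded("missing"));

        try {
            sketch.clusterIdUnguarded("missing");
        } catch (NullPointerException e) {
            System.out.println("Unguarded lookup failed: " + e.getMessage());
        }
    }
}
```

The point of the guarded variant is only that a failed lookup becomes a visible, handled case instead of an exception that surfaces deep inside the request.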

What did you expect to happen?

A list of topic events is shown in the coral UI.

What else do we need to know?

aindriu-aiven commented 3 weeks ago

@yvessavoy Thank you for raising this issue. I'll start investigating and let you know if we have a workaround or if a bug fix is required.

aindriu-aiven commented 3 weeks ago

@yvessavoy just to check: are you using AD users or DB users in Klaw?

Edit: Actually I've been able to replicate it so never mind!

programmiri commented 3 weeks ago

Hi @yvessavoy, this issue should be fixed by this PR: https://github.com/Aiven-Open/klaw/pull/2603. It was just merged to main, in case you want to try it. We're also planning a release in the next week, which will include this fix.

I'm closing the issue - please reach out in case you still have this issue!

yvessavoy commented 3 weeks ago

@programmiri Thank you very much for the quick response time :) Any chance to push this to the nightly release on docker hub so I can pull it? Unfortunately I cannot build the project locally right now. If not I'll try it next week.

aindriu-aiven commented 3 weeks ago

I'm travelling at the moment, but we should be able to update the nightly tomorrow at the latest. Unfortunately we haven't managed to get that automated yet!

muralibasani commented 3 weeks ago

@yvessavoy nightly docker images are updated from main.

yvessavoy commented 3 weeks ago

Thank you! Tried it, now I have a coral template-error:

2024-09-04T14:07:20.466Z ERROR 7 --- [nio-9097-exec-9] org.thymeleaf.TemplateEngine             : [THYMELEAF][http-nio-9097-exec-9] Exception processing template "coral/index": Error resolving template [coral/index], template might not exist or might not be accessible by any of the configured Template Resolvers
org.thymeleaf.exceptions.TemplateInputException: Error resolving template [coral/index], template might not exist or might not be accessible by any of the configured Template Resolvers

Any idea if it is related to the mentioned PR?

aindriu-aiven commented 3 weeks ago

@muralibasani repushed the nightly. The template error was just down to a change that was recently required to build coral for security updates. Pretty sure that the nightly should now work correctly for you! Let us know!

yvessavoy commented 3 weeks ago

Hi guys, I wanted to keep testing this morning but am now running into the same error again as in the original description. Any chance that the changes on nightly were overwritten overnight? The log also still shows version 2.9.0, I thought maybe it should be 2.9.1?

java.lang.NullPointerException: Cannot invoke "io.aiven.klaw.dao.Env.getClusterId()" because the return value of "io.aiven.klaw.service.TopicControllerService.getEnvDetails(String)" is null
    at io.aiven.klaw.service.TopicControllerService.getTopicEvents(TopicControllerService.java:1320) ~[!/:2.9.0]

aindriu-aiven commented 3 weeks ago

Hey,

I pulled down the nightly and the defect I fixed is included in that image, so it's possible there is a second issue. Before I dive into that, could you check a couple of things in Docker? First, could you run docker container ls -a to make sure the nightly image is running?

Could you also check when the nightly image was pushed to Docker? You can check that by running docker images.

With regard to the version: we only bump the version when we do a release, and we are hoping to release 2.10.0 next week. The nightly hasn't been officially released, but as long as we don't find any major defects during regression testing, it will be more or less the same as what we release next week, just with the version updated to 2.10.0.

@yvessavoy (sorry not sure if you have been notified so just @ ing you here)

yvessavoy commented 3 weeks ago

False alarm, sorry about that. I checked with our infra team and the internal Docker repo (which mirrors Docker Hub) had a cache problem, which resulted in our nightly build being the one from before yesterday. I'll keep testing for now, thank you very much for the support :)

aindriu-aiven commented 3 weeks ago

Excellent news! Thanks for letting us know.