which helm install are you using?
Hi, I followed these instructions: https://github.com/featurehub-io/featurehub-helm/tree/main
Heya - those instructions use Kind, which is what we use for testing. When you have the cluster up and you do a
`kubectl get all -n <featurehub-namespace>`
what do you get? And when you restart the cluster and do the same, what do you get?
I'm not familiar with Minikube, sorry, but this may help us diagnose what has happened with the services.
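For reference, on a healthy install you'd expect to see running pods for the management repository, dacha, edge and NATS. A rough sketch (the namespace and resource names here are assumptions based on a default install, so yours may differ):

```shell
# assumes the release was installed into a namespace called "featurehub"
kubectl get all -n featurehub

# roughly, you'd expect entries along these lines, all Running:
#   pod/featurehub-management-repository-<hash>   1/1   Running
#   pod/featurehub-dacha-<hash>                   1/1   Running
#   pod/featurehub-edge-<hash>                    1/1   Running
#   pod/featurehub-nats-0                         1/1   Running
```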
I get the same.
This one is before restart.
This one is post cluster restart.
After that I did a rollout restart of dacha; dacha restarted but, again, when I try to get the features, nothing is returned.
Tell me if any logs would help you. Thanks.
My theory is that the request from Dacha -> MR is timing out, so on that restart, can you drop the Dacha logs in here? There is a 12-second timeout (it's in the helm chart) that can be changed to a higher value if that's the case.
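As an illustration only (the key below is hypothetical - check the chart's values.yaml for the property that actually carries this timeout), an override could look something like:

```yaml
# override-values.yaml
# NOTE: hypothetical key name for illustration; find the real Dacha -> MR
# timeout property in the chart's values.yaml and use that name instead.
dacha:
  managementRepositoryTimeoutMs: 30000   # raise from the default 12000
```

and then `helm upgrade` the release with `-f override-values.yaml`.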
I can see this error in the dacha pod:
{"@timestamp":"2023-09-15T21:29:52.468+0000","message":"Failed jersey request","priority":"ERROR","path":"io.featurehub.jersey.config.LocalExceptionMapper","thread":"grizzly-http-server-1","stack_trace":"jakarta.ws.rs.NotFoundException: HTTP 404 Not Found\n\tio.featurehub.dacha2.resource.DachaApiKeyResource.getApiKeyDetails(DachaApiKeyResource.kt:16) ~[dacha2-1.1-SNAPSHOT.jar:?]\n\tjdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]\n\tjdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:?]\n\tjdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:?]\n\tjava.lang.reflect.Method.invoke(Unknown Source) ~[?:?]\n\torg.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:134) ~[jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:177) ~[jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) ~[jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:81) ~[jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478) ~[jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400) ~[jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81) ~[jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:261) [jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.internal.Errors.process(Errors.java:292) [jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.internal.Errors.process(Errors.java:274) [jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.internal.Errors.process(Errors.java:244) [jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [jersey-common-3.1.1.jar:?]\n\torg.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:240) [jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:697) [jersey-server-3.1.1.jar:?]\n\torg.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:367) [jersey-container-grizzly2-http-3.1.1.jar:?]\n\torg.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:190) [grizzly-http-server-4.0.0.jar:4.0.0]\n\torg.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:535) [grizzly-framework-4.0.0.jar:4.0.0]\n\torg.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:515) [grizzly-framework-4.0.0.jar:4.0.0]\n\tjava.lang.Thread.run(Unknown Source) [?:?]","host":"featurehub-dacha-848ffd48f-9l2ht"}
Hope it helps
We're going to need to turn on trace-level logging to figure this out. In the values.yaml file in the helm chart there is a section around line 68 that has an XML comment; if you remove it, i.e.:
<!--
... logging detail
-->
Get rid of the <!-- and -->, and also add in the line
<AsyncLogger name="io.featurehub.dacha2" level="trace"/>
Just make sure it is properly indented (I should really change that to be a file import). Then do the same thing again: you should see REST traffic coming in from Edge, going out to MR, and then an appropriate response. It only does this when you hit it fresh. You don't need to undeploy; there is a 30-second timer that rescans the log configuration. A rough sketch of the edited section is below.
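Sketch of what that part of values.yaml ends up looking like (only the AsyncLogger line is the exact one to add; the surrounding elements here are illustrative and the real section in the chart has more detail):

```xml
<!-- illustrative sketch only; the chart's actual logging section differs -->
<Loggers>
  <AsyncLogger name="io.featurehub.dacha2" level="trace"/>
  <Root level="info"/>
</Loggers>
```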
Try just bouncing the dacha instance rather than restarting the whole cluster first and see if it recurs; if not, go on to the full cluster bounce.
When it looks for an environment id + service account that it doesn't know about, it will ask MR and then cache it from then on. The implication is that MR is saying it doesn't exist but that seems incongruous.
Thanks for persevering in this!
Well, I've modified these lines and I get some additional info. I'll attach logs (only the relevant parts) from dacha and edge.
Well, I realized that if I add port 8701 to the management-repository service.yaml, everything works (roughly the port entry sketched below).
Is there something I'm missing in the installation steps?
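What I added by hand was a port entry on the Management Repository Service, roughly like this (a sketch - the metadata and port name are just illustrative of my install):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: featurehub-management-repository   # name depends on the release
spec:
  ports:
    - name: dacha-api      # hypothetical port name
      port: 8701           # the port dacha was unable to reach
      targetPort: 8701
      protocol: TCP
```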
It's part of the definition of the Management Repository deployment.yaml file. I expect that's because prometheus is not enabled, ergo it doesn't make that port available on the Service, which is causing the problem? Seems weird it worked in the first place for you then. I'd better correct that.
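A guess at what is probably going on in the chart's Service template - this is illustrative only, not the actual service.yaml:

```yaml
# illustrative sketch, not the chart's real template: if the 8701 port is
# wrapped in a prometheus conditional like this, it disappears from the
# Service whenever prometheus is disabled, and dacha can no longer reach MR.
ports:
  - name: web
    port: 8085             # illustrative value for the normal API port
    targetPort: 8085
  {{- if .Values.prometheus.enabled }}
  - name: metrics
    port: 8701
    targetPort: 8701
  {{- end }}
```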
Hello. Indeed, I saw that configuration in the service.yaml, but when I enabled Prometheus it gave me an error because I don't have it installed, so I added the extra port by hand and it worked. If I haven't misunderstood the architecture, it could be that it worked at first because dacha receives notifications from NATS directly, right? However, once I restart the cluster, it will try to go to the management-repository to repopulate its cache, and I didn't have the port enabled there. Could that be it?
Yeah, that makes sense - that's likely what it is!
I've released 4.0.4 now and that should fix it. Just waiting for ArtifactHub to pick it up.
OK, 1.6.3 has been released and the chart is on this version, so we should be all good :-)
**Describe the bug**
I am running everything in Minikube and installed it using Helm. I've noticed that every time I restart the cluster, either manually or due to a laptop restart, the service doesn't return any results when I attempt to retrieve the features.
I tried several troubleshooting steps, including restarting the Dacha pods, but with no success. However, I found a workaround: after updating some permissions on the service account permissions tab, the service started working again. I suspect that this action may have caused the cache to be repopulated or something similar.
Interestingly, I don't encounter this issue when using Docker Compose (all-separate-postgress). I've been able to restart it multiple times without encountering any problems.
**Which area does this issue belong to?**
**To Reproduce**
Steps to reproduce the behavior:
1. Install FeatureHub in Minikube using the Helm chart.
2. Confirm that retrieving features works.
3. Restart the cluster (or restart the laptop).
4. Attempt to retrieve the features again - no results are returned.
**Expected behavior**
The features should be there after a cluster restart.

**Versions**