apache / incubator-kie-kogito-operator

Kubernetes Operator for Kogito
Apache License 2.0

data-index CrashLoopBackOff on kubernetes #607

Closed Gupta-Amrit closed 3 years ago

Gupta-Amrit commented 3 years ago

I am trying to deploy the travel-agency example on Kubernetes, but the data-index pod goes into CrashLoopBackOff and throws the error below:

2020-10-09 12:10:58,277 WARN [io.qua.config] (main) Unrecognized configuration key "quarkus.kafka.bootstrap-servers" was provided; it will be ignored; verify that the dependency extension for this configuration is set or you did not make a typo
2020-10-09 12:10:59,407 INFO [org.inf.HOTROD] (main) ISPN004021: Infinispan version: Infinispan 'Corona Extra' 11.0.3.Final
2020-10-09 12:10:59,439 ERROR [org.inf.HOTROD] (HotRod-client-async-pool-1-1) ISPN004007: Exception encountered. Retry 10 out of 10: io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: localhost/127.0.0.1:11222
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
    at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
    at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:672)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:649)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:529)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:465)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

Thank you for your work.

Best regards,

radtriste commented 3 years ago

Hello @Gupta-Amrit, which version of the operator are you using? Which command or YAML did you use to deploy the data-index? Could you provide the logs from the operator?

sutaakar commented 3 years ago

@Gupta-Amrit Can you please also share the steps you did to deploy the travel-agency example?

Gupta-Amrit commented 3 years ago

Below are the images in use for the different operators:

Operator        Image
infinispan      jboss/infinispan-operator:1.1.1.Final
kogito          quay.io/kiegroup/kogito-cloud-operator:0.16.0
strimzi         strimzi/operator:0.17.0
data-index      quay.io/kiegroup/kogito-data-index:0.16

Steps followed

Install kogito operator

$ cd ~/kogito-cloud-operator-0.16.0
$ export NAMESPACE=kogito
$ kubectl create ns $NAMESPACE
$ ./hack/install.sh

To install other required operators

$ cd ~/kogito-cloud-operator-0.16.0
$ ./examples/kubernetes/travel-agency/deploy.sh

NOTE: The deploy.sh script does not work properly because EXAMPLES_DIR is set to an incorrect path. I have raised PR #608 to fix it. Also, the kogito-travels and kogito-visas images are no longer public and require authorization to use. That is fine for me, since the source code is available and I can build the images from it, but I wanted to let you know.

BTW, I am following this documentation: https://docs.jboss.org/kogito/release/latest/html_single/#proc-kogito-deploying-on-kubernetes_kogito-deploying-on-openshift

Kogito operator logs

{"level":"info","T":"2020-10-09T07:54:56.036Z","logger":"kogitodataindex_controller","msg":"Injecting Data Index URL into KogitoRuntime services in the namespace 'kogito'","Request.Namespace":"kogito","Request.Name":"data-index"}
{"level":"info","T":"2020-10-09T07:54:56.099Z","logger":"services_definition","msg":"Updating status for Kogito Service data-index"}
{"level":"info","T":"2020-10-09T07:54:56.117Z","logger":"services_definition","msg":"Successfully reconciled Kogito Service data-index"}
{"level":"info","T":"2020-10-09T07:54:56.117Z","logger":"kogitodataindex_controller","msg":"Reconciling KogitoDataIndex","Request.Namespace":"kogito","Request.Name":"data-index"}
{"level":"info","T":"2020-10-09T07:54:56.117Z","logger":"kogitodataindex_controller","msg":"Injecting Data Index URL into KogitoRuntime services in the namespace 'kogito'","Request.Namespace":"kogito","Request.Name":"data-index"}
{"level":"info","T":"2020-10-09T07:54:56.175Z","logger":"services_definition","msg":"Updating status for Kogito Service data-index"}
{"level":"info","T":"2020-10-09T07:54:56.187Z","logger":"services_definition","msg":"Successfully reconciled Kogito Service data-index"}

infinispan operator logs

{"level":"info","ts":1602254293.1302795,"logger":"controller_infinispan","msg":"Reconciling Infinispan","Request.Namespace":"kogito","Request.Name":"kogito-infinispan"}
{"level":"info","ts":1602254293.130337,"logger":"controller_infinispan","msg":"Configuring the StatefulSet","Request.Namespace":"kogito","Request.Name":"kogito-infinispan"}
{"level":"error","ts":1602254293.138723,"logger":"controller_infinispan","msg":"failed to update Infinispan Spec","Request.Namespace":"kogito","Request.Name":"kogito-infinispan","error":"Infinispan.infinispan.org \"kogito-infinispan\" is invalid: [spec.expose.type: Required value, spec.logging.categories: Invalid value: \"null\": spec.logging.categories in body must be of type object: \"null\", spec.service.sites.locations: Invalid value: \"null\": spec.service.sites.locations in body must be of type array: \"null\", spec.service.sites.local.expose.type: Required value]","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\ngithub.com/infinispan/infinispan-operator/pkg/controller/infinispan.updateSecurity\n\t/infinispan-operator/pkg/controller/infinispan/infinispan_controller.go:625\ngithub.com/infinispan/infinispan-operator/pkg/controller/infinispan.(*ReconcileInfinispan).Reconcile\n\t/infinispan-operator/pkg/controller/infinispan/infinispan_controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.8/pkg/internal/controller/controller.go:213\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.8/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20181126123746-eddba98df674/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20181126123746-eddba98df674/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20181126123746-eddba98df674/pkg/util/wait/wait.go:88"}
{"level":"error","ts":1602254293.1388295,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"infinispan-controller","request":"kogito/kogito-infinispan","error":"Infinispan.infinispan.org \"kogito-infinispan\" is invalid: [spec.expose.type: Required value, spec.logging.categories: Invalid value: \"null\": spec.logging.categories in body must be of type object: \"null\", spec.service.sites.locations: Invalid value: \"null\": spec.service.sites.locations in body must be of type array: \"null\", spec.service.sites.local.expose.type: Required value]","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.8/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.1.8/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20181126123746-eddba98df674/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20181126123746-eddba98df674/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20181126123746-eddba98df674/pkg/util/wait/wait.go:88"}
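
For anyone reading operator logs like these, the error entries are easy to lose among the info entries. A minimal sketch for filtering them, assuming the operator writes one JSON object per line; `errors_only` is a hypothetical helper name, not part of any operator tooling:

```shell
# Hypothetical helper: read JSON operator logs on stdin and keep only
# the entries whose level is "error".
errors_only() {
  grep '"level":"error"'
}

# Example use (the deployment name is an assumption, adjust to your cluster):
#   kubectl logs deployment/infinispan-operator -n kogito | errors_only
```

This is plain text matching, so it relies on the compact `"level":"error"` form shown in the logs above.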

sutaakar commented 3 years ago

@Gupta-Amrit Thanks for sharing all the info. Unfortunately, the travel agency example was not updated to reflect the infrastructure changes made for 0.16. I have reported https://issues.redhat.com/browse/KOGITO-3579 to adjust the example.

The current scripts should be compatible with version 0.15 of the operator. If you would like to try the example without waiting for the fix, please install the 0.15 operator (either from the 0.15.x branch of this repository or through OLM).

Gupta-Amrit commented 3 years ago

Hey @sutaakar, I tried the 0.15.0 operator but I am getting the error below.

{"level":"error","ts":1602338878.2972353,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"kogitodataindex-controller","request":"kogito/data-index","error":"resource name may not be empty","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/jenkins/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/jenkins/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/jenkins/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/home/jenkins/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/home/jenkins/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/home/jenkins/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/jenkins/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/home/jenkins/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90"}

Could you please help me? Am I missing a step, or should I also try with OLM?

sutaakar commented 3 years ago

@Gupta-Amrit When I applied the changes from https://github.com/kiegroup/kogito-cloud-operator/pull/609 (and fixed the Data Index image version), Data Index deployed successfully with Kogito operator 0.16. Please try that as well.

Gupta-Amrit commented 3 years ago

@sutaakar I am still getting the same error. Could you please share the version and the steps you followed to deploy?

sutaakar commented 3 years ago

@Gupta-Amrit Here are the steps I used, running against KOPS 1.17:

Gupta-Amrit commented 3 years ago

@sutaakar Thank you for sharing the steps. I followed them and data-index is running fine. I also found one more issue. I am not sure whether you have encountered it, but I am using an AKS cluster (Kubernetes 1.17.1) and the kogito-infinispan StatefulSet pod was throwing the error below:

13:17:00,346 FATAL (main) [org.infinispan.SERVER] ISPN080028: Infinispan Server failed to start
java.util.concurrent.ExecutionException: org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheConfigurationException: ISPN000512: Cannot acquire lock '/opt/infinispan/server/data/global.lck' for persistent global state
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
    at org.infinispan.server.Bootstrap.runInternal(Bootstrap.java:140)
    at org.infinispan.server.tool.Main.run(Main.java:98)
    at org.infinispan.server.Bootstrap.main(Bootstrap.java:40)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.infinispan.server.loader.Loader.run(Loader.java:76)
    at org.infinispan.server.loader.Loader.main(Loader.java:39)
Caused by: org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheConfigurationException: ISPN000512: Cannot acquire lock '/opt/infinispan/server/data/global.lck' for persistent global state
    at org.infinispan.manager.DefaultCacheManager.internalStart(DefaultCacheManager.java:751)
    at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:717)
    at org.infinispan.server.SecurityActions.lambda$startCacheManager$1(SecurityActions.java:64)
    at org.infinispan.security.Security.doPrivileged(Security.java:46)
    at org.infinispan.server.SecurityActions.doPrivileged(SecurityActions.java:36)
    at org.infinispan.server.SecurityActions.startCacheManager(SecurityActions.java:67)
    at org.infinispan.server.Server.run(Server.java:332)
    ... 9 more
Caused by: org.infinispan.commons.CacheConfigurationException: ISPN000512: Cannot acquire lock '/opt/infinispan/server/data/global.lck' for persistent global state
    at org.infinispan.globalstate.impl.GlobalStateManagerImpl.acquireGlobalLock(GlobalStateManagerImpl.java:87)
    at org.infinispan.globalstate.impl.GlobalStateManagerImpl.start(GlobalStateManagerImpl.java:64)
    at org.infinispan.globalstate.impl.CorePackageImpl$1.start(CorePackageImpl.java:34)
    at org.infinispan.globalstate.impl.CorePackageImpl$1.start(CorePackageImpl.java:27)
    at org.infinispan.factories.impl.BasicComponentRegistryImpl.invokeStart(BasicComponentRegistryImpl.java:592)
    at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:583)
    at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:552)
    at org.infinispan.factories.impl.BasicComponentRegistryImpl.access$700(BasicComponentRegistryImpl.java:30)
    at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:775)
    at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:341)
    at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:237)
    at org.infinispan.manager.DefaultCacheManager.internalStart(DefaultCacheManager.java:746)
    ... 15 more
Caused by: java.io.FileNotFoundException: /opt/infinispan/server/data/global.lck (Permission denied)
    at java.base/java.io.FileOutputStream.open0(Native Method)
    at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
    at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
    at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:187)
    at org.infinispan.globalstate.impl.GlobalStateManagerImpl.acquireGlobalLock(GlobalStateManagerImpl.java:81)
    ... 26 more

13:17:00,351 INFO (Thread-0) [org.infinispan.SERVER] ISPN080002: Infinispan Server stopping
13:17:00,356 INFO (Thread-0) [org.infinispan.SERVER] ISPN080003: Infinispan Server stopped

I cross-checked the persistent volume and the persistent volume claim; both were created and attached to the pod. On further debugging I found that the container was running as user ID 185, and this user did not have write access to the volume. I changed it to the root user and it worked. Since running as root is not recommended for production, I am looking for another workaround or solution. Did you also face the same issue?

ricardozanini commented 3 years ago

Hi @Gupta-Amrit

This seems to be a problem with the Infinispan Operator itself. I haven't seen this error before, but this issue might be related: https://github.com/infinispan/infinispan-operator/issues/392

Try adding:

  - name: MAKE_DATADIR_WRITABLE
    value: "true"

in the env attribute of the Infinispan operator's yaml file.
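
If the operator is already running, the same variable can probably be set without editing the YAML by hand, using `kubectl set env`. This is only a sketch: the Deployment name `infinispan-operator` and the helper name `make_datadir_writable` are assumptions, and the `kogito` namespace is taken from the install steps earlier in this thread.

```shell
# Assumed names, adjust to your cluster.
NAMESPACE="kogito"
OPERATOR_DEPLOYMENT="infinispan-operator"

# Hypothetical helper: set MAKE_DATADIR_WRITABLE=true on the operator
# Deployment, which triggers a rollout of the operator pod.
make_datadir_writable() {
  kubectl set env "deployment/${OPERATOR_DEPLOYMENT}" \
    MAKE_DATADIR_WRITABLE=true -n "${NAMESPACE}"
}
```

After calling `make_datadir_writable`, running `kubectl set env deployment/infinispan-operator --list -n kogito` should show whether the variable is present.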

ricardozanini commented 3 years ago

Hi @Gupta-Amrit have you managed to have this working?

Gupta-Amrit commented 3 years ago

Hi @ricardozanini, I have not had the chance to try your change for the Infinispan operator yet. For now I am working around it with the runAsUser attribute (set to root) in the security context, and data-index is working. Now that all the required changes are merged, the only thing still needed to fix the data-index CrashLoopBackOff is to wait until Kafka is fully initialized and then delete the data-index pod so that it is recreated. By the way, I also ran into some other issues with the Kogito management console and Infinispan on Kubernetes. I am still working on them, and once done I will try to create a PR. Feel free to close this ticket.
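
The wait-then-delete step could be scripted roughly as follows. This is a sketch under assumptions: the Strimzi Kafka resource name `kogito-kafka`, the pod label `app=data-index`, and the helper name `respin_data_index` are not confirmed anywhere in this thread, and it relies on the Kafka custom resource reporting a `Ready` condition.

```shell
# Assumed names, adjust to your cluster.
NAMESPACE="kogito"
KAFKA_CLUSTER="kogito-kafka"          # assumed Strimzi Kafka resource name
DATA_INDEX_SELECTOR="app=data-index"  # assumed label on the data-index pod

# Hypothetical helper: block until the Kafka resource reports Ready, then
# delete the data-index pod so its Deployment recreates it.
respin_data_index() {
  kubectl wait "kafka/${KAFKA_CLUSTER}" --for=condition=Ready \
    --timeout=300s -n "${NAMESPACE}" &&
    kubectl delete pod -l "${DATA_INDEX_SELECTOR}" -n "${NAMESPACE}"
}
```

The pod is only deleted if the wait succeeds, so a Kafka cluster that never becomes ready within the timeout leaves data-index untouched.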

Thank you for the support

rbaumgar commented 3 years ago

Looks like the problem is that no TLS is used. See https://github.com/kiegroup/kogito-examples/issues/454

ricardozanini commented 3 years ago

Oh, the TLS issue has been fixed in #646.

For Infinispan 10.x that would work without TLS.