Closed raydoom closed 1 year ago
Could you describe how you installed it?
It looks like it fails to get information about the available CPUs at the container level.
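One way to narrow this down is to check which cgroup hierarchy the node exposes, since the stack trace posted later in this thread passes through CgroupV1Subsystem. The sketch below only classifies the filesystem type that `stat -fc %T /sys/fs/cgroup/` prints on a node; the function name `cgroup_version` is made up for illustration, and the check does not by itself explain the failure.

```shell
# Sketch: map the filesystem type of /sys/fs/cgroup to a cgroup version.
# cgroup_version is a hypothetical helper name, not part of any tool.
cgroup_version() {
  # $1 = output of: stat -fc %T /sys/fs/cgroup/
  case "$1" in
    cgroup2fs) echo v2 ;;       # unified hierarchy (cgroup v2)
    tmpfs)     echo v1 ;;       # legacy hierarchy (cgroup v1)
    *)         echo unknown ;;  # anything else is not classified here
  esac
}
```

On the node running the pod it could be called as `cgroup_version "$(stat -fc %T /sys/fs/cgroup/)"`.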
I followed the installation steps at https://github.com/automatiko-io/automatiko-approval-task#installation and used the following files cloned from GitHub:
-rw-r--r-- 1 root root 233 Feb 13 14:47 approvaltask-dashboard-cr.yaml
-rw-r--r-- 1 root root 337 Feb 13 14:47 approvaltasks-dashboard-crb.yaml
-rw-r--r-- 1 root root 209 Feb 13 14:47 approvaltasks-dashboard-ext.yaml
-rw-r--r-- 1 root root 1.5K Feb 13 14:47 approvaltasks.tekton.automatiko.io-v1.yml
-rw-r--r-- 1 root root 5.6K Feb 13 14:47 kubernetes-basic-pv.yml
-rw-r--r-- 1 root root 5.9K Feb 13 15:03 kubernetes-basic.yml
-rw-r--r-- 1 root root 5.4K Feb 13 14:47 kubernetes-email.yml
-rw-r--r-- 1 root root 5.7K Feb 13 14:47 kubernetes-oauth.yml
drwxr-xr-x 2 root root 236 Feb 13 14:47 test
[root@rl8-21 v1beta1]#
Can you add the following environment variable to the Deployment resource in the file that you deploy?
- name: QUARKUS_VERTX_EVENT_LOOPS_POOL_SIZE
  value: "4"
The value should be roughly the number of available cores, but for testing 4 will do. This should bypass the need to discover the available CPUs and so allow the app to start, though I have no idea why they cannot be detected in your environment. Let's see whether setting this manually lets the app start.
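As a sketch of the suggestion above, the variable can also be set with `kubectl set env` instead of editing the manifest by hand. The deployment name automatiko-approval-task is an assumption inferred from the pod names in this thread. The helper `set_event_loops` and the `KUBECTL` override are made up for illustration, so that the command can be dry-run with `KUBECTL=echo`.

```shell
# Sketch: set the Vert.x event loop pool size on a deployment.
# Assumption: the deployment is named "automatiko-approval-task".
KUBECTL="${KUBECTL:-kubectl}"

set_event_loops() {
  # $1 = deployment name, $2 = pool size (roughly the number of cores)
  $KUBECTL set env "deployment/$1" "QUARKUS_VERTX_EVENT_LOOPS_POOL_SIZE=$2"
}
```

For example, `set_event_loops automatiko-approval-task 4`. Changing the pod template this way triggers a new rollout of the deployment.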
I added the environment variable, but it made no difference.
My cluster has 3 nodes: 1 master and 2 workers. Each node has 4 vCPUs, and all of them are VMs on ESXi 8.0.
The CPU model is Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz.
Could you paste the deployment manifest that you deploy here?
And then the logs from the latest deployment as well. With this environment variable set, the discovery should no longer be required, so I am surprised that it did not change anything.
Here are the logs and the pod info:
[root@rl8-21 v1beta1]# kubectl logs -f automatiko-approval-task-7ff5ff8fc7-h7r9g
Feb 14, 2023 8:36:50 AM io.quarkus.runtime.configuration.ConfigRecorder
WARN: The profile 'prod' used to build the native image is different from the runtime profile 'withemail'. This may lead to unexpected results.
Feb 14, 2023 8:36:50 AM io.quarkus.runtime.ApplicationLifecycleManager run
ERROR: Failed to start application (with profile withemail)
java.lang.NullPointerException
at java.base@17.0.5/java.util.Objects.requireNonNull(Objects.java:208)
at java.base@17.0.5/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:263)
at java.base@17.0.5/java.nio.file.Path.of(Path.java:147)
at java.base@17.0.5/java.nio.file.Paths.get(Paths.java:69)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.CgroupUtil.lambda$readStringValue$0(CgroupUtil.java:57)
at java.base@17.0.5/java.security.AccessController.executePrivileged(AccessController.java:144)
at java.base@17.0.5/java.security.AccessController.doPrivileged(AccessController.java:569)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.CgroupUtil.readStringValue(CgroupUtil.java:59)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:66)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:125)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:269)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:215)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.cgroupv1.CgroupV1Subsystem.setSubSystemControllerPath(CgroupV1Subsystem.java:203)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:111)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.cgroupv1.CgroupV1Subsystem.<clinit>(CgroupV1Subsystem.java:47)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:78)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.CgroupMetrics.getInstance(CgroupMetrics.java:164)
at java.base@17.0.5/java.lang.reflect.Method.invoke(Method.java:568)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.Metrics.systemMetrics(Metrics.java:63)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.Container.metrics(Container.java:44)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.ContainerInfo.<init>(ContainerInfo.java:34)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.Containers.activeProcessorCount(Containers.java:125)
at java.base@17.0.5/java.lang.Runtime.availableProcessors(Runtime.java:247)
at org.wildfly.common.cpu.ProcessorInfo.availableProcessors(ProcessorInfo.java:29)
at io.quarkus.runtime.ExecutorRecorder.createExecutor(ExecutorRecorder.java:154)
at io.quarkus.runtime.ExecutorRecorder.setupRunTime(ExecutorRecorder.java:38)
at io.quarkus.deployment.steps.ThreadPoolSetup$createExecutor2117483448.deploy_0(Unknown Source)
at io.quarkus.deployment.steps.ThreadPoolSetup$createExecutor2117483448.deploy(Unknown Source)
at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
at io.quarkus.runtime.Application.start(Application.java:101)
at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:109)
at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
at io.quarkus.runtime.Quarkus.run(Quarkus.java:44)
at io.quarkus.runtime.Quarkus.run(Quarkus.java:124)
at io.quarkus.runner.GeneratedMain.main(Unknown Source)
[root@rl8-21 v1beta1]# kubectl describe pod automatiko-approval-task-7ff5ff8fc7-h7r9g
Name: automatiko-approval-task-7ff5ff8fc7-h7r9g
Namespace: default
Priority: 0
Service Account: automatiko-approval-task
Node: rl8-22/192.168.1.22
Start Time: Tue, 14 Feb 2023 16:09:32 +0800
Labels: app.kubernetes.io/name=automatiko-approval-task
app.kubernetes.io/version=0.6.0
pod-template-hash=7ff5ff8fc7
Annotations: app.quarkus.io/build-timestamp: 2022-06-10 - 10:47:05 +0000
app.quarkus.io/commit-id: ff3c1b22730c718173b0c7cf3a875b05f9aa350c
Status: Running
IP: 10.244.1.182
IPs:
IP: 10.244.1.182
Controlled By: ReplicaSet/automatiko-approval-task-7ff5ff8fc7
Containers:
automatiko-approval-task:
Container ID: containerd://031df4f7a85f9d2c8a949841af1a4638afb95fbf2dfec384f49c1e52ca3b217e
Image: automatiko/automatiko-approval-task:0.6.0
Image ID: docker.io/automatiko/automatiko-approval-task@sha256:9e56880aac0c494a7c5f248e953fcdc215eb83f66405f59c9eb4f7bb201459e7
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 14 Feb 2023 16:36:49 +0800
Finished: Tue, 14 Feb 2023 16:36:50 +0800
Ready: False
Restart Count: 10
Liveness: http-get http://:8080/q/health/live delay=0s timeout=10s period=30s #success=1 #failure=3
Readiness: http-get http://:8080/q/health/ready delay=0s timeout=10s period=30s #success=1 #failure=3
Environment:
KUBERNETES_NAMESPACE: default (v1:metadata.namespace)
QUARKUS_OPERATOR_SDK_NAMESPACES: default
QUARKUS_AUTOMATIKO_SERVICE_URL: http://localhost:9000
QUARKUS_MAILER_MOCK: true
QUARKUS_PROFILE: withemail
QUARKUS_MAILER_FROM: youruser@gmail.com
QUARKUS_MAILER_HOST: smtp.gmail.com
QUARKUS_MAILER_PORT: 587
QUARKUS_MAILER_USERNAME: youruser@gmail.com
QUARKUS_MAILER_PASSWORD: password
QUARKUS_AUTOMATIKO_ON_INSTANCE_END: keep
QUARKUS_VERTX_EVENT_LOOPS_POOL_SIZE: 4
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-68h76 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-68h76:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30m default-scheduler Successfully assigned default/automatiko-approval-task-7ff5ff8fc7-h7r9g to rl8-22
Normal Pulled 30m kubelet Successfully pulled image "automatiko/automatiko-approval-task:0.6.0" in 5.78872478s (5.788734085s including waiting)
Warning Unhealthy 30m kubelet Readiness probe failed: Get "http://10.244.1.182:8080/q/health/ready": dial tcp 10.244.1.182:8080: connect: connection refused
Normal Pulled 30m kubelet Successfully pulled image "automatiko/automatiko-approval-task:0.6.0" in 5.996368727s (5.996373615s including waiting)
Normal Pulled 30m kubelet Successfully pulled image "automatiko/automatiko-approval-task:0.6.0" in 5.921606021s (5.921610421s including waiting)
Normal Pulling 29m (x4 over 30m) kubelet Pulling image "automatiko/automatiko-approval-task:0.6.0"
Normal Created 29m (x4 over 30m) kubelet Created container automatiko-approval-task
Normal Started 29m (x4 over 30m) kubelet Started container automatiko-approval-task
Normal Pulled 29m kubelet Successfully pulled image "automatiko/automatiko-approval-task:0.6.0" in 5.99408871s (5.99409328s including waiting)
Warning BackOff 28s (x150 over 30m) kubelet Back-off restarting failed container
[root@rl8-21 v1beta1]#
It actually helped: startup moved forward, and it now tries to read the CPU count again to set up the thread executor. So the error is the same but occurs in a different place. Unfortunately, there is no switch to bypass that part.
What we can try is declaring resource requests and limits so that the pod gets the Guaranteed QoS class, and then see whether the container gets the required files mounted so that the CPU values can be resolved.
resources:
  limits:
    cpu: 1
    memory: 512Mi
  requests:
    cpu: 1
    memory: 512Mi
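The same requests and limits can be applied from the command line and the resulting QoS class checked afterwards. This is only a sketch: the deployment name is assumed to be automatiko-approval-task, and the helper names and the `KUBECTL` override are made up for illustration (the override allows a dry run with `KUBECTL=echo`).

```shell
# Sketch: apply equal requests and limits, then read the pod's QoS class.
# Assumption: the deployment is named "automatiko-approval-task".
KUBECTL="${KUBECTL:-kubectl}"

set_guaranteed_resources() {
  # $1 = deployment name; requests equal to limits for both cpu and memory
  # is what makes a pod eligible for the Guaranteed QoS class
  $KUBECTL set resources deployment "$1" \
    --limits=cpu=1,memory=512Mi --requests=cpu=1,memory=512Mi
}

pod_qos_class() {
  # $1 = pod name; prints BestEffort, Burstable or Guaranteed
  $KUBECTL get pod "$1" -o "jsonpath={.status.qosClass}"
}
```

For example, `set_guaranteed_resources automatiko-approval-task`, followed by `pod_qos_class` with the name of the newly created pod.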
Then:
[root@rl8-21 v1beta1]# kubectl logs -f automatiko-approval-task-5f88ddbcd9-xcm8w
Feb 14, 2023 10:02:52 AM io.quarkus.runtime.configuration.ConfigRecorder
WARN: The profile 'prod' used to build the native image is different from the runtime profile 'withemail'. This may lead to unexpected results.
Feb 14, 2023 10:02:52 AM io.quarkus.runtime.ApplicationLifecycleManager run
ERROR: Failed to start application (with profile withemail)
java.lang.NullPointerException
at java.base@17.0.5/java.util.Objects.requireNonNull(Objects.java:208)
at java.base@17.0.5/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:263)
at java.base@17.0.5/java.nio.file.Path.of(Path.java:147)
at java.base@17.0.5/java.nio.file.Paths.get(Paths.java:69)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.CgroupUtil.lambda$readStringValue$0(CgroupUtil.java:57)
at java.base@17.0.5/java.security.AccessController.executePrivileged(AccessController.java:144)
at java.base@17.0.5/java.security.AccessController.doPrivileged(AccessController.java:569)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.CgroupUtil.readStringValue(CgroupUtil.java:59)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:66)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:125)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:269)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:215)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.cgroupv1.CgroupV1Subsystem.setSubSystemControllerPath(CgroupV1Subsystem.java:203)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:111)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.cgroupv1.CgroupV1Subsystem.<clinit>(CgroupV1Subsystem.java:47)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:78)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.CgroupMetrics.getInstance(CgroupMetrics.java:164)
at java.base@17.0.5/java.lang.reflect.Method.invoke(Method.java:568)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.Metrics.systemMetrics(Metrics.java:63)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.containers.Container.metrics(Container.java:44)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.ContainerInfo.<init>(ContainerInfo.java:34)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.Containers.activeProcessorCount(Containers.java:125)
at java.base@17.0.5/java.lang.Runtime.availableProcessors(Runtime.java:247)
at org.wildfly.common.cpu.ProcessorInfo.availableProcessors(ProcessorInfo.java:29)
at io.quarkus.runtime.ExecutorRecorder.createExecutor(ExecutorRecorder.java:154)
at io.quarkus.runtime.ExecutorRecorder.setupRunTime(ExecutorRecorder.java:38)
at io.quarkus.deployment.steps.ThreadPoolSetup$createExecutor2117483448.deploy_0(Unknown Source)
at io.quarkus.deployment.steps.ThreadPoolSetup$createExecutor2117483448.deploy(Unknown Source)
at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
at io.quarkus.runtime.Application.start(Application.java:101)
at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:109)
at io.quarkus.runtime.Quarkus.run(Quarkus.java:71)
at io.quarkus.runtime.Quarkus.run(Quarkus.java:44)
at io.quarkus.runtime.Quarkus.run(Quarkus.java:124)
at io.quarkus.runner.GeneratedMain.main(Unknown Source)
[root@rl8-21 v1beta1]# kubectl describe pod automatiko-approval-task-5f88ddbcd9-xcm8w
Name: automatiko-approval-task-5f88ddbcd9-xcm8w
Namespace: default
Priority: 0
Service Account: automatiko-approval-task
Node: rl8-22/192.168.1.22
Start Time: Tue, 14 Feb 2023 17:57:36 +0800
Labels: app.kubernetes.io/name=automatiko-approval-task
app.kubernetes.io/version=0.6.0
pod-template-hash=5f88ddbcd9
Annotations: app.quarkus.io/build-timestamp: 2022-06-10 - 10:47:05 +0000
app.quarkus.io/commit-id: ff3c1b22730c718173b0c7cf3a875b05f9aa350c
Status: Running
IP: 10.244.1.216
IPs:
IP: 10.244.1.216
Controlled By: ReplicaSet/automatiko-approval-task-5f88ddbcd9
Containers:
automatiko-approval-task:
Container ID: containerd://6e4a58dc5c7759908365f2d88a2871d45b077f537b257574d2335fad1bfca19b
Image: automatiko/automatiko-approval-task:0.6.0
Image ID: docker.io/automatiko/automatiko-approval-task@sha256:9e56880aac0c494a7c5f248e953fcdc215eb83f66405f59c9eb4f7bb201459e7
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 14 Feb 2023 18:02:52 +0800
Finished: Tue, 14 Feb 2023 18:02:52 +0800
Ready: False
Restart Count: 1
Limits:
cpu: 1
memory: 512Mi
Requests:
cpu: 1
memory: 512Mi
Liveness: http-get http://:8080/q/health/live delay=0s timeout=10s period=30s #success=1 #failure=3
Readiness: http-get http://:8080/q/health/ready delay=0s timeout=10s period=30s #success=1 #failure=3
Environment:
KUBERNETES_NAMESPACE: default (v1:metadata.namespace)
QUARKUS_OPERATOR_SDK_NAMESPACES: default
QUARKUS_AUTOMATIKO_SERVICE_URL: http://localhost:9000
QUARKUS_MAILER_MOCK: true
QUARKUS_PROFILE: withemail
QUARKUS_MAILER_FROM: youruser@gmail.com
QUARKUS_MAILER_HOST: smtp.gmail.com
QUARKUS_MAILER_PORT: 587
QUARKUS_MAILER_USERNAME: youruser@gmail.com
QUARKUS_MAILER_PASSWORD: password
QUARKUS_AUTOMATIKO_ON_INSTANCE_END: keep
QUARKUS_VERTX_EVENT_LOOPS_POOL_SIZE: 4
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bfzdg (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-bfzdg:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m47s default-scheduler Successfully assigned default/automatiko-approval-task-5f88ddbcd9-xcm8w to rl8-22
Normal Pulled 6m42s kubelet Successfully pulled image "automatiko/automatiko-approval-task:0.6.0" in 5.747145675s (5.747150514s including waiting)
Normal Created 6m42s kubelet Created container automatiko-approval-task
Normal Started 6m42s kubelet Started container automatiko-approval-task
Warning Failed 5m50s (x3 over 6m41s) kubelet Failed to pull image "automatiko/automatiko-approval-task:0.6.0": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/automatiko/automatiko-approval-task:0.6.0": failed to resolve reference "docker.io/automatiko/automatiko-approval-task:0.6.0": failed to do request: Head "https://registry-1.docker.io/v2/automatiko/automatiko-approval-task/manifests/0.6.0": EOF
Warning Failed 5m50s (x3 over 6m41s) kubelet Error: ErrImagePull
Normal BackOff 5m22s (x2 over 6m18s) kubelet Back-off pulling image "automatiko/automatiko-approval-task:0.6.0"
Warning Failed 5m22s (x2 over 6m18s) kubelet Error: ImagePullBackOff
Warning BackOff 4m29s (x8 over 6m40s) kubelet Back-off restarting failed container
Normal Pulling 98s (x6 over 6m48s) kubelet Pulling image "automatiko/automatiko-approval-task:0.6.0"
[root@rl8-21 v1beta1]#
Could you please try this image instead: mswiderski/automatiko-approval-task:0.6.1
It comes with an additional build switch that is supposed to work around the problem. If it works, I will prepare a new official release with it.
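A sketch of switching an existing deployment to the test image without editing the manifest. Both the deployment name and the container name are assumed to be automatiko-approval-task, based on the `kubectl describe pod` output in this thread. The helper `use_image` and the `KUBECTL` override are made up for illustration.

```shell
# Sketch: point a deployment's container at a different image.
# Assumption: deployment and container are both named "automatiko-approval-task".
KUBECTL="${KUBECTL:-kubectl}"

use_image() {
  # $1 = deployment name, $2 = container name, $3 = image reference
  $KUBECTL set image "deployment/$1" "$2=$3"
}
```

For example, `use_image automatiko-approval-task automatiko-approval-task mswiderski/automatiko-approval-task:0.6.1`. The rollout can then be followed with `kubectl rollout status deployment/automatiko-approval-task`.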
It works. Thank you for your work.
Great to hear. I just made the official release, and you can now use automatiko/automatiko-approval-task:0.6.1
thanks
After installing, I got the following error.
My environment: