Aymen-Ben-S closed this issue 3 months ago.
@Aymen-Ben-S, thank you for creating this issue. We will troubleshoot it as soon as we can.
Triage this issue by using labels.
If information is missing, add a helpful comment and then add the I-issue-template label.
If the issue is a question, add the I-question label.
If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.
If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.
After troubleshooting the issue, please add the R-awaiting answer label.
Thank you!
Just to add some additional context about the problem above: it is the same one mentioned here: https://github.com/SeleniumHQ/docker-selenium/issues/2129#issuecomment-1970446915
At that time @VietND96 thought preStop was stuck and suggested getting some logs from there. Are these logs the same ones we have in the chrome node (obtained via kubectl logs), or is there another log persisted inside the container and not exposed? We are using 4.18 images as mentioned above; the chrome node has no errors in its log, but the pod does not terminate even after its processes receive the SIGTERM (they stay as Running and/or Terminating, if we restart hub/keda).
We do not see this behavior consistently, but once the first node hangs, all the others will hang. It seems the cluster/grid enters a state where Selenium processes cannot be killed and nodes live forever.
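(Editorial aside: one way to confirm whether leftover processes are truly unkillable is to look for tasks in uninterruptible sleep, state D, which ignore SIGTERM/SIGKILL until the blocking kernel call returns. A diagnostic sketch; the pod name comes from the describes further down, and the second command assumes shell access to the worker node itself.)

# Inside the cluster: confirm the container still has live processes
kubectl exec -n selenium selenium-grid-selenium-chrome-node-vcr64-kvjtq -- ps -eo pid,stat,wchan:20,cmd
# On the worker node: list processes stuck in D state (uninterruptible sleep)
ps -eo pid,stat,wchan:20,cmd | awk '$2 ~ /D/'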
Your K8s env is using the containerd runtime, right? I am asking since, when looking into the message failed to open /proc/187153/ns/ipc: No such file or directory, I found something that could be similar mentioned in https://github.com/containerd/containerd/issues/9160
@VietND96 our tests really run in containers in the same cluster, connected to the hub via remote web driver (Java client). We are not using containerd, though. I did not notice the failure message about /proc/187153/ns/ipc: No such file or directory; it is something @Aymen-Ben-S was investigating yesterday. So if it happens randomly, it seems this file/directory is found sometimes and sometimes it is not... I am not sure what /proc/187153/ns/ipc really means, or how it is used or required for our nodes to stop. I will investigate more along that line.
Thanks for the link to the other issue... overall the way we reach the problem is different, though the absence of that file/dir is quite the same. Will keep an eye on that as well.
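(Editorial aside: /proc/<pid>/ns/ipc is not a regular directory; it is a per-process symlink the kernel exposes for the IPC namespace the process belongs to, and the container runtime opens it when tearing the container down. The file vanishing simply means the process was already gone by then. The links can be inspected on any Linux shell:)

# Each entry is a symlink naming a namespace and its inode;
# it disappears together with the process.
ls -l /proc/self/ns
# example output line: ipc -> 'ipc:[4026531839]'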
Can you also try to set global.seleniumGrid.stdoutProbeLog=false?
The idea behind this config is to let the probe script write its stdout to PID 1 in the container, which is why you see /opt/selenium/nodeProbe.sh preStop >> /proc/1/fd/1.
It seems /proc/1/fd/1 is a trick that won't work on all platforms, e.g. in this case OpenShift.
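(Editorial aside: the trick is plain file-descriptor redirection. /proc/1/fd/1 is PID 1's stdout, which is exactly what kubectl logs streams, so a hook or probe can surface its output by appending to it. A minimal sketch; under restricted security contexts such as OpenShift's restricted-v2 SCC the write can fail:)

# Append a line to PID 1's stdout so it shows up in `kubectl logs`;
# fails if /proc/1/fd/1 is not writable by the container user.
echo "[preStop] draining node..." >> /proc/1/fd/1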
Just to add two more examples: I have in my live env one chrome node in Running state and one chrome node in Terminating state; both are stuck (usually the "running" stuck pods turn to "terminating" when we restart the hub/keda pods). I do not see events similar to the one posted above by @Aymen-Ben-S, but I see the preStop failure anyway. Pod describes follow:
Stuck Running Node
oc describe pod selenium-grid-selenium-chrome-node-vcr64-kvjtq
Name: selenium-grid-selenium-chrome-node-vcr64-kvjtq
Namespace: selenium
Priority: 0
Node: 10.177.41.210/10.177.41.210
Start Time: Wed, 13 Mar 2024 17:25:14 -0300
Labels: app=selenium-grid-selenium-chrome-node
app.kubernetes.io/component=selenium-grid-4.18.1-20240224
app.kubernetes.io/instance=selenium-grid
app.kubernetes.io/managed-by=helm
app.kubernetes.io/name=selenium-grid-selenium-chrome-node
app.kubernetes.io/version=4.18.1-20240224
controller-uid=6c5573d2-3e1e-4887-9dc4-5b71e2a6bd19
helm.sh/chart=selenium-grid-0.28.4
job-name=selenium-grid-selenium-chrome-node-vcr64
scaledjob.keda.sh/name=selenium-grid-selenium-chrome-node
Annotations: checksum/event-bus-configmap: 2698802d0bbf358d1634b47dff1ef36c5fc2501a27a9d2eef02c7874eb9496f8
checksum/logging-configmap: 7f721b250f90c8a5877dc9217b97f0a14392b420edf0e5af105a60944d2b9dc3
checksum/node-configmap: 3b6c0fffa6e6a10d57e5455ce21e1e7ee55e0638f15ff521b8c96fe8c10d8e91
checksum/server-configmap: ac6520a86bfffa04b4946bbce02ac8f1be341d800f4ab09f4e7cf274f74d3770
cni.projectcalico.org/containerID: a3c4689d38376d4277486296c7c87aa32f59794f9f401bbaffe0cf60b1cf3ea1
cni.projectcalico.org/podIP: 172.30.201.59/32
cni.projectcalico.org/podIPs: 172.30.201.59/32
k8s.v1.cni.cncf.io/network-status:
[{
"name": "k8s-pod-network",
"ips": [
"172.30.201.59"
],
"default": true,
"dns": {}
}]
openshift.io/scc: restricted-v2
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Running
IP: 172.30.201.59
IPs:
IP: 172.30.201.59
Controlled By: Job/selenium-grid-selenium-chrome-node-vcr64
Containers:
selenium-grid-selenium-chrome-node:
Container ID: cri-o://e06640a839fa0ec541ca954e5639f23d68802128187557a5ee83e8634aa796b9
Image: docker-na-private.artifactory.swg-devops.com/wiotp-docker-local/selenium/node-chrome:122.0-20240224
Image ID: docker-na-private.artifactory.swg-devops.com/wiotp-docker-local/selenium/node-chrome@sha256:3b50643ff9885215c9142fefecace7b17efdde64a8c863ef702bd4a6c3e6a378
Port: 5555/TCP
Host Port: 0/TCP
State: Running
Started: Wed, 13 Mar 2024 17:25:16 -0300
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 1Gi
Startup: exec [bash -c /opt/selenium/nodeProbe.sh Startup >> /proc/1/fd/1] delay=0s timeout=60s period=5s #success=1 #failure=12
Environment Variables from:
selenium-grid-selenium-event-bus ConfigMap Optional: false
selenium-grid-selenium-node-config ConfigMap Optional: false
selenium-grid-selenium-logging-config ConfigMap Optional: false
selenium-grid-selenium-server-config ConfigMap Optional: false
selenium-grid-selenium-secrets Secret Optional: false
Environment:
SE_OTEL_SERVICE_NAME: selenium-grid-selenium-chrome-node
SE_NODE_PORT: 5555
SE_NODE_REGISTER_PERIOD: 60
SE_NODE_REGISTER_CYCLE: 5
Mounts:
/dev/shm from dshm (rw)
/opt/selenium/nodePreStop.sh from selenium-grid-selenium-node-config (rw,path="nodePreStop.sh")
/opt/selenium/nodeProbe.sh from selenium-grid-selenium-node-config (rw,path="nodeProbe.sh")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-q48g8 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
selenium-grid-selenium-node-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: selenium-grid-selenium-node-config
Optional: false
dshm:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: 1Gi
kube-api-access-q48g8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
What is interesting with this running pod is that it has no Event associated with it; but if we look at its logs, the Selenium processes were supposed to have terminated, as they received SIGTERM:
2024-03-13T20:25:25UTC [Probe.Startup] - Node responds the ID: e3578a60-9ae9-4c10-8266-06d4a2dd4859 with status: DRAINING
2024-03-13T20:25:25UTC [Probe.Startup] - Grid responds a matched Node ID: e3578a60-9ae9-4c10-8266-06d4a2dd4859
2024-03-13T20:25:25UTC [Probe.Startup] - Node ID: e3578a60-9ae9-4c10-8266-06d4a2dd4859 is found in the Grid. Node is ready.
20:29:45.411 INFO [SessionSlot.stop] - Stopping session 600b097a806e938d7f3d27b44b3f96ca
20:29:45.413 INFO [LocalNode.stopTimedOutSession] - Node draining complete!
20:29:46.416 INFO [NodeServer.lambda$createHandlers$3] - Shutting down
2024-03-13 20:29:46,780 INFO exited: selenium-node (exit status 0; expected)
2024-03-13 20:29:46,781 WARN received SIGINT indicating exit request
2024-03-13 20:29:46,781 INFO waiting for xvfb, vnc, novnc to die
2024-03-13 20:29:48,783 INFO stopped: novnc (terminated by SIGTERM)
2024-03-13 20:29:49,814 INFO stopped: vnc (terminated by SIGTERM)
2024-03-13 20:29:49,814 INFO waiting for xvfb to die
2024-03-13 20:29:50,816 INFO stopped: xvfb (terminated by SIGTERM)
Stuck Terminating Node
(pytestenv) alequint@Alexandres-MBP fvt-ibm-mas-manage % oc describe pod selenium-grid-selenium-chrome-node-wqv2b-pbk8g
Name: selenium-grid-selenium-chrome-node-wqv2b-pbk8g
Namespace: selenium
Priority: 0
Node: 10.177.41.210/10.177.41.210
Start Time: Wed, 13 Mar 2024 16:10:44 -0300
Labels: app=selenium-grid-selenium-chrome-node
app.kubernetes.io/component=selenium-grid-4.18.1-20240224
app.kubernetes.io/instance=selenium-grid
app.kubernetes.io/managed-by=helm
app.kubernetes.io/name=selenium-grid-selenium-chrome-node
app.kubernetes.io/version=4.18.1-20240224
controller-uid=bf32cbc7-b431-453c-bfac-5b29d45fc739
helm.sh/chart=selenium-grid-0.28.4
job-name=selenium-grid-selenium-chrome-node-wqv2b
scaledjob.keda.sh/name=selenium-grid-selenium-chrome-node
Annotations: checksum/event-bus-configmap: 2698802d0bbf358d1634b47dff1ef36c5fc2501a27a9d2eef02c7874eb9496f8
checksum/logging-configmap: 7f721b250f90c8a5877dc9217b97f0a14392b420edf0e5af105a60944d2b9dc3
checksum/node-configmap: 3b6c0fffa6e6a10d57e5455ce21e1e7ee55e0638f15ff521b8c96fe8c10d8e91
checksum/server-configmap: ac6520a86bfffa04b4946bbce02ac8f1be341d800f4ab09f4e7cf274f74d3770
cni.projectcalico.org/containerID: a4f6e5164e05ac11a7074e67b29d85a0c9f6a75e78cd1231c3673a965d6eb5f2
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
k8s.v1.cni.cncf.io/network-status:
[{
"name": "k8s-pod-network",
"ips": [
"172.30.200.215"
],
"default": true,
"dns": {}
}]
openshift.io/scc: restricted-v2
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Terminating (lasts 13h)
Termination Grace Period: 30s
IP: 172.30.200.215
IPs:
IP: 172.30.200.215
Controlled By: Job/selenium-grid-selenium-chrome-node-wqv2b
Containers:
selenium-grid-selenium-chrome-node:
Container ID: cri-o://e4c63c9a396b286577e6168d241485097be77263c6b782a59d6157ec52ebeb32
Image: docker-na-private.artifactory.swg-devops.com/wiotp-docker-local/selenium/node-chrome:122.0-20240224
Image ID: docker-na-private.artifactory.swg-devops.com/wiotp-docker-local/selenium/node-chrome@sha256:3b50643ff9885215c9142fefecace7b17efdde64a8c863ef702bd4a6c3e6a378
Port: 5555/TCP
Host Port: 0/TCP
State: Running
Started: Wed, 13 Mar 2024 16:11:06 -0300
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 1Gi
Startup: exec [bash -c /opt/selenium/nodeProbe.sh Startup >> /proc/1/fd/1] delay=0s timeout=60s period=5s #success=1 #failure=12
Environment Variables from:
selenium-grid-selenium-event-bus ConfigMap Optional: false
selenium-grid-selenium-node-config ConfigMap Optional: false
selenium-grid-selenium-logging-config ConfigMap Optional: false
selenium-grid-selenium-server-config ConfigMap Optional: false
selenium-grid-selenium-secrets Secret Optional: false
Environment:
SE_OTEL_SERVICE_NAME: selenium-grid-selenium-chrome-node
SE_NODE_PORT: 5555
SE_NODE_REGISTER_PERIOD: 60
SE_NODE_REGISTER_CYCLE: 5
Mounts:
/dev/shm from dshm (rw)
/opt/selenium/nodePreStop.sh from selenium-grid-selenium-node-config (rw,path="nodePreStop.sh")
/opt/selenium/nodeProbe.sh from selenium-grid-selenium-node-config (rw,path="nodeProbe.sh")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7dv5t (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
selenium-grid-selenium-node-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: selenium-grid-selenium-node-config
Optional: false
dshm:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: 1Gi
kube-api-access-7dv5t:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 20m (x27 over 13h) kubelet Stopping container selenium-grid-selenium-chrome-node
Warning FailedPreStopHook 20m (x27 over 13h) kubelet PreStopHook failed
Warning FailedKillPod 20m (x26 over 13h) kubelet error killing pod: [failed to "KillContainer" for "selenium-grid-selenium-chrome-node" with KillContainerError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded", failed to "KillPodSandbox" for "c4bffdc1-fc0b-46b2-9819-be49a0a47725" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
(pytestenv) alequint@Alexandres-MBP fvt-ibm-mas-manage % oc describe pod selenium-grid-selenium-chrome-node-wqv2b-pbk8g | grep stop
(pytestenv) alequint@Alexandres-MBP fvt-ibm-mas-manage % oc describe pod selenium-grid-selenium-chrome-node-wqv2b-pbk8g | grep Stop
/opt/selenium/nodePreStop.sh from selenium-grid-selenium-node-config (rw,path="nodePreStop.sh")
Normal Killing 22m (x27 over 13h) kubelet Stopping container selenium-grid-selenium-chrome-node
Warning FailedPreStopHook 22m (x27 over 13h) kubelet PreStopHook failed
@VietND96 just to give some feedback: we have not noticed any node hanging so far after setting:
global.seleniumGrid.stdoutProbeLog=false
We will keep an eye on our next test runs anyway, but so far we have had very good results compared with previous runs.
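(Editorial aside: applying that value to an existing release looks roughly like this; the release name, namespace, and repo alias here follow the docker-selenium docs and should be adjusted to your own install:)

helm upgrade selenium-grid docker-selenium/selenium-grid \
  --namespace selenium --reuse-values \
  --set global.seleniumGrid.stdoutProbeLog=false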
@VietND96 @Aymen-Ben-S noticed hanging in one of our environments. Some details follow.
(Screenshot: list of pods in the selenium namespace, sorted by created date.)
Only the top chrome node is really running; the one below, even though reported as Running, has hung. The describe from this pod follows (I noticed a startup probe warning, but that is all):
oc describe pod selenium-grid-selenium-chrome-node-qx8s5-4g2bg -n selenium
Name: selenium-grid-selenium-chrome-node-qx8s5-4g2bg
Namespace: selenium
Priority: 0
Node: 10.113.225.224/10.113.225.224
Start Time: Fri, 29 Mar 2024 14:28:54 -0300
Labels: app=selenium-grid-selenium-chrome-node
app.kubernetes.io/component=selenium-grid-4.18.1-20240224
app.kubernetes.io/instance=selenium-grid
app.kubernetes.io/managed-by=helm
app.kubernetes.io/name=selenium-grid-selenium-chrome-node
app.kubernetes.io/version=4.18.1-20240224
controller-uid=4b3c4ecc-c78a-446b-8128-6e67bb9957cc
helm.sh/chart=selenium-grid-0.28.4
job-name=selenium-grid-selenium-chrome-node-qx8s5
scaledjob.keda.sh/name=selenium-grid-selenium-chrome-node
Annotations: checksum/event-bus-configmap: 2698802d0bbf358d1634b47dff1ef36c5fc2501a27a9d2eef02c7874eb9496f8
checksum/logging-configmap: 7f721b250f90c8a5877dc9217b97f0a14392b420edf0e5af105a60944d2b9dc3
checksum/node-configmap: 3b6c0fffa6e6a10d57e5455ce21e1e7ee55e0638f15ff521b8c96fe8c10d8e91
checksum/server-configmap: ac6520a86bfffa04b4946bbce02ac8f1be341d800f4ab09f4e7cf274f74d3770
cni.projectcalico.org/containerID: 4004c884b70042cf36f09bf1be0b7b7b4ee1014df73f9d075c9bb4f4a6ad4d1b
cni.projectcalico.org/podIP: 172.30.103.212/32
cni.projectcalico.org/podIPs: 172.30.103.212/32
k8s.v1.cni.cncf.io/network-status:
[{
"name": "k8s-pod-network",
"ips": [
"172.30.103.212"
],
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "k8s-pod-network",
"ips": [
"172.30.103.212"
],
"default": true,
"dns": {}
}]
openshift.io/scc: restricted-v2
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Running
IP: 172.30.103.212
IPs:
IP: 172.30.103.212
Controlled By: Job/selenium-grid-selenium-chrome-node-qx8s5
Containers:
selenium-grid-selenium-chrome-node:
Container ID: cri-o://9b1c71402416095c52dadada09becb2fef04d27172052fee60b07421f810a197
Image: docker-na-private.artifactory.swg-devops.com/wiotp-docker-local/selenium/node-chrome:122.0-20240224
Image ID: docker-na-private.artifactory.swg-devops.com/wiotp-docker-local/selenium/node-chrome@sha256:3b50643ff9885215c9142fefecace7b17efdde64a8c863ef702bd4a6c3e6a378
Port: 5555/TCP
Host Port: 0/TCP
State: Running
Started: Fri, 29 Mar 2024 14:28:56 -0300
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 1Gi
Startup: exec [bash -c /opt/selenium/nodeProbe.sh Startup ] delay=0s timeout=60s period=5s #success=1 #failure=12
Environment Variables from:
selenium-grid-selenium-event-bus ConfigMap Optional: false
selenium-grid-selenium-node-config ConfigMap Optional: false
selenium-grid-selenium-logging-config ConfigMap Optional: false
selenium-grid-selenium-server-config ConfigMap Optional: false
selenium-grid-selenium-secrets Secret Optional: false
Environment:
SE_OTEL_SERVICE_NAME: selenium-grid-selenium-chrome-node
SE_NODE_PORT: 5555
SE_NODE_REGISTER_PERIOD: 60
SE_NODE_REGISTER_CYCLE: 5
Mounts:
/dev/shm from dshm (rw)
/opt/selenium/nodePreStop.sh from selenium-grid-selenium-node-config (rw,path="nodePreStop.sh")
/opt/selenium/nodeProbe.sh from selenium-grid-selenium-node-config (rw,path="nodeProbe.sh")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h99rz (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
selenium-grid-selenium-node-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: selenium-grid-selenium-node-config
Optional: false
dshm:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: 1Gi
kube-api-access-h99rz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 24m default-scheduler Successfully assigned selenium/selenium-grid-selenium-chrome-node-qx8s5-4g2bg to 10.113.225.224
Normal AddedInterface 24m multus Add eth0 [172.30.103.212/32] from k8s-pod-network
Normal Pulled 24m kubelet Container image "docker-na-private.artifactory.swg-devops.com/wiotp-docker-local/selenium/node-chrome:122.0-20240224" already present on machine
Normal Created 24m kubelet Created container selenium-grid-selenium-chrome-node
Normal Started 24m kubelet Started container selenium-grid-selenium-chrome-node
Warning Unhealthy 24m kubelet Startup probe failed: 2024-03-29T17:29:00UTC [Probe.Startup] - Wait for the Node to report its status
Logs from that same pod follow:
oc logs -f selenium-grid-selenium-chrome-node-qx8s5-4g2bg -n selenium
2024-03-29 17:28:56,709 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing
2024-03-29 17:28:56,712 INFO RPC interface 'supervisor' initialized
2024-03-29 17:28:56,712 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-03-29 17:28:56,713 INFO supervisord started with pid 8
2024-03-29 17:28:57,716 INFO spawned: 'xvfb' with pid 9
2024-03-29 17:28:57,719 INFO spawned: 'vnc' with pid 10
2024-03-29 17:28:57,721 INFO spawned: 'novnc' with pid 11
2024-03-29 17:28:57,725 INFO spawned: 'selenium-node' with pid 12
2024-03-29 17:28:57,742 INFO success: selenium-node entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
E: [pulseaudio] main.c: Daemon startup failed.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Appending Selenium options: --session-timeout 300
Appending Selenium options: --register-period 60
Appending Selenium options: --register-cycle 5
Appending Selenium options: --heartbeat-period 30
Appending Selenium options: --log-level INFO
Generating Selenium Config
Setting up SE_NODE_HOST...
Tracing is disabled
Selenium Grid Node configuration:
[events]
publish = "tcp://selenium-grid-selenium-hub.selenium:4442"
subscribe = "tcp://selenium-grid-selenium-hub.selenium:4443"
[server]
port = "5555"
[node]
grid-url = "http://admin:admin@selenium-grid-selenium-hub.selenium:4444"
session-timeout = "300"
override-max-sessions = false
detect-drivers = false
drain-after-session-count = 1
max-sessions = 1
[[node.driver-configuration]]
display-name = "chrome"
stereotype = '{"browserName": "chrome", "browserVersion": "122.0", "platformName": "Linux", "goog:chromeOptions": {"binary": "/usr/bin/google-chrome"}}'
max-sessions = 1
Starting Selenium Grid Node...
2024-03-29 17:28:58,819 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-29 17:28:58,819 INFO success: vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-29 17:28:58,819 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
17:28:59.716 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
17:28:59.799 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
17:29:00.494 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://selenium-grid-selenium-hub.selenium:4442 and tcp://selenium-grid-selenium-hub.selenium:4443
17:29:00.714 INFO [UnboundZmqEventBus.<init>] - Sockets created
17:29:01.728 INFO [UnboundZmqEventBus.<init>] - Event bus ready
17:29:02.106 INFO [NodeServer.createHandlers] - Reporting self as: http://172.30.103.212:5555
17:29:02.204 INFO [NodeOptions.getSessionFactories] - Detected 1 available processors
17:29:02.393 INFO [NodeOptions.report] - Adding chrome for {"browserName": "chrome","browserVersion": "122.0","goog:chromeOptions": {"binary": "\u002fusr\u002fbin\u002fgoogle-chrome"},"platformName": "linux","se:noVncPort": 7900,"se:vncEnabled": true} 1 times
17:29:02.500 INFO [Node.<init>] - Binding additional locator mechanisms: relative
17:29:03.042 INFO [NodeServer$1.start] - Starting registration process for Node http://172.30.103.212:5555
17:29:03.043 INFO [NodeServer.execute] - Started Selenium node 4.18.1 (revision b1d3319b48): http://172.30.103.212:5555
17:29:03.106 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
17:29:03.600 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
17:29:06.692 INFO [LocalNode.checkSessionCount] - Draining Node, configured sessions value (1) has been reached.
17:29:06.703 INFO [LocalNode.newSession] - Session created by the Node. Id: 3ff9d1895a0d96e49cc1655a0908a244, Caps: Capabilities {acceptInsecureCerts: true, browserName: chrome, browserVersion: 122.0.6261.69, chrome: {chromedriverVersion: 122.0.6261.69 (81bc525b6a36..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:39059}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: ws://admin:admin@selenium-g..., se:cdpVersion: 122.0.6261.69, se:vnc: ws://admin:admin@selenium-g..., se:vncEnabled: true, se:vncLocalAddress: ws://172.30.103.212:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
17:34:32.624 INFO [LocalNode.stopTimedOutSession] - Session id 3ff9d1895a0d96e49cc1655a0908a244 timed out, stopping...
17:34:32.755 INFO [SessionSlot.stop] - Stopping session 3ff9d1895a0d96e49cc1655a0908a244
17:34:32.758 INFO [LocalNode.stopTimedOutSession] - Node draining complete!
17:34:33.764 INFO [NodeServer.lambda$createHandlers$3] - Shutting down
2024-03-29 17:34:34,110 INFO exited: selenium-node (exit status 0; expected)
2024-03-29 17:34:34,111 WARN received SIGINT indicating exit request
2024-03-29 17:34:34,112 INFO waiting for xvfb, vnc, novnc to die
2024-03-29 17:34:36,115 INFO stopped: novnc (terminated by SIGTERM)
2024-03-29 17:34:37,118 INFO stopped: vnc (terminated by SIGTERM)
2024-03-29 17:34:37,119 INFO waiting for xvfb to die
2024-03-29 17:34:37,119 INFO stopped: xvfb (terminated by SIGTERM)
I have also noticed an error in keda-operator-metrics; take a look:
2024/03/29 09:37:07 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
I0329 09:37:07.382355 1 welcome.go:34] keda_metrics_adapter "msg"="Starting metrics server"
I0329 09:37:07.382406 1 welcome.go:35] keda_metrics_adapter "msg"="KEDA Version: 2.13.0"
I0329 09:37:07.382428 1 welcome.go:36] keda_metrics_adapter "msg"="Git Commit: 367fcd32f1e72be504ada53165d07c33d75fc0f7"
I0329 09:37:07.382448 1 welcome.go:37] keda_metrics_adapter "msg"="Go Version: go1.21.6"
I0329 09:37:07.382468 1 welcome.go:38] keda_metrics_adapter "msg"="Go OS/Arch: linux/amd64"
I0329 09:37:07.382520 1 welcome.go:39] keda_metrics_adapter "msg"="Running on Kubernetes 1.25" "version"={"major":"1","minor":"25","gitVersion":"v1.25.16+6df2177","gitCommit":"f5b7c3e8faedd51935d77828a5fc72c7540236f4","gitTreeState":"clean","buildDate":"2024-01-18T12:53:29Z","goVersion":"go1.19.13 X:strictfipsruntime","compiler":"gc","platform":"linux/amd64"}
I0329 09:37:07.382557 1 welcome.go:43] keda_metrics_adapter "msg"="WARNING: KEDA 2.13.0 hasn't been tested on Kubernetes v1.25.16+6df2177"
I0329 09:37:07.382682 1 welcome.go:44] keda_metrics_adapter "msg"="You can check recommended versions on https://keda.sh"
I0329 09:37:07.384318 1 main.go:125] keda_metrics_adapter "msg"="Connecting Metrics Service gRPC client to the server" "address"="keda-operator.selenium.svc.cluster.local:9666"
I0329 09:37:07.401590 1 provider.go:58] keda_metrics_adapter/provider "msg"="starting"
I0329 09:37:07.401682 1 main.go:259] keda_metrics_adapter "msg"="starting adapter..."
I0329 09:37:07.412258 1 client.go:88] keda_metrics_adapter/provider "msg"="Waiting for establishing a gRPC connection to KEDA Metrics Server"
I0329 09:37:07.412288 1 main.go:176] keda_metrics_adapter "msg"="starting /metrics server endpoint"
I0329 09:37:07.662522 1 secure_serving.go:213] Serving securely on [::]:6443
I0329 09:37:07.662721 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0329 09:37:07.662771 1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
I0329 09:37:07.662838 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/certs/tls.crt::/certs/tls.key"
I0329 09:37:07.663031 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0329 09:37:07.663302 1 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/certs/ca.crt"
I0329 09:37:07.663460 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0329 09:37:07.663491 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0329 09:37:07.763034 1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
I0329 09:37:07.764422 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
W0329 09:37:08.583147 1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "keda-operator.selenium.svc.cluster.local:9666", ServerName: "keda-operator.selenium.svc.cluster.local:9666", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 172.21.71.118:9666: connect: connection refused"
W0329 09:37:10.631072 1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "keda-operator.selenium.svc.cluster.local:9666", ServerName: "keda-operator.selenium.svc.cluster.local:9666", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 172.21.71.118:9666: connect: connection refused"
W0329 09:37:13.447047 1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "keda-operator.selenium.svc.cluster.local:9666", ServerName: "keda-operator.selenium.svc.cluster.local:9666", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 172.21.71.118:9666: connect: connection refused"
W0329 09:37:16.775172 1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "keda-operator.selenium.svc.cluster.local:9666", ServerName: "keda-operator.selenium.svc.cluster.local:9666", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 172.21.71.118:9666: connect: connection refused"
W0329 09:37:22.407104 1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "keda-operator.selenium.svc.cluster.local:9666", ServerName: "keda-operator.selenium.svc.cluster.local:9666", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 172.21.71.118:9666: connect: connection refused"
I0329 09:37:29.938514 1 provider.go:66] keda_metrics_adapter/provider "msg"="Connection to KEDA Metrics Service gRPC server has been successfully established" "server"="keda-operator.selenium.svc.cluster.local:9666"
E0329 10:54:03.649759 1 provider.go:84] keda_metrics_adapter/provider "msg"="error converting Selector to Labels Map" "error"="invalid selector: [mas.ibm.com/instanceId]"
E0329 10:54:03.650386 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"invalid selector: [mas.ibm.com/instanceId]"}: invalid selector: [mas.ibm.com/instanceId]
http2: server: error reading preface from client 172.30.141.15:40478: read tcp 172.30.109.75:6443->172.30.141.15:40478: read: connection reset by peer
And here, just as a reference, is the log from keda-operator (beginning):
2024/03/29 17:28:34 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
2024-03-29T17:28:35Z INFO setup Starting manager
2024-03-29T17:28:35Z INFO setup KEDA Version: 2.13.0
2024-03-29T17:28:35Z INFO setup Git Commit: 367fcd32f1e72be504ada53165d07c33d75fc0f7
2024-03-29T17:28:35Z INFO setup Go Version: go1.21.6
2024-03-29T17:28:35Z INFO setup Go OS/Arch: linux/amd64
2024-03-29T17:28:35Z INFO setup Running on Kubernetes 1.25 {"version": "v1.25.16+6df2177"}
2024-03-29T17:28:35Z INFO setup WARNING: KEDA 2.13.0 hasn't been tested on Kubernetes v1.25.16+6df2177
2024-03-29T17:28:35Z INFO setup You can check recommended versions on https://keda.sh
2024-03-29T17:28:35Z INFO starting server {"kind": "health probe", "addr": "[::]:8081"}
I0329 17:28:35.306819 1 leaderelection.go:250] attempting to acquire leader lease selenium/operator.keda.sh...
I0329 17:28:52.723265 1 leaderelection.go:260] successfully acquired lease selenium/operator.keda.sh
2024-03-29T17:28:52Z INFO Starting EventSource {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
2024-03-29T17:28:52Z INFO Starting EventSource {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2.HorizontalPodAutoscaler"}
2024-03-29T17:28:52Z INFO Starting Controller {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
2024-03-29T17:28:52Z INFO Starting EventSource {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
2024-03-29T17:28:52Z INFO Starting Controller {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
2024-03-29T17:28:52Z INFO Starting EventSource {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
2024-03-29T17:28:52Z INFO Starting Controller {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
2024-03-29T17:28:52Z INFO Starting EventSource {"controller": "cloudeventsource", "controllerGroup": "eventing.keda.sh", "controllerKind": "CloudEventSource", "source": "kind source: *v1alpha1.CloudEventSource"}
2024-03-29T17:28:52Z INFO Starting Controller {"controller": "cloudeventsource", "controllerGroup": "eventing.keda.sh", "controllerKind": "CloudEventSource"}
2024-03-29T17:28:52Z INFO Starting EventSource {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
2024-03-29T17:28:52Z INFO Starting Controller {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
2024-03-29T17:28:52Z INFO cert-rotation starting cert rotator controller
2024-03-29T17:28:52Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *v1.Secret"}
2024-03-29T17:28:52Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2024-03-29T17:28:52Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2024-03-29T17:28:52Z INFO Starting Controller {"controller": "cert-rotator"}
2024-03-29T17:28:52Z INFO cert-rotation no cert refresh needed
2024-03-29T17:28:52Z INFO cert-rotation certs are ready in /certs
2024-03-29T17:28:52Z INFO Starting workers {"controller": "cert-rotator", "worker count": 1}
2024-03-29T17:28:52Z INFO cert-rotation no cert refresh needed
2024-03-29T17:28:52Z ERROR cert-rotation Webhook not found. Unable to update certificate. {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "error": "ValidatingWebhookConfiguration.admissionregistration.k8s.io \"keda-admission\" not found"}
github.com/open-policy-agent/cert-controller/pkg/rotator.(*ReconcileWH).ensureCerts
/workspace/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:816
github.com/open-policy-agent/cert-controller/pkg/rotator.(*ReconcileWH).Reconcile
/workspace/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:785
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
2024-03-29T17:28:52Z INFO cert-rotation Ensuring CA cert {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2024-03-29T17:28:53Z INFO Starting workers {"controller": "cloudeventsource", "controllerGroup": "eventing.keda.sh", "controllerKind": "CloudEventSource", "worker count": 1}
2024-03-29T17:28:53Z INFO Starting workers {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
2024-03-29T17:28:53Z INFO Starting workers {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
2024-03-29T17:28:53Z INFO Reconciling ScaledJob {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "ScaledJob": {"name":"selenium-grid-selenium-chrome-node","namespace":"selenium"}, "namespace": "selenium", "name": "selenium-grid-selenium-chrome-node", "reconcileID": "7d69bdf2-a351-4a66-82cb-6822d0b51e39"}
2024-03-29T17:28:53Z INFO Starting workers {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
2024-03-29T17:28:53Z INFO Starting workers {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
2024-03-29T17:28:53Z INFO RolloutStrategy: immediate, Deleting jobs owned by the previous version of the scaledJob {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "ScaledJob": {"name":"selenium-grid-selenium-chrome-node","namespace":"selenium"}, "namespace": "selenium", "name": "selenium-grid-selenium-chrome-node", "reconcileID": "7d69bdf2-a351-4a66-82cb-6822d0b51e39", "numJobsToDelete": 1}
2024-03-29T17:28:54Z INFO Initializing Scaling logic according to ScaledJob Specification {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "ScaledJob": {"name":"selenium-grid-selenium-chrome-node","namespace":"selenium"}, "namespace": "selenium", "name": "selenium-grid-selenium-chrome-node", "reconcileID": "7d69bdf2-a351-4a66-82cb-6822d0b51e39"}
2024-03-29T17:28:54Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of running Jobs": 0}
2024-03-29T17:28:54Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of pending Jobs ": 0}
2024-03-29T17:28:54Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Effective number of max jobs": 1}
2024-03-29T17:28:54Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of jobs": 1}
2024-03-29T17:28:54Z INFO scaleexecutor Created jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of jobs": 1}
2024-03-29T17:28:54Z INFO cert-rotation CA certs are injected to webhooks
2024-03-29T17:28:54Z INFO grpc_server Starting Metrics Service gRPC Server {"address": ":9666"}
2024-03-29T17:29:04Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of running Jobs": 1}
2024-03-29T17:29:04Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of pending Jobs ": 0}
2024-03-29T17:29:14Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of running Jobs": 1}
2024-03-29T17:29:14Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of pending Jobs ": 0}
...and here are some of the last lines:
2024-03-29T17:49:24Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of jobs": 0}
2024-03-29T17:49:24Z INFO scaleexecutor Created jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of jobs": 0}
2024-03-29T17:49:34Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of running Jobs": 2}
2024-03-29T17:49:34Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of pending Jobs ": 0}
2024-03-29T17:49:34Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Effective number of max jobs": 0}
2024-03-29T17:49:34Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of jobs": 0}
2024-03-29T17:49:34Z INFO scaleexecutor Created jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of jobs": 0}
@alequint, thanks for your continuous feedback. In chart 0.29.1, I just updated the preStop script with the expectation that it will prevent it from getting stuck somewhere.
Besides that, I also suspect K8s behavior, since when enabling the scaling type as job, the pod spec has terminationGracePeriodSeconds: 30 by default, which means the Pod should be killed after the 30s expire.
Thanks @VietND96... we plan to upgrade our images to the most recent release and try again.
Considering we are using the job scaling type, do you think it is worth setting terminationGracePeriodSeconds to something higher than 30?
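(Editorial aside: raising it per node type would look something like the sketch below; whether the chart version in use exposes a chromeNode.terminationGracePeriodSeconds value is an assumption to verify against its values.yaml:)

# Example only: the grace period must cover the whole preStop drain
# (session timeout plus video upload) before the kubelet force-kills.
helm upgrade selenium-grid docker-selenium/selenium-grid \
  --namespace selenium --reuse-values \
  --set chromeNode.terminationGracePeriodSeconds=3600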
Hi @VietND96, I'm having this problem too, but in k8s with the 0.30.0 chart, selenium/node-chrome:124.0 & selenium/video:ffmpeg-7.0-20240425. Relevant log output from the node:
Starting Selenium Grid Node...
2024-05-02 13:46:43,603 INFO spawned: 'browserleftoverscleanup' with pid 98
2024-05-02 13:46:43,604 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-02 13:46:43,604 INFO success: vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-02 13:46:43,604 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-02 13:46:43,672 INFO exited: browserleftoverscleanup (exit status 0; not expected)
13:46:44.110 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
13:46:44.175 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
13:46:45.677 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://selenium-grid-selenium-event-bus.selenium-test:4442 and tcp://selenium-grid-selenium-event-bus.selenium-test:4443
2024-05-02 13:46:45,683 INFO spawned: 'browserleftoverscleanup' with pid 115
2024-05-02 13:46:45,696 INFO exited: browserleftoverscleanup (exit status 0; not expected)
13:46:45.802 INFO [UnboundZmqEventBus.<init>] - Sockets created
13:46:46.805 INFO [UnboundZmqEventBus.<init>] - Event bus ready
13:46:46.913 INFO [NodeServer.createHandlers] - Reporting self as: https://10.233.121.210:5555
13:46:46.988 INFO [NodeOptions.getSessionFactories] - Detected 2 available processors
13:46:47.090 INFO [NodeOptions.report] - Adding chrome for {"browserName": "chrome","browserVersion": "124.0","goog:chromeOptions": {"binary": "\u002fusr\u002fbin\u002fgoogle-chrome"},"platformName": "linux","se:noVncPort": 7900,"se:vncEnabled": true} 1 times
13:46:47.102 INFO [Node.<init>] - Binding additional locator mechanisms: relative
13:46:48.199 INFO [NodeServer$1.start] - Starting registration process for Node https://10.233.121.210:5555
13:46:48.200 INFO [NodeServer.execute] - Started Selenium node 4.20.0 (revision 866c76ca80): https://10.233.121.210:5555
13:46:48.265 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
13:46:49.088 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
2024-05-02 13:46:49,092 INFO spawned: 'browserleftoverscleanup' with pid 131
2024-05-02 13:46:49,105 INFO exited: browserleftoverscleanup (exit status 0; not expected)
2024-05-02 13:46:50,107 INFO gave up: browserleftoverscleanup entered FATAL state, too many start retries too quickly
13:46:52.512 INFO [LocalNode.newSession] - Session created by the Node. Id: 8dc50864ff272dae7876a994e5e72cf8, Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 124.0.6367.78, chrome: {chromedriverVersion: 124.0.6367.78 (a087f2dd364d..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:7289}, grid:teamName: idevops, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: wss://grid-test...., se:cdpVersion: 124.0.6367.78, se:vnc: wss://grid-test...., se:vncEnabled: true, se:vncLocalAddress: ws://10.233.121.210:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
13:52:17.194 INFO [LocalNode.stopTimedOutSession] - Session id 8dc50864ff272dae7876a994e5e72cf8 timed out, stopping...
13:52:17.616 INFO [SessionSlot.stop] - Stopping session 8dc50864ff272dae7876a994e5e72cf8
13:57:44.698 INFO [LocalNode.drain] - Firing node drain complete message
13:57:44.743 INFO [LocalNode.drain] - Firing node drain complete message
Trapped SIGTERM/SIGINT/x so shutting down supervisord...
13:57:45.706 INFO [NodeServer.lambda$createHandlers$3] - Shutting down
2024-05-02 13:57:45,707 WARN received SIGTERM indicating exit request
2024-05-02 13:57:45,708 INFO waiting for xvfb, vnc, novnc, selenium-node to die
2024-05-02 13:57:46,162 INFO stopped: selenium-node (terminated by SIGTERM)
2024-05-02 13:57:47,164 INFO stopped: novnc (terminated by SIGTERM)
2024-05-02 13:57:48,167 INFO stopped: vnc (terminated by SIGTERM)
2024-05-02 13:57:49,169 INFO stopped: xvfb (terminated by SIGTERM)
Shutdown complete
Video container:
Stopping to record video
Add to pipe a signal Uploading video to s3-selenium-video:selenium-video/video//8dc50864ff272dae7876a994e5e72cf8.mp4
Uploading /videos/8dc50864ff272dae7876a994e5e72cf8.mp4 to s3-selenium-video:selenium-video/video/
[out#0/mp4 @ 0x556b0c2a3c80] video:1380kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.445701%
frame= 4874 fps= 15 q=-1.0 Lsize= 1400kB time=00:05:24.86 bitrate= 35.3kbits/s speed= 1x
[libx264 @ 0x556b0c2a6280] frame I:20 Avg QP: 8.60 size: 15852
[libx264 @ 0x556b0c2a6280] frame P:4854 Avg QP:11.03 size: 226
[libx264 @ 0x556b0c2a6280] mb I I16..4: 100.0% 0.0% 0.0%
[libx264 @ 0x556b0c2a6280] mb P I16..4: 0.0% 0.0% 0.0% P16..4: 1.4% 0.0% 0.0% 0.0% 0.0% skip:98.5%
[libx264 @ 0x556b0c2a6280] coded y,uvDC,uvAC intra: 2.2% 2.5% 2.4% inter: 0.1% 1.2% 0.0%
[libx264 @ 0x556b0c2a6280] i16 v,h,dc,p: 92% 8% 0% 0%
[libx264 @ 0x556b0c2a6280] i8c dc,h,v,p: 95% 4% 1% 0%
[libx264 @ 0x556b0c2a6280] kb/s:34.77
Exiting normally, received signal 2.
Transferred: 1.367 MiB / 1.367 MiB, 100%, 0 B/s, ETA -
Checks: 1 / 1, 100%
Deleted: 1 (files), 0 (dirs)
Renamed: 1
Transferred: 1 / 1, 100%
Elapsed time: 0.2s
Trapped SIGTERM/SIGINT/x so shutting down supervisord...
2024-05-02 13:57:44,959 WARN received SIGTERM indicating exit request
2024-05-02 13:57:44,960 INFO waiting for video-recording, video-ready, video-upload to die
Node API is not responding, exiting.
Waiting for force exit file to be consumed by external upload container
Uploader is ready to shutdown
2024-05-02 13:57:46,648 INFO stopped: video-upload (exit status 0)
Traceback (most recent call last):
File "/opt/bin/video_ready.py", line 19, in <module>
httpd.serve_forever()
File "/usr/lib/python3.10/socketserver.py", line 232, in serve_forever
ready = selector.select(poll_interval)
File "/usr/lib/python3.10/selectors.py", line 416, in select
fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
2024-05-02 13:57:46,672 INFO stopped: video-ready (terminated by SIGINT)
Waiting for force exit file to be consumed by external upload container
2024-05-02 13:57:48,649 INFO waiting for video-recording to die
Waiting for force exit file to be consumed by external upload container
Waiting for force exit file to be consumed by external upload container
Waiting for force exit file to be consumed by external upload container
2024-05-02 13:57:51,656 INFO waiting for video-recording to die
Waiting for force exit file to be consumed by external upload container
Ready to shutdown the recorder
Ready to shutdown the recorder
2024-05-02 13:57:52,711 WARN received SIGINT indicating exit request
2024-05-02 13:57:53,713 INFO stopped: video-recording (exit status 0)
2024-05-02 13:57:53,714 WARN received SIGINT indicating exit request
Shutdown complete
Stream closed EOF for selenium-test/selenium-grid-selenium-chrome-node-v124-5ff489654-m9bxd (video)
@Doofus100500, may I know some info: what was the previous version that worked, before upgrading? And can the video recording output be opened and viewed?
@VietND96 Going through different versions of images didn't lead to anything; the issue of the video still being recorded after the file has been sent remains, if I understood the problem correctly.
@Doofus100500, how about the K8s version that you are using? I tried v1.30.0 and saw that the video file was broken, as you mentioned. I rechecked with the docker-compose setup without autoscaling, and the video file is viewable.
@VietND96 K8s Rev: v1.25.6
It seems that this is not important.
@VietND96 Can you try it in v1.30.0 with chart 0.29.1?
Sure, I am in progress reproducing and figuring out the issue. I will have a patch if anything can be fixed, or will roll back the ffmpeg version.
@VietND96 An interesting observation: the problem does not occur if there are plenty of resources in the namespace and sessions do not wait in the queue. Maybe it will be useful for you.
@Doofus100500, it looks like after rolling back to ffmpeg 6.1.1, video files are stable. I also added video integrity checks to prevent unstable issues when upgrading dependencies. A new image tag and chart will be released soon.
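(Editorial aside: the exact integrity check added in CI isn't shown here, but a quick local equivalent with ffprobe/ffmpeg would be the sketch below, reusing the file path from the logs above:)

# ffprobe prints an error and exits non-zero if the MP4 is truncated
# (e.g. "moov atom not found")
ffprobe -v error -i /videos/8dc50864ff272dae7876a994e5e72cf8.mp4
# decode all frames without writing output, to surface corruption
ffmpeg -v error -i /videos/8dc50864ff272dae7876a994e5e72cf8.mp4 -f null -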
@Doofus100500, a new image tag and chart 0.30.1 are out. I added tests to verify the video file integrity and saw them pass in CI. Can you try and confirm?
Btw, the video file name is now able to be set from the language binding - https://github.com/SeleniumHQ/docker-selenium?tab=readme-ov-file#video-recording-with-dynamic-file-name-based-on-metadata-in-tests. Hope it can improve the traceability of artifacts.
@VietND96 With the video files everything is good now, but the pods still remain in Terminating status.
Btw, the video file name is now able to be set from the language binding
That's wonderful, thank you, it will be helpful!
@Doofus100500, can you try enabling global.seleniumGrid.stdoutProbeLog: true? The preStop script will then print its logs to kubectl logs, so we can see whether something gets stuck there.
@VietND96
kubectl logs -f -l app=selenium-grid-selenium-chrome-node --all-containers=true --namespace selenium-test
2024-05-06 09:14:24,239 INFO exited: selenium-node (exit status 0; expected)
2024-05-06 09:14:24,239 WARN received SIGINT indicating exit request
2024-05-06 09:14:24,239 INFO waiting for xvfb, vnc, novnc to die
Trapped SIGTERM/SIGINT/x so shutting down supervisord...
2024-05-06 09:14:27,243 INFO stopped: novnc (terminated by SIGTERM)
2024-05-06 09:14:27,243 INFO waiting for xvfb, vnc to die
2024-05-06 09:14:27,243 WARN received SIGTERM indicating exit request
2024-05-06 09:14:28,244 INFO stopped: vnc (terminated by SIGTERM)
2024-05-06 09:14:29,246 INFO stopped: xvfb (terminated by SIGTERM)
Shutdown complete
2024-05-06T09:09:56UTC [video.recorder] - Video recording in progress
2024-05-06T09:09:58UTC [video.recorder] - Video recording in progress
2024-05-06T09:09:59UTC [video.recorder] - Video recording in progress
2024-05-06T09:10:00UTC [video.recorder] - Video recording in progress
2024-05-06T09:10:01UTC [video.recorder] - Video recording in progress
2024-05-06T09:10:02UTC [video.recorder] - Video recording in progress
2024-05-06T09:10:03UTC [video.recorder] - Stopping to record video
2024-05-06T09:10:03UTC [video.recorder] - Add to pipe a signal Uploading video to s3-selenium-video:selenium-video/video//8c2a18c7b195c1a712118a6d68556f60.mp4
Uploading /videos/8c2a18c7b195c1a712118a6d68556f60.mp4 to s3-selenium-video:idevops-ops-selenium-video/video/
2024/05/06 09:10:03 Failed to create file system for "s3-selenium-video:idevops-ops-selenium-video/video/": didn't find section in config file
2024-05-06 09:13:33,515 INFO waiting for video-recording to die
2024-05-06T09:13:33UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:34UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:35UTC [video.recorder] - Video recording in progress
2024-05-06 09:13:36,654 INFO waiting for video-recording to die
2024-05-06T09:13:36UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:37UTC [video.recorder] - Video recording in progress
2024-05-06 09:13:38,747 WARN killing 'video-recording' (9) with SIGKILL
2024-05-06 09:13:39,749 INFO stopped: video-recording (terminated by SIGKILL)
Shutdown complete
08:53:32.686 INFO [SessionSlot.stop] - Stopping session 24f17200728ab42ea8628dbf10de07d8
08:53:33.881 INFO [LocalNode.newSession] - Session created by the Node. Id: 65b4e0dfed660ceab008a019b0b8f8a7, Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 124.0.6367.118, chrome: {chromedriverVersion: 124.0.6367.91 (51df0e5e17a8..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:13055}, grid:teamName: idevops, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: wss://grid-test.testkontur...., se:cdpVersion: 124.0.6367.118, se:vnc: wss://grid-test.testkontur...., se:vncEnabled: true, se:vncLocalAddress: ws://10.233.65.58:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
08:59:02.571 INFO [LocalNode.stopTimedOutSession] - Session id 65b4e0dfed660ceab008a019b0b8f8a7 timed out, stopping...
08:59:02.699 INFO [SessionSlot.stop] - Stopping session 65b4e0dfed660ceab008a019b0b8f8a7
08:59:03.772 INFO [LocalNode.newSession] - Session created by the Node. Id: 443851e31a2148d36dc790fe7423bde9, Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 124.0.6367.118, chrome: {chromedriverVersion: 124.0.6367.91 (51df0e5e17a8..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:9937}, grid:teamName: idevops, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: wss://grid-test.testkontur...., se:cdpVersion: 124.0.6367.118, se:vnc: wss://grid-test.testkontur...., se:vncEnabled: true, se:vncLocalAddress: ws://10.233.65.58:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
09:04:32.571 INFO [LocalNode.stopTimedOutSession] - Session id 443851e31a2148d36dc790fe7423bde9 timed out, stopping...
09:04:32.702 INFO [SessionSlot.stop] - Stopping session 443851e31a2148d36dc790fe7423bde9
09:04:33.921 INFO [LocalNode.newSession] - Session created by the Node. Id: 8c2a18c7b195c1a712118a6d68556f60, Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 124.0.6367.118, chrome: {chromedriverVersion: 124.0.6367.91 (51df0e5e17a8..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:9075}, grid:teamName: idevops, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: wss://grid-test.testkontur...., se:cdpVersion: 124.0.6367.118, se:vnc: wss://grid-test.testkontur...., se:vncEnabled: true, se:vncLocalAddress: ws://10.233.65.58:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
09:10:02.571 INFO [LocalNode.stopTimedOutSession] - Session id 8c2a18c7b195c1a712118a6d68556f60 timed out, stopping...
09:10:02.692 INFO [SessionSlot.stop] - Stopping session 8c2a18c7b195c1a712118a6d68556f60
And in k9s, in the video container, I see:
2024-05-06T09:12:45UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:46UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:47UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:48UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:49UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:50UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:51UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:52UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:53UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:54UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:55UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:56UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:57UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:58UTC [video.recorder] - Video recording in progress
2024-05-06T09:12:59UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:00UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:01UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:02UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:04UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:05UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:06UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:07UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:08UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:09UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:10UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:11UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:12UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:13UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:14UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:15UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:16UTC [video.recorder] - Video recording in progress
Trapped SIGTERM/SIGINT/x so shutting down supervisord...
2024-05-06 09:13:17,640 WARN received SIGTERM indicating exit request
2024-05-06 09:13:17,640 INFO waiting for video-recording, video-ready, video-upload to die
2024-05-06T09:13:17UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:18UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:19UTC [video.recorder] - Video recording in progress
2024-05-06 09:13:20,781 INFO waiting for video-recording, video-ready, video-upload to die
2024-05-06T09:13:20UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:21UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:22UTC [video.recorder] - Video recording in progress
2024-05-06 09:13:23,924 INFO waiting for video-recording, video-ready, video-upload to die
2024-05-06T09:13:23UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:25UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:26UTC [video.recorder] - Video recording in progress
2024-05-06 09:13:27,082 INFO waiting for video-recording, video-ready, video-upload to die
2024-05-06T09:13:27UTC [video.recorder] - Video recording in progress
2024-05-06 09:13:28,144 WARN killing 'video-upload' (11) with SIGKILL
2024-05-06T09:13:28UTC [video.recorder] - Video recording in progress
2024-05-06 09:13:28,189 INFO stopped: video-upload (terminated by SIGKILL)
Traceback (most recent call last):
File "/opt/bin/video_ready.py", line 19, in <module>
httpd.serve_forever()
File "/usr/lib/python3.10/socketserver.py", line 232, in serve_forever
ready = selector.select(poll_interval)
File "/usr/lib/python3.10/selectors.py", line 416, in select
fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
2024-05-06 09:13:28,200 INFO stopped: video-ready (terminated by SIGINT)
2024-05-06T09:13:29UTC [video.recorder] - Stopping to record video
2024-05-06T09:13:29UTC [video.recorder] - Add to pipe a signal Uploading video to s3-selenium-video:selenium-video/video//177aeb02fd19ec3fecd70775f045f4de.mp4
2024-05-06T09:13:29UTC [video.recorder] - Waiting for force exit file to be consumed by external upload container
2024-05-06 09:13:30,198 INFO waiting for video-recording to die
2024-05-06T09:13:30UTC [video.recorder] - Ready to shutdown the recorder
2024-05-06T09:13:30UTC [video.recorder] - Session: 177aeb02fd19ec3fecd70775f045f4de is created
2024-05-06 09:13:30,252 WARN received SIGINT indicating exit request
2024-05-06T09:13:30UTC [video.recorder] - Start recording: true, video file name: 177aeb02fd19ec3fecd70775f045f4de.mp4
{
"data": {
"session": {
"id": "177aeb02fd19ec3fecd70775f045f4de",
"capabilities": "{\n \"acceptInsecureCerts\": false,\n \"browserName\": \"chrome\",\n \"browserVersion\": \"124.0.6367.118\",\n \"chrome\": {\n \"chromedriverVersion\": \"124.0.6367.91 (51df0e5e17a8b0a4f281c1665dbd1b8a0c6b46af-refs\\u002fbranch-heads\\u002f6367@{#984})\",\n \"userDataDir\": \"\\u002ftmp\\u002f.org.chromium.Chromium.1bHXJ5\"\n },\n \"fedcm:accounts\": true,\n \"goog:chromeOptions\": {\n \"debuggerAddress\": \"localhost:27607\"\n },\n \"grid:teamName\": \"idevops\",\n \"networkConnectionEnabled\": false,\n \"pageLoadStrategy\": \"normal\",\n \"platformName\": \"linux\",\n \"proxy\": {\n },\n \"se:bidiEnabled\": false,\n \"se:cdp\": \"wss:\\u002f\\u002fgrid-test.testkontur.ru\\u002fstaging\\u002fsession\\u002f177aeb02fd19ec3fecd70775f045f4de\\u002fse\\u002fcdp\",\n \"se:cdpVersion\": \"124.0.6367.118\",\n \"se:vnc\": \"wss:\\u002f\\u002fgrid-test.testkontur.ru\\u002fstaging\\u002fsession\\u002f177aeb02fd19ec3fecd70775f045f4de\\u002fse\\u002fvnc\",\n \"se:vncEnabled\": true,\n \"se:vncLocalAddress\": \"ws:\\u002f\\u002f10.233.108.59:7900\",\n \"setWindowRect\": true,\n \"strictFileInteractability\": false,\n \"timeouts\": {\n \"implicit\": 0,\n \"pageLoad\": 300000,\n \"script\": 30000\n },\n \"unhandledPromptBehavior\": \"dismiss and notify\",\n \"webauthn:extension:credBlob\": true,\n \"webauthn:extension:largeBlob\": true,\n \"webauthn:extension:minPinLength\": true,\n \"webauthn:extension:prf\": true,\n \"webauthn:virtualAuthenticators\": true\n}",
"startTime": "06/05/2024 09:08:54",
"uri": "https://10.233.108.59:5555",
"nodeId": "3daa4aa3-1410-41c5-8923-2ffed4cf00f2",
"nodeUri": "https://10.233.108.59:5555",
"sessionDurationMillis": "276345",
"slot": {
"id": "9e59f70d-5bc7-421b-a73c-de2e1e72740a",
"stereotype": "{\n \"browserName\": \"chrome\",\n \"browserVersion\": \"124.0\",\n \"goog:chromeOptions\": {\n \"binary\": \"\\u002fusr\\u002fbin\\u002fgoogle-chrome\"\n },\n \"platformName\": \"linux\",\n \"se:noVncPort\": 7900,\n \"se:vncEnabled\": true\n}",
"lastStarted": "06/05/2024 09:08:54"
}
}
}
}
2024-05-06T09:13:30UTC [video.recorder] - Starting to record video
2024-05-06T09:13:30UTC [video.recorder] - Video recording started
[x11grab @ 0x55fbeb7e3900] Stream #0: not enough frames to estimate rate; consider increasing probesize
2024-05-06T09:13:32UTC [video.recorder] - Video recording in progress
2024-05-06 09:13:33,515 INFO waiting for video-recording to die
2024-05-06T09:13:33UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:34UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:35UTC [video.recorder] - Video recording in progress
2024-05-06 09:13:36,654 INFO waiting for video-recording to die
2024-05-06T09:13:36UTC [video.recorder] - Video recording in progress
2024-05-06T09:13:37UTC [video.recorder] - Video recording in progress
2024-05-06 09:13:38,747 WARN killing 'video-recording' (9) with SIGKILL
2024-05-06 09:13:39,749 INFO stopped: video-recording (terminated by SIGKILL)
Shutdown complete
Stream closed EOF for selenium-test/selenium-grid-selenium-chrome-node-78c7fb5788-9gpz6 (video)
Looks interesting; the recorder container was terminated by SIGKILL. Meanwhile, the Node container continued with a few sessions. Let me check on this further.
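To confirm which signal ended each container, something like the following can be used (the pod name is taken from the log above; an exit code of 137 conventionally means SIGKILL):

```bash
# Print the last termination state of every container in the pod.
kubectl get pod selenium-grid-selenium-chrome-node-78c7fb5788-9gpz6 \
  --namespace selenium-test \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{" exitCode="}{.state.terminated.exitCode}{" reason="}{.state.terminated.reason}{"\n"}{end}'
```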
I'm seeing this issue on 0.30.1. I set global.seleniumGrid.stdoutProbeLog, but the only thing in the log from preStop is:
[lifecycle.preStop] - Detected full distributed mode: true. Since SE_DISTRIBUTOR_HOST and SE_DISTRIBUTOR_PORT are set in Node ConfigMap
[lifecycle.preStop] - Node is already drained. Shutting down gracefully!
[lifecycle.preStop] - Exiting Node preStop...
Followed by:
Trapped SIGTERM/SIGINT/x so shutting down supervisord...
18:22:08.533 INFO [UnboundZmqEventBus.<init>] - Event bus ready
2024-05-15 18:22:08,534 WARN received SIGTERM indicating exit request
2024-05-15 18:22:08,535 INFO waiting for xvfb, vnc, novnc, selenium-node to die
18:22:08.623 INFO [NodeServer.createHandlers] - Reporting self as: http://10.244.2.178:5555
Exception in thread "Thread-0" 2024-05-15 18:22:08,628 INFO stopped: selenium-node (terminated by SIGTERM)
2024-05-15 18:22:09,630 INFO stopped: novnc (terminated by SIGTERM)
2024-05-15 18:22:10,632 INFO stopped: vnc (terminated by SIGTERM)
2024-05-15 18:22:11,635 INFO stopped: xvfb (terminated by SIGTERM)
Shutdown complete
But the pod never actually terminates.
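When a pod hangs in Terminating like this, it can help to check whether a finalizer or a lingering container is holding it up (a minimal sketch; `<stuck-pod>` is a placeholder and the namespace is an assumption):

```bash
# Any finalizers listed here will block deletion until they are removed.
kubectl get pod <stuck-pod> -n selenium-test -o jsonpath='{.metadata.finalizers}'
# Show the current state of each container in the stuck pod.
kubectl get pod <stuck-pod> -n selenium-test \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state}{"\n"}{end}'
```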
We're still facing this issue as well. Any updates on this?
We have the same problem. Our preliminary analysis shows that the problem with closing the video container occurs only when an error occurs in the tests; when the test completes correctly, everything works fine.
After setting videoRecorder.enabled: false the problem disappears (see the sketch below), but this solution does not suit us, since at that point we have no recordings from the tests.
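For reference, the workaround above corresponds to something like this (release name, chart reference, and namespace are assumptions):

```bash
# Workaround only: disables video recording entirely, so no test videos are produced.
helm upgrade selenium-grid docker-selenium/selenium-grid \
  --namespace selenium-test \
  --reuse-values \
  --set videoRecorder.enabled=false
```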
Do you have a fix for this?
Recently, there was an update in video container termination. Can you try the new video image tag and confirm?
I updated to 4.23.1 and the problems started. I went back to 4.18.1 and it works fine.
Ah ok, let me try to reproduce it based on your analysis and provide a fix.
@barkep, can you provide a few details about your setup?
Autoscaling: job. The Job is stuck with status NotReady 1/2. I can't check the pod's describe output right now, because I've downgraded to a lower version.
Yes, right, I could reproduce the NotReady 1/2 status. However, after 30s (via terminationGracePeriodSeconds: 30), it moved to Completed and disappeared. I am still checking further.
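For anyone comparing behavior, the effective grace period on the node pods can be listed like this (namespace and label are assumptions, reusing the selector from the kubectl logs command above):

```bash
# Print each node pod's name and terminationGracePeriodSeconds.
kubectl get pod -n selenium-test -l app=selenium-grid-selenium-chrome-node \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.terminationGracePeriodSeconds}{"\n"}{end}'
```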
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
What happened?
We have consistent behavior where Chrome nodes get stuck in the Terminating state.
I'm not sure I can provide the exact steps to reproduce, but I'm happy to share logs from a system where this is happening.
Command used to start Selenium Grid with Docker (or Kubernetes)
Relevant log output
Operating System
Openshift 4.12.49
Docker Selenium version (image tag)
4.18.1
Selenium Grid chart version (chart version)
0.28.4