Closed AndreasPetersen closed 4 months ago
If I edit the chrome-node deployment on OpenShift directly and add the nodeConfigMap.extraScripts:
volumeMounts:
  - name: dshm
    mountPath: /dev/shm
  - name: selenium-grid-selenium-node-config
    mountPath: /opt/selenium/nodePreStop.sh
    subPath: nodePreStop.sh
  - name: selenium-grid-selenium-node-config
    mountPath: /opt/selenium/nodeProbe.sh
    subPath: nodeProbe.sh
The files are now available. However, the pod still fails to start:
2024-04-15 11:41:39,056 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing
2024-04-15 11:41:39,059 INFO RPC interface 'supervisor' initialized
2024-04-15 11:41:39,059 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-04-15 11:41:39,059 INFO supervisord started with pid 8
2024-04-15 11:41:40,062 INFO spawned: 'xvfb' with pid 21
2024-04-15 11:41:40,064 INFO spawned: 'vnc' with pid 22
2024-04-15 11:41:40,066 INFO spawned: 'novnc' with pid 23
2024-04-15 11:41:40,068 INFO spawned: 'selenium-node' with pid 24
2024-04-15 11:41:40,073 INFO success: selenium-node entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
E: [pulseaudio] main.c: Daemon startup failed.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Appending Selenium options: --session-timeout 600
Appending Selenium options: --register-period 60
Appending Selenium options: --register-cycle 5
Appending Selenium options: --heartbeat-period 30
Appending Selenium options: --log-level INFO
Generating Selenium Config
Setting up SE_NODE_HOST...
Tracing is enabled
Classpath will be enriched with these external jars : --ext /external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-exporter-otlp/1.34.1/opentelemetry-exporter-otlp-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/grpc/grpc-netty/1.61.0/grpc-netty-1.61.0.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-sdk-trace/1.34.1/opentelemetry-sdk-trace-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-sdk-metrics/1.34.1/opentelemetry-sdk-metrics-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-sdk-logs/1.34.1/opentelemetry-sdk-logs-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-exporter-otlp-common/1.34.1/opentelemetry-exporter-otlp-common-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-exporter-sender-okhttp/1.34.1/opentelemetry-exporter-sender-okhttp-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/o...
List arguments for OpenTelemetry: -Dotel.resource.attributes=service.name=selenium-grid-selenium-chrome-node -Dotel.traces.exporter=otlp -Dotel.exporter.otlp.endpoint=http://open-telemetry-collector:4317 -Dotel.java.global-autoconfigure.enabled=true
Selenium Grid Node configuration:
[events]
publish = "tcp://selenium-grid-selenium-hub.staging:4442"
subscribe = "tcp://selenium-grid-selenium-hub.staging:4443"
[server]
port = "5555"
[node]
grid-url = "http://selenium-grid-selenium-hub.staging"
session-timeout = "600"
override-max-sessions = false
detect-drivers = false
drain-after-session-count = 0
max-sessions = 1
[[node.driver-configuration]]
Kubernetes reports:
Startup probe failed: 2024-04-15T11:39:09UTC [Probe.Startup] - Wait for the Node to report its status
Hello, I see you are using chart 0.29.1, but you are passing some old values via your own override YAML. The default value of these two config keys is "", but the values.yaml you shared has:
extraScripts:
  nodePreStop.sh:
  nodeProbe.sh:
Can you update your input YAML and retry? As a best practice, the override YAML should contain only the config keys whose values you changed; don't clone the whole values.yaml from a previous chart version.
Okay, in the extraEnvironmentVariables list of the chrome node, there are some proxy-related entries:
- name: http_proxy
  value: "http://my-corp-proxy:8080"
- name: HTTP_PROXY
  value: "$(http_proxy)"
- name: https_proxy
  value: "$(http_proxy)"
- name: HTTPS_PROXY
  value: "$(http_proxy)"
- name: no_proxy
  value: "localhost"
- name: NO_PROXY
  value: "$(no_proxy)"
I guess the image was rebuilt with some extra functionality, so the default script implemented for the startup probe may not work properly behind the proxy set in the node.
You can change the value of global.seleniumGrid.defaultNodeStartupProbe to httpGet.
Or modify nodeProbe.sh in the chart directory configs/node.
Or tell me your scenario with those proxy settings, and I will update the default script to better support different cases.
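The first option above could be applied from the command line roughly as follows. This is only a sketch: the release name, repo, namespace, and values file match the install command the reporter shares in this thread, the function name is hypothetical, and the `--set` key and value are taken from the suggestion above rather than verified against the chart.

```shell
#!/bin/sh
# Sketch only: wraps the reporter's install command in a function and adds
# one override. The function name is hypothetical.
switch_startup_probe_to_http_get() {
  # --set overrides a single key on top of the values file passed with -f
  helm upgrade selenium-grid selenium-grid --install \
    --repo my-corp-proxy-repo --version 0.29.1 \
    --namespace staging -f values.yaml \
    --set global.seleniumGrid.defaultNodeStartupProbe=httpGet \
    --wait
}
```

Calling `switch_startup_probe_to_http_get` would run the upgrade; defining the function alone changes nothing in the cluster.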
OK, I think the cURL command in nodeProbe.sh needs to handle the case where a proxy is set. I will check and update it.
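As an illustration of what "handling the proxy" can mean for a cURL-based probe, here is a minimal sketch. It is not the chart's actual nodeProbe.sh; the function name `probe_status` is hypothetical, and the `/status` path and port 5555 in the usage note are assumptions based on the node configuration logged above.

```shell
#!/bin/sh
# Hypothetical helper, not the chart's actual nodeProbe.sh.
# Fetches a status URL while bypassing any configured proxy.
probe_status() {
  url="$1"            # URL to query, e.g. the node's own status endpoint
  timeout="${2:-5}"   # seconds before curl gives up (defaults to 5)
  # --noproxy '*' makes curl ignore http_proxy/https_proxy for every host;
  # --fail turns HTTP error responses into a non-zero exit code.
  curl --noproxy '*' --silent --fail --max-time "$timeout" "$url"
}
```

A probe could then call something like `probe_status "http://localhost:5555/status"` and rely on the exit code, so that a corporate proxy in the environment never intercepts a request meant for the local node.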
Hi @VietND96. Thanks for the quick reply!
I think there are two issues:
1. The nodeProbe.sh and nodePreStop.sh are not mounted to the chrome-node using the default configuration, despite the fact that the default configuration expects these files to be present.
2. Once I add the nodeProbe.sh, the script fails, possibly due to the proxy, as you mention.
We build an image with our corporate certificate using this Dockerfile:
FROM selenium/node-chrome:4.18.1-20240224
USER root
# Makes apt-get fetch through our corporate Artifactory
COPY ./sources.list /etc/apt
# Add certificate, install NSS Shared Database tools
COPY ./corp.crt /usr/local/share/ca-certificates
RUN update-ca-certificates && \
    apt-get update && \
    apt-get -y install libnss3-tools
USER seluser
## Create NSS Shared Database that Chrome uses for certificates to trust
## and add the certificate
#RUN mkdir -p $HOME/.pki/nssdb && \
#    certutil -d sql:$HOME/.pki/nssdb -N --empty-password && \
RUN certutil -d sql:$HOME/.pki/nssdb -A -t "CPTu,CPTu,CPTu" -n my-corp -i /usr/local/share/ca-certificates/corp.crt
USER root
# Give permissions to nssdb so that Chrome can actually use it
RUN chmod -R 777 /home/seluser/.pki/nssdb
USER seluser
We then also set the proxy environment variables as you stated. We need this proxy because some of the websites we test can only be accessed through our corporate proxy. Others can only be accessed from our internal network, and these should not be requested through the proxy. Our real NO_PROXY env var lists a number of hosts besides localhost, which I left out of my example.
Should I create a separate issue for the second issue?
I have updated the cURL command in the probe scripts to ignore the proxy. As for the probe scripts not being mounted correctly: did you update your override YAML as follows?
extraScripts:
  nodePreStop.sh: ""
  nodeProbe.sh: ""
If it doesn't work, please share the Helm command that you used.
I install the chart using:
helm upgrade selenium-grid selenium-grid --install --repo my-corp-proxy-repo --version 0.29.1 --namespace staging -f values.yaml --wait
The values.yaml is what I wrote in my original post. I must have copied an older version of the Helm chart's values.yaml, because I can see that extraScripts in it is missing the empty quotes:
extraScripts:
  nodePreStop.sh:
  nodeProbe.sh:
I updated my values.yaml to:
global:
  seleniumGrid:
    # Image registry for all selenium components
    imageRegistry: my-corp-proxy-repo/selenium
    # Image tag for all selenium components
    imageTag: 4.19.1-20240402
    # Image tag for browser's nodes
    nodesImageTag: 4.19.1-20240402
    # Image tag for browser's video recorder
    videoImageTag: ffmpeg-6.1-20240224
    # Pull secret for all components, can be overridden individually
    imagePullSecret: ""
    # Log level for all components. Possible values describe here: https://www.selenium.dev/documentation/grid/configuration/cli_options/#logging
    logLevel: INFO
# Basic auth settings for Selenium Grid
basicAuth:
  # Enable or disable basic auth
  enabled: false
  # Username for basic auth
  username: admin
  # Password for basic auth
  password: admin
# Deploy Router, Distributor, EventBus, SessionMap and Nodes separately
isolateComponents: false
# Service Account for all components
serviceAccount:
  create: true
  # nameOverride:
  annotations: {}
  #   eks.amazonaws.com/role-arn: "arn:aws:iam::12345678:role/video-bucket-permissions"
# Configuration for selenium hub deployment (applied only if `isolateComponents: false`)
hub:
  # imageRegistry: selenium
  # Selenium Hub image name
  imageName: hub
  # Selenium Hub image tag (this overwrites global.seleniumGrid.imageTag parameter)
  # imageTag: 4.18.1-20240224
  # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
  imagePullPolicy: IfNotPresent
  # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
  imagePullSecret: ""
  # Custom environment variables for selenium-hub
  extraEnvironmentVariables:
    - name: SE_SESSION_REQUEST_TIMEOUT
      value: "1800"
tracing:
  enabled: false
  enabledWithExistingEndpoint: true
  exporter: otlp
  exporterEndpoint: "http://open-telemetry-collector:4317"
  globalAutoConfigure: true
  ingress:
    enabled: true
    annotations:
    paths:
      - backend:
          service:
            name: "{{ .Release.Name }}-jaeger-query"
            port:
              number: 16686
        path: &jaegerBasePath "/jaeger"
        pathType: Prefix
monitoring:
  enabled: false
# Keda scaled object configuration
autoscaling:
  # Enable autoscaling. Implies installing KEDA
  enabled: false
# Configuration for chrome nodes
chromeNode:
  # Enable chrome nodes
  enabled: true
  resources:
    requests:
      memory: "4Gi"
      cpu: "800m"
    limits:
      memory: "4Gi"
      cpu: "1"
  # NOTE: Only used when autoscaling.enabled is false
  # Enable creation of Deployment
  # true (default) - if you want long-living pods
  # false - for provisioning your own custom type such as Jobs
  deploymentEnabled: true
  replicas: 1
  imageRegistry: my-corp-proxy-repo
  # Image of chrome nodes
  imageName: node-chrome-bankdata-cert
  # Image of chrome nodes (this overwrites global.seleniumGrid.nodesImageTag)
  imageTag: 4.19.1-20240402
  # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
  imagePullPolicy: IfNotPresent
  extraEnvironmentVariables:
    - name: SE_NODE_SESSION_TIMEOUT
      value: "600"
    - name: SCREEN_WIDTH
      value: "1360"
    - name: SCREEN_HEIGHT
      value: "1080"
    - name: http_proxy
      value: "http://my-corp-proxy:8080"
    - name: HTTP_PROXY
      value: "$(http_proxy)"
    - name: https_proxy
      value: "$(http_proxy)"
    - name: HTTPS_PROXY
      value: "$(http_proxy)"
    - name: no_proxy
      value: "localhost,my-corp.dk"
    - name: NO_PROXY
      value: "$(no_proxy)"
    - name: LANG
      value: da_DK
    - name: LANGUAGE
      value: da_DK
  # Size limit for DSH volume mounted in container (if not set, default is "1Gi")
  dshmVolumeSizeLimit: 4Gi
# Configuration for firefox nodes
firefoxNode:
  # Enable firefox nodes
  enabled: false
# Configuration for edge nodes
edgeNode:
  # Enable edge nodes
  enabled: false
The extra scripts are now present. It of course still fails because of the proxy issue. Looking forward to the proxy fix. Thanks for the quick help @VietND96 !
Until a fix is released, I've worked around the issue by adding 127.0.0.1,$SE_HUB_HOST to my no_proxy list.
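The workaround amounts to extending a comma-separated host list. A minimal sketch of that step in shell is shown here; the helper name `append_no_proxy` is hypothetical, and how the resulting value reaches the pod (for example through extraEnvironmentVariables) is left to the deployment.

```shell
#!/bin/sh
# Hypothetical helper: appends hosts to a comma-separated no_proxy list,
# skipping entries that are already present.
append_no_proxy() {
  existing="$1"   # current list, e.g. "localhost,my-corp.dk"
  shift
  for host in "$@"; do
    case ",$existing," in
      *",$host,"*) ;;                                    # already listed
      *) existing="${existing:+$existing,}$host" ;;      # append with a comma
    esac
  done
  printf '%s\n' "$existing"
}
```

For example, `no_proxy="$(append_no_proxy "$no_proxy" 127.0.0.1 "$SE_HUB_HOST")"` would add the two entries from the workaround to whatever list is already set.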
I will bump the chart version to 0.29.2 later today, since one more issue also needs to be confirmed.
I could not release the 0.29.2 patch as planned. Chart 0.30.0 will be released together with Selenium 4.20.0, on which it is based.
What happened?
Chrome node fails to start due to:
When I check the opt/selenium directory on the node before Kubernetes kills it, I can see that indeed there is no nodeProbe.sh. I can see that this was marked as fixed in https://github.com/SeleniumHQ/docker-selenium/issues/2141, but I'm still getting this with 0.29.1.
Command used to start Selenium Grid with Docker (or Kubernetes)
I'm installing the Selenium Grid Helm Chart version 0.29.1 with the following values-file:
I can see the config map with the nodeProbe.sh is not mounted to the pod. Following is a snippet of YAML of the installed node:
Relevant log output
Log of node-chrome:
OpenShift/Kubernetes reports:
and