Closed bschreder closed 8 months ago
@bschreder, thank you for creating this issue. We will troubleshoot it as soon as we can.
It looks like you are using an umbrella chart where selenium-grid is a chart dependency. The default value, which uses tpl and points to $.Values.nodeConfigMap, does not resolve correctly at the YAML config level of umbrella charts.
Let me try to fix this soon.
Ideally the script is mounted via a ConfigMap, so no further copy step is needed; this is just an issue with the chart template.
@bschreder, chart 0.28.1 is out. Can you please check and confirm?
@VietND96 still the same issue with version 0.28.1.
May I know whether your chart uses all the defaults, or whether something is overridden, e.g. a different node ConfigMap is set?
Alternatively, you can switch back to the default startup probe method httpGet by setting global.seleniumGrid.defaultNodeStartupProbe to httpGet, or by leaving it blank, in your own override YAML.
One request: can you dry run helm template on your solution chart and attach the YAML rendered in selenium-grid/templates/node-configmap.yaml and selenium-grid/templates/chrome-node-deployment.yaml (or selenium-grid/templates/chrome-node-scaledjobs.yaml), so I can understand how it behaves when it is imported as a chart dependency?
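When only one or two rendered templates are needed from a large helm template dump, a small filter can pull them out by their "# Source:" comment. The following is a minimal sketch, assuming the output uses "---" document separators followed by a "# Source: <path>" line, as in the output attached later in this thread; the function name extract_source is hypothetical.

```shell
# Print only the manifests whose "# Source:" comment contains the given path.
# Reads the multi-document output of `helm template` on stdin.
extract_source() {
  awk -v want="$1" '
    /^---[ \t]*$/ { keep = 0; next }                  # document separator: reset
    /^# Source: / { keep = (index($0, want) > 0) }    # decide per document
    keep          { print }
  '
}

# Usage (requires helm; the chart directory and values file are placeholders):
#   helm template <umbrella-chart-dir> -f <values.yaml> \
#     | extract_source "selenium-grid/templates/node-configmap.yaml"
```

Matching on a path suffix rather than the full path keeps the filter usable whether the chart is rendered standalone or as a dependency, since the prefix differs between the two cases.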
OK, I'm able to reproduce this with a dummy chart that imports selenium-grid as a chart dependency. The scripts in selenium-grid/configs are not loaded by default.
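For anyone who wants to reproduce this locally, a dummy umbrella chart can be scaffolded in a few lines. This is only a sketch of one possible setup, not the exact chart used above: the chart name dummy-umbrella is hypothetical, and the repository URL is an assumption that should be replaced with whatever you registered via helm repo add.

```shell
# Scaffold a minimal umbrella chart that pulls in selenium-grid as a dependency.
chart_dir="dummy-umbrella"   # hypothetical chart name
mkdir -p "$chart_dir"

cat > "$chart_dir/Chart.yaml" <<'EOF'
apiVersion: v2
name: dummy-umbrella
version: 0.1.0
dependencies:
  - name: selenium-grid
    version: 0.28.1
    # Assumed repository URL; use the one you added with `helm repo add`.
    repository: https://www.selenium.dev/docker-selenium
EOF

# An empty values file keeps every selenium-grid default in place.
: > "$chart_dir/values.yaml"

# With helm installed, fetch the dependency and render the chart:
#   helm dependency update "$chart_dir"
#   helm template "$chart_dir" > rendered.yaml
```

Leaving values.yaml empty is deliberate: it rules out user overrides as the cause, so any missing script in the rendered node ConfigMap comes from the sub-chart import itself.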
Setting global.seleniumGrid.defaultNodeStartupProbe to httpGet resolved the issue, thanks @VietND96.
@VietND96, Thanks for making all these recent changes. I know our QA team is looking forward to adding these new capabilities to our test runs.
Our setup makes small changes to the selenium-grid. The main change is we use our own ingress controller (disable selenium-grid ingress), override child chart values and add annotations for affinity.
I also noticed that the .Release.Name defaults to 'release-name' since I didn't set it in the helm template command. This value is correctly provided to the subcharts.
I ran the template command before setting the defaultNodeStartupProbe to httpGet. Is this a short term change or something I'll need to keep as long as I use selenium-grid as a subchart?
For the files below, I removed our ingress controller and used the selenium-grid ingress. I also removed our affinity and toleration specifications. I hope this makes it easier to discuss.
template command: helm template selenium-e2e -n mynamespace -f .\myvalues.yaml --debug
# Source: selenium-e2e/values.yaml
global:
  seleniumGrid:
    logLevel: FINE
    imageTag: latest
    nodesImageTag: latest
    videoImageTag: latest
    uploaderImageTag: latest
selenium-grid:
  basicAuth:
    enabled: false
  isolateComponents: true
  ingress:
    fullname: "project-e2e"
    className: "project-nginx"
    hostname: <hostname.cloudapp.azure.com>
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: "50m"
      nginx.ingress.kubernetes.io/use-regex: "true"
      nginx.ingress.kubernetes.io/rewrite-target: /$2
      nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
      nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
      nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
  chromeNode:
    replicas: 1
    extraEnvironmentVariables:
      - name: SE_NODE_ENABLE_MANAGED_DOWNLOADS
        value: "true"
  firefoxNode:
    replicas: 1
    extraEnvironmentVariables:
      - name: SE_NODE_ENABLE_MANAGED_DOWNLOADS
        value: "true"
  edgeNode:
    replicas: 1
    extraEnvironmentVariables:
      - name: SE_NODE_ENABLE_MANAGED_DOWNLOADS
        value: "true"
# Source: myvalues.yaml
global:
  seleniumGrid:
    imageTag: latest
    nodesImageTag: latest
selenium-grid:
  ingress:
    fullname: project
    path: "/project-e2e(/|$)(.*)"
  chromeNode:
    replicas: 11
  firefoxNode:
    replicas: 11
  edgeNode:
    replicas: 11
# Source: selenium-e2e/charts/selenium-grid/templates/node-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: release-name-selenium-node-config
  namespace: mynamespace
  labels:
    app.kubernetes.io/managed-by: helm
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: 4.18.0-20240220
    app.kubernetes.io/component: selenium-grid-4.18.0-20240220
    helm.sh/chart: selenium-grid-0.28.1
data:
  SE_DISTRIBUTOR_HOST: 'release-name-selenium-distributor.mynamespace'
  SE_DISTRIBUTOR_PORT: '5553'
  SE_ROUTER_HOST: 'release-name-selenium-router.mynamespace'
  SE_ROUTER_PORT: '4444'
  SE_DRAIN_AFTER_SESSION_COUNT: '0'
  SE_NODE_GRID_URL: 'http://hostname.cloudapp.azure.com'
  SE_NODE_GRID_GRAPHQL_URL: 'http://release-name-selenium-router.mynamespace:4444/graphql'
# Source: selenium-e2e/charts/selenium-grid/templates/chrome-node-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: release-name-selenium-chrome-node
  namespace: mynamespace
  labels:
    app: release-name-selenium-chrome-node
    app.kubernetes.io/name: release-name-selenium-chrome-node
    app.kubernetes.io/managed-by: helm
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: 4.18.0-20240220
    app.kubernetes.io/component: selenium-grid-4.18.0-20240220
    helm.sh/chart: selenium-grid-0.28.1
spec:
  replicas: 11
  selector:
    matchLabels:
      app: release-name-selenium-chrome-node
      app.kubernetes.io/instance: release-name
  template:
    metadata:
      labels:
        app: release-name-selenium-chrome-node
        app.kubernetes.io/name: release-name-selenium-chrome-node
        app.kubernetes.io/managed-by: helm
        app.kubernetes.io/instance: release-name
        app.kubernetes.io/version: 4.18.0-20240220
        app.kubernetes.io/component: selenium-grid-4.18.0-20240220
        helm.sh/chart: selenium-grid-0.28.1
      annotations:
        checksum/event-bus-configmap: dd5f7b58820d8464fba0c8eb263c7359fed60bc1eb2a3df4b1c70c20c9823f0b
    spec:
      serviceAccountName: release-name-selenium-serviceaccount
      serviceAccount: release-name-selenium-serviceaccount
      restartPolicy: Always
      containers:
        - name: release-name-selenium-chrome-node
          image: selenium/node-chrome:latest
          imagePullPolicy: IfNotPresent
          env:
            - name: SE_OTEL_SERVICE_NAME
              value: "release-name-selenium-chrome-node"
            - name: SE_NODE_PORT
              value: "5555"
            - name: SE_NODE_ENABLE_MANAGED_DOWNLOADS
              value: "true"
          envFrom:
            - configMapRef:
                name: release-name-selenium-event-bus
            - configMapRef:
                name: release-name-selenium-node-config
            - configMapRef:
                name: release-name-selenium-logging-config
            - configMapRef:
                name: release-name-selenium-server-config
            - secretRef:
                name: release-name-selenium-secrets
          ports:
            - containerPort: 5555
              protocol: TCP
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
            requests:
              cpu: "1"
              memory: 2Gi
          lifecycle:
            preStop:
              exec:
                command:
                  - bash
                  - -c
                  - /opt/selenium/nodePreStop.sh >> /proc/1/fd/1
          startupProbe:
            exec:
              command: ["bash", "-c", "/opt/selenium/nodeProbe.sh >> /proc/1/fd/1"]
            failureThreshold: 25
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 125
      terminationGracePeriodSeconds: 30
      volumes:
        - name: release-name-selenium-node-config
          configMap:
            name: release-name-selenium-node-config
            defaultMode: 493
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi
Ok, I'm able to reproduce with a dummy chart and import selenium-grid as chart dependency. The scripts in selenium-grid/configs are not loaded by default
I see the scripts in the config directory extracted from the .tgz file.
I see that line 27+ of node-configmap.yaml loops over extraScripts, but I'm not seeing the $.Files.Get statement that would match the $.Files.Glob statement if $value is empty. I'm also not seeing the script reference in the output of my config file above.
In the node-deployment file above, I see the volumeMounts and volumes specified. I think this is good.
I hope this helps.
I read a few docs and issue reports for Helm. It looks like .Files.Glob in a chart cannot load the chart's own default files when the chart is imported as a sub-chart.
Thank you for reporting your use case. Let me see whether a workaround can be applied and provide a patch soon.
In the meantime, please keep global.seleniumGrid.defaultNodeStartupProbe: httpGet in your chart.
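The interim workaround can be captured in a small override file for the umbrella chart. This is a minimal sketch rather than a documented procedure: the file name override-values.yaml is hypothetical, and placing the key under the parent chart's top-level global: block assumes Helm passes global values down to the selenium-grid dependency.

```shell
# Hypothetical file name; use whatever values file your umbrella chart already has.
override_file="override-values.yaml"

# Write an override that switches the node startup probe back to httpGet.
cat > "$override_file" <<'EOF'
global:
  seleniumGrid:
    defaultNodeStartupProbe: httpGet
EOF

echo "Wrote $override_file"

# Then pass it to Helm alongside your other values files, for example:
#   helm upgrade --install <release> <umbrella-chart-dir> -f "$override_file"
```

Keeping the setting in its own file makes it easy to drop once a chart version with the sub-chart fix is in use.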
Chart 0.28.2 is out with the fix for this.
RCA: $.Files.Glob in range didn't work properly when the chart is imported as a sub-chart.
Added template tests for the sub-chart case to guard against regressions.
@VietND96 thanks for the fix.
I still see the same error when using 0.28.2 and 0.28.3
@CameronWard301, may I know which values you used? Do you override nodeConfigMap with your own? We have a template test that confirms it works.
Hi,
I'm getting the error below after installing version 4.18.1-20240224 on my Kubernetes cluster (v1.23.1).
The chrome-node, edge-node and firefox-node pods are not reaching ready status; they show 0/1.
Error Message
Warning Unhealthy 16m kubelet Startup probe failed:
Warning Unhealthy 119s (x7 over 14m) kubelet Startup probe failed: command "bash -c /opt/selenium/nodeProbe.sh >> /proc/1/fd/1" **timed out**
I get the error below when I run the probe manually inside the pod:
kubectl exec -it selenium-grid-selenium-chrome-node-559c45f49-lfrfl sh -n tool-selenium-np
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
$ ls -la /opt/selenium/nodeProbe.sh
-rwxr-xr-x 1 root root 1886 Mar 1 12:41 /opt/selenium/nodeProbe.sh
$ ls -la /proc/1/fd/1
l-wx------ 1 seluser seluser 64 Mar 1 12:41 /proc/1/fd/1 -> 'pipe:[654232714]'
$ bash -c /opt/selenium/nodeProbe.sh >> /proc/1/fd/1
jq: error: Could not open file /tmp/gridProbe23941: No such file or directory
$
Can anyone help? Am I missing anything here?
@kanthasamyraja, the idea of the script is to get the Grid status via the SE_NODE_GRID_URL env var set in the Node and check that the NodeId is registered successfully.
What is its value in your deployment? If you exec into the pod, can you try a cURL command to see what the response is?
curl -sfk "${SE_NODE_GRID_URL}/status"
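The check described above can be approximated with curl and jq. This is a hedged sketch, not the chart's actual nodeProbe.sh (which is not shown in this thread): the function name node_registered is hypothetical, and the JSON shape, a value.nodes array whose entries carry an id field, is an assumption about the Grid status response.

```shell
# Succeeds when the Grid status JSON on stdin lists a node with the given ID.
# A missing or null node list counts as "not registered" instead of raising
# a jq iteration error.
node_registered() {
  jq -e --arg id "$1" '(.value.nodes // []) | any(.id == $id)' > /dev/null
}

# Usage inside a node pod (NODE_ID is a placeholder for the node's own ID):
#   curl -sfk "${SE_NODE_GRID_URL}/status" | node_registered "$NODE_ID"
```

Because jq -e maps a false result to a non-zero exit status, the function can be used directly as a probe command; an empty response from curl also yields a non-zero status.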
Thanks @VietND96 for your response.
No response. Waiting..
$ hostname -f
selenium-grid-selenium-chrome-node-764df6d85f-ncbrz
$ env | grep SE_NODE_GRID_URL
SE_NODE_GRID_URL=http://admin:admin@selenium-grid-selenium-router.tool-selenium-np1
$ curl -sfk "${SE_NODE_GRID_URL}/status"
Pod status
NAME READY STATUS RESTARTS AGE
grid-selenium-event-bus-84bbdb7b96-k99gw 1/1 Running 0 3m41s
selenium-grid-selenium-chrome-node-764df6d85f-ncbrz 0/1 Running 0 3m41s
selenium-grid-selenium-distributor-5ff8fb8dd9-ffcv4 1/1 Running 0 3m41s
selenium-grid-selenium-edge-node-76c4596df6-vmpmn 0/1 Running 0 3m41s
selenium-grid-selenium-firefox-node-7cb4bcf79d-cr9df 0/1 Running 0 3m41s
selenium-grid-selenium-router-77bcf5bbb8-dmzc5 1/1 Running 0 3m41s
selenium-grid-selenium-session-map-5dfd6b6ff9-9s6jr 1/1 Running 0 3m41s
selenium-grid-selenium-session-queue-7fc94584f6-nbwnj 1/1 Running 0 3m41s
Logs
kubectl logs selenium-grid-selenium-router-77bcf5bbb8-dmzc5 -n tool-selenium-np1
2024-03-01 14:05:20,962 INFO Included extra file "/etc/supervisor/conf.d/selenium-grid-router.conf" during parsing
2024-03-01 14:05:20,964 INFO RPC interface 'supervisor' initialized
2024-03-01 14:05:20,964 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-03-01 14:05:20,965 INFO supervisord started with pid 7
2024-03-01 14:05:21,967 INFO spawned: 'selenium-grid-router' with pid 8
Starting Selenium Grid Router...
2024-03-01 14:05:21,972 INFO success: selenium-grid-router entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Using SE_ROUTER_HOST: selenium-grid-selenium-router.tool-selenium-np1
Using SE_ROUTER_PORT: 4444
Appending Selenium options: --log-level INFO
Tracing is disabled
14:05:22.301 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
14:05:22.306 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
14:05:22.885 INFO [RouterServer.createHandlers] - Requiring authentication to connect
14:05:23.067 INFO [RouterServer.execute] - Started Selenium Router 4.18.1 (revision b1d3319b48): http://selenium-grid-selenium-router.tool-selenium-np1:4444
@CameronWard301, may I know which values you used? Do you override nodeConfigMap with your own? We have a template test that confirms it works.
No I didn't override any values or config maps, I simply added it as a repository to my chart.yml and did helm upgrade
@kanthasamyraja, SE_NODE_GRID_URL in the node is using the default rendered by the chart, right?
If so, let me check why port 4444 is missing from the URL. The router logs show Started Selenium Router 4.18.1 (revision b1d3319b48): http://selenium-grid-selenium-router.tool-selenium-np1:4444, but the value in the Node is only http://admin:admin@selenium-grid-selenium-router.tool-selenium-np1.
Can you also try the command curl -sfk http://admin:admin@selenium-grid-selenium-router.tool-selenium-np1:4444 to see whether it responds?
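A quick way to spot this kind of URL is to test whether it carries an explicit port. The helper below is an illustrative sketch (the name has_explicit_port is hypothetical, and IPv6 literals are not handled); it strips the scheme, path and user:password part first, because the colon in admin:admin would otherwise be mistaken for a port separator.

```shell
# Return 0 if the URL has an explicit numeric port, 1 otherwise.
has_explicit_port() {
  authority="${1#*://}"           # drop the scheme
  authority="${authority%%/*}"    # drop any path
  hostport="${authority##*@}"     # drop user:password@ if present
  case "$hostport" in
    *:*) port="${hostport##*:}" ;;
    *)   return 1 ;;
  esac
  case "$port" in
    ''|*[!0-9]*) return 1 ;;      # empty or non-numeric: not a port
    *)           return 0 ;;
  esac
}

url="http://admin:admin@selenium-grid-selenium-router.tool-selenium-np1"
if has_explicit_port "$url"; then
  echo "port present"
else
  echo "no explicit port; an http:// URL falls back to port 80"
fi
```

Run against the value reported in the Node, this takes the "no explicit port" branch, which is consistent with the suspicion that requests were not reaching the router on 4444.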
@CameronWard301, so you are facing the same issue discussed above? The startup probe does not return the correct value, which leaves the pod at ready 0/1?
@VietND96 Yes same issue with the startup probe
I had the same issue on 0.28.3. I fixed it by setting defaultNodeStartupProbe as @VietND96 suggested, but the syntax was quite confusing to me as I'm still learning how to use external charts. This is what I had to do:
selenium-grid:
  global:
    seleniumGrid:
      defaultNodeStartupProbe: httpGet
Hi @VietND96
SE_NODE_GRID_URL in the node is using the default rendered by chart, right?
Used default installation.
Command Output
$ hostname -f
selenium-grid-selenium-chrome-node-764df6d85f-ncbrz
$ curl -sfk http://admin:admin@selenium-grid-selenium-router.tool-selenium-np1:4444
$ curl -v http://admin:admin@selenium-grid-selenium-router.tool-selenium-np1:4444
* Trying 10.99.2.140:4444...
* Connected to selenium-grid-selenium-router.tool-selenium-np1 (10.99.2.140) port 4444 (#0)
* Server auth using Basic with user 'admin'
> GET / HTTP/1.1
> Host: selenium-grid-selenium-router.tool-selenium-np1:4444
> Authorization: Basic YWRtaW46YWRtaW4=
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Found
< content-length: 0
< Location: /ui
<
* Connection #0 to host selenium-grid-selenium-router.tool-selenium-np1 left intact
$
@kanthasamyraja, thank you for your input. I am investigating this case and will provide a possible fix soon.
@kanthasamyraja and all: if you have a sandbox environment for trials, can you check the nightly chart to verify that it works before a new version is released?
helm install selenium-grid docker-selenium/selenium-grid --version 1.0.0-nightly
Updated: chart 0.28.4 is out
Hi @VietND96
Working.
selenium-grid-selenium-chrome-node-5c4c64d498-6s2tb 1/1 Running 0 4m36s
selenium-grid-selenium-edge-node-776bdd8f59-69gzt 1/1 Running 1 (2m56s ago) 4m36s
selenium-grid-selenium-firefox-node-7ff98f797-99bxv 1/1 Running 0 4m36s
selenium-grid-selenium-hub-546fc7f864-984dj 1/1 Running 0 4m36s
k describe po selenium-grid-selenium-edge-node-776bdd8f59-69gzt
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m27s default-scheduler Successfully assigned default/selenium-grid-selenium-edge-node-776bdd8f59-69gzt to zdesk-devops14
Normal Pulling 4m26s kubelet Pulling image "selenium/node-edge:nightly"
Normal Pulled 3m49s kubelet Successfully pulled image "selenium/node-edge:nightly" in 36.982069453s
Warning Unhealthy 3m35s kubelet Startup probe failed: jq: error (at /tmp/gridProbe20868:1): Cannot iterate over null (null)
Normal Killing 2m52s kubelet Container selenium-grid-selenium-edge-node failed startup probe, will be restarted
Normal Pulled 2m47s kubelet Container image "selenium/node-edge:nightly" already present on machine
Normal Created 2m46s (x2 over 3m48s) kubelet Created container selenium-grid-selenium-edge-node
Normal Started 2m46s (x2 over 3m48s) kubelet Started container selenium-grid-selenium-edge-node
Warning Unhealthy 2m42s (x12 over 3m47s) kubelet Startup probe failed:
What happened?
helm selenium-grid 0.28 release only
PR #2139 assumes that a script {{ $.Values.nodeConfigMap.extraScriptsDirectory }}/nodeProbe.sh, which resolves to /opt/selenium/nodeProbe.sh, exists.
This file doesn't exist in the node-chrome:latest image (imageTag: 4.18.0-20240220).
I checked 1) the node-chrome Dockerfile for a copy nodeProbe.sh statement and 2) the selenium repo for a nodeProbe.sh script. Neither search was successful.
I also noticed that the Edge and Firefox browsers had the same issue.
Workaround: use selenium-grid version 0.27.0
Command used to start Selenium Grid with Docker (or Kubernetes)
Relevant log output
Operating System
Azure Kubernetes Service (AKS)
Docker Selenium version (image tag)
4.18.0-20240220
Selenium Grid chart version (chart version)
0.28.0