SeleniumHQ / docker-selenium

Provides a simple way to run Selenium Grid with Chrome, Firefox, and Edge using Docker, making it easier to perform browser automation
http://www.selenium.dev/docker-selenium/

[🐛 Bug]: helm charts try to execute /opt/selenium/nodeProbe.sh #2141

Closed bschreder closed 8 months ago

bschreder commented 9 months ago

What happened?

helm selenium-grid 0.28 release only

PR #2139 assumes that a script, {{ $.Values.nodeConfigMap.extraScriptsDirectory }}/nodeProbe.sh, which resolves to /opt/selenium/nodeProbe.sh, exists in the node image.

This file doesn't exist in the node-chrome:latest image ( imageTag: 4.18.0-20240220 )

I checked (1) the node-chrome Dockerfile for a COPY nodeProbe.sh statement and (2) the selenium repo for a nodeProbe.sh script. Neither search turned anything up.

I also noticed that the Edge and Firefox browsers had the same issue.

Workaround: use selenium-grid chart version 0.27.0

Command used to start Selenium Grid with Docker (or Kubernetes)

I use selenium-grid as a subchart.

Chart.yaml file:
appVersion: 0.0.2
name: selenium-e2e
dependencies:
  - name: selenium-grid
    version: 0.28.0
    repository: https://www.selenium.dev/docker-selenium
type: application
description: A Helm chart for E2E testing using the Selenium Grid subchart
icon: https://github.com/SeleniumHQ/docker-selenium/raw/trunk/logo.png
version: 1.0.1

Helm command to upgrade cluster:
 helm upgrade selenium-e2e ./selenium-e2e-1.0.1.tgz  -n mynamespace --install --dependency-update -f .\myvalues.yaml

Relevant log output

From Event log

involvedObject:
  kind: Pod
  namespace: mynamespace
  name: selenium-e2e-selenium-chrome-node-796b8dcd47-82nt2
  uid: cf9dfd0a-4101-4b3a-ab7f-bf81362fdc62
  apiVersion: v1
  resourceVersion: '283167694'
  fieldPath: spec.containers{selenium-e2e-selenium-chrome-node}
reason: Unhealthy
message: >
  Startup probe failed: bash: line 1: /opt/selenium/nodeProbe.sh: No such file
  or directory
firstTimestamp: '2024-02-21T00:18:02Z'
lastTimestamp: '2024-02-21T00:18:17Z'
count: 4

Operating System

Azure Kubernetes Service (AKS)

Docker Selenium version (image tag)

4.18.0-20240220

Selenium Grid chart version (chart version)

0.28.0

github-actions[bot] commented 9 months ago

@bschreder, thank you for creating this issue. We will troubleshoot it as soon as we can.


VietND96 commented 9 months ago

It looks like you are using an umbrella chart with selenium-grid as a chart dependency. The default value is rendered with tpl and points to $.Values.nodeConfigMap., which does not resolve correctly against the values of an umbrella chart. Let me try to fix this soon.

Ideally, the script is mounted via a ConfigMap, so no extra copy step is needed. This is just an issue with the chart template.
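
As a rough illustration of mounting a script from a ConfigMap instead of baking it into the image, the general Kubernetes pattern looks like the sketch below. All object names here are hypothetical and the script body is a placeholder; this is not the chart's actual rendered output, and mounting with subPath is an assumption about one possible way to do it.

```yaml
# Hypothetical ConfigMap carrying the probe script as a data key.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-node-config
data:
  nodeProbe.sh: |
    #!/bin/bash
    # Placeholder body; the real probe logic lives in the chart.
    exit 0
---
# Hypothetical Pod that mounts that key as a file and runs it as the startup probe.
apiVersion: v1
kind: Pod
metadata:
  name: example-node
spec:
  containers:
    - name: node
      image: selenium/node-chrome:4.18.0-20240220
      volumeMounts:
        - name: node-config
          mountPath: /opt/selenium/nodeProbe.sh
          subPath: nodeProbe.sh   # mount only this key, as a single file
      startupProbe:
        exec:
          command: ["bash", "-c", "/opt/selenium/nodeProbe.sh"]
  volumes:
    - name: node-config
      configMap:
        name: example-node-config
        defaultMode: 493   # decimal for octal 0755, so the script is executable
```

If the ConfigMap has no nodeProbe.sh key, or the container has no matching volumeMount, the exec probe fails with "No such file or directory", which matches the event reported in this issue.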

VietND96 commented 9 months ago

@bschreder, chart 0.28.1 is out, can you please check and confirm?

Zeeshan50522 commented 9 months ago

@VietND96 still same issue with 0.28.1 version

VietND96 commented 9 months ago

May I know whether your chart uses all the defaults, or whether anything is overridden, e.g. a different node config map is set?

VietND96 commented 9 months ago

As an alternative, you can switch back to the default startup probe method, httpGet, by setting global.seleniumGrid.defaultNodeStartupProbe to httpGet (or leaving it blank) in your own override YAML.
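
For an umbrella chart, that override might look like the following in the parent chart's values file. This is a minimal sketch, assuming the setting is placed in the parent's top-level global block, which Helm shares with every subchart; the rest of the values file is omitted.

```yaml
# Parent (umbrella) chart values override, e.g. myvalues.yaml.
# Helm passes the top-level `global` block through to all subcharts.
global:
  seleniumGrid:
    # Use the HTTP startup probe instead of the exec probe that runs nodeProbe.sh.
    defaultNodeStartupProbe: httpGet
```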

VietND96 commented 9 months ago

One request: can you do a dry run with helm template on your solution chart and attach the rendered YAML for selenium-grid/templates/node-configmap.yaml and selenium-grid/templates/chrome-node-deployment.yaml (or selenium-grid/templates/chrome-node-scaledjobs.yaml), so I can understand how it renders when imported as a chart dependency?

VietND96 commented 9 months ago

OK, I'm able to reproduce this with a dummy chart that imports selenium-grid as a chart dependency. The scripts in selenium-grid/configs are not loaded by default.

Zeeshan50522 commented 9 months ago

global.seleniumGrid.defaultNodeStartupProbe with httpGet resolved the issue, thanks @VietND96

bschreder commented 8 months ago

> One request: can you do a dry run with helm template on your solution chart and attach the rendered YAML for selenium-grid/templates/node-configmap.yaml and selenium-grid/templates/chrome-node-deployment.yaml (or selenium-grid/templates/chrome-node-scaledjobs.yaml), so I can understand how it renders when imported as a chart dependency?

@VietND96, Thanks for making all these recent changes. I know our QA team is looking forward to adding these new capabilities to our test runs.

Our setup makes small changes to the selenium-grid. The main change is we use our own ingress controller (disable selenium-grid ingress), override child chart values and add annotations for affinity.

I also noticed that the .Release.Name defaults to 'release-name' since I didn't set it in the helm template command. This value is correctly provided to the subcharts.

I ran the template command before setting the defaultNodeStartupProbe to httpGet. Is this a short term change or something I'll need to keep as long as I use selenium-grid as a subchart?

For the files below, I removed our ingress controller and used the selenium-grid ingress. I also removed our affinity and toleration specifications. I hope this makes it easier to discuss.

template command: helm template selenium-e2e -n mynamespace -f .\myvalues.yaml --debug


# Source: selenium-e2e/values.yaml
global: 
  seleniumGrid:
    logLevel: FINE
    imageTag: latest
    nodesImageTag: latest
    videoImageTag: latest
    uploaderImageTag: latest

selenium-grid:
  basicAuth:
    enabled: false
  isolateComponents: true
  ingress:
    fullname: "project-e2e"
    className: "project-nginx"
    hostname: <hostname.cloudapp.azure.com>
    annotations: 
      nginx.ingress.kubernetes.io/proxy-body-size: "50m"
      nginx.ingress.kubernetes.io/use-regex: "true"
      nginx.ingress.kubernetes.io/rewrite-target: /$2
      nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
      nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
      nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
  chromeNode:
    replicas: 1
    extraEnvironmentVariables:
      - name: SE_NODE_ENABLE_MANAGED_DOWNLOADS
        value: "true"
  firefoxNode:
    replicas: 1
    extraEnvironmentVariables:
      - name: SE_NODE_ENABLE_MANAGED_DOWNLOADS
        value: "true"
  edgeNode:
    replicas: 1
    extraEnvironmentVariables:
      - name: SE_NODE_ENABLE_MANAGED_DOWNLOADS
        value: "true"

# Source: myvalues.yaml
global:
  seleniumGrid:
    imageTag: latest
    nodesImageTag: latest
selenium-grid:
  ingress:
    fullname: project
    path: "/project-e2e(/|$)(.*)"
  chromeNode:
    replicas: 11
  firefoxNode:
    replicas: 11
  edgeNode:
    replicas: 11

# Source: selenium-e2e/charts/selenium-grid/templates/node-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: release-name-selenium-node-config
  namespace: mynamespace
  labels:
    app.kubernetes.io/managed-by: helm
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: 4.18.0-20240220
    app.kubernetes.io/component: selenium-grid-4.18.0-20240220
    helm.sh/chart: selenium-grid-0.28.1
data:
  SE_DISTRIBUTOR_HOST: 'release-name-selenium-distributor.mynamespace'
  SE_DISTRIBUTOR_PORT: '5553'
  SE_ROUTER_HOST: 'release-name-selenium-router.mynamespace'
  SE_ROUTER_PORT: '4444'
  SE_DRAIN_AFTER_SESSION_COUNT: '0'
  SE_NODE_GRID_URL: 'http://hostname.cloudapp.azure.com'
  SE_NODE_GRID_GRAPHQL_URL: 'http://release-name-selenium-router.mynamespace:4444/graphql'

# Source: selenium-e2e/charts/selenium-grid/templates/chrome-node-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: release-name-selenium-chrome-node
  namespace: mynamespace
  labels:
    app: release-name-selenium-chrome-node
    app.kubernetes.io/name: release-name-selenium-chrome-node
    app.kubernetes.io/managed-by: helm
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: 4.18.0-20240220
    app.kubernetes.io/component: selenium-grid-4.18.0-20240220
    helm.sh/chart: selenium-grid-0.28.1
spec:
  replicas: 11

  selector:
    matchLabels:
      app: release-name-selenium-chrome-node
      app.kubernetes.io/instance: release-name
  template:
    metadata:
      labels:
        app: release-name-selenium-chrome-node
        app.kubernetes.io/name: release-name-selenium-chrome-node
        app.kubernetes.io/managed-by: helm
        app.kubernetes.io/instance: release-name
        app.kubernetes.io/version: 4.18.0-20240220
        app.kubernetes.io/component: selenium-grid-4.18.0-20240220
        helm.sh/chart: selenium-grid-0.28.1
      annotations:
        checksum/event-bus-configmap: dd5f7b58820d8464fba0c8eb263c7359fed60bc1eb2a3df4b1c70c20c9823f0b
    spec:
      serviceAccountName: release-name-selenium-serviceaccount
      serviceAccount: release-name-selenium-serviceaccount
      restartPolicy: Always
      containers:
        - name: release-name-selenium-chrome-node
          image: selenium/node-chrome:latest
          imagePullPolicy: IfNotPresent
          env:
            - name: SE_OTEL_SERVICE_NAME
              value: "release-name-selenium-chrome-node"
            - name: SE_NODE_PORT
              value: "5555"
            - name: SE_NODE_ENABLE_MANAGED_DOWNLOADS
              value: "true"
          envFrom:
            - configMapRef:
                name: release-name-selenium-event-bus
            - configMapRef:
                name: release-name-selenium-node-config
            - configMapRef:
                name: release-name-selenium-logging-config
            - configMapRef:
                name: release-name-selenium-server-config
            - secretRef:
                name: release-name-selenium-secrets
          ports:
            - containerPort: 5555
              protocol: TCP
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
            requests:
              cpu: "1"
              memory: 2Gi
          lifecycle: 
            preStop:
              exec:
                command:
                - bash
                - -c
                - /opt/selenium/nodePreStop.sh >> /proc/1/fd/1
          startupProbe:
            exec:
              command: ["bash", "-c", "/opt/selenium/nodeProbe.sh >> /proc/1/fd/1"]
            failureThreshold: 25
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 125
      terminationGracePeriodSeconds: 30
      volumes:
        - name: release-name-selenium-node-config
          configMap:
            name: release-name-selenium-node-config
            defaultMode: 493
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi

bschreder commented 8 months ago

> OK, I'm able to reproduce this with a dummy chart that imports selenium-grid as a chart dependency. The scripts in selenium-grid/configs are not loaded by default.

I see the scripts in the config directory extracted from the .tgz file.

I see line 27+ of node-configmap.yaml that loops over extraScripts, but I'm not seeing the $.Files.Get statement that would match the $.Files.Glob statement if $value is empty. I'm also not seeing the script reference in the output of my config file above.

In the node-deployment file above, I see the volumeMounts and volumes specified. I think this is good.

I hope this helps.

VietND96 commented 8 months ago

I read through a few docs and issues reported to Helm. It looks like .Files.Glob in a chart cannot load the chart's own default files when the chart is imported as a sub-chart. Thank you for reporting your use case. Let me see whether a workaround can be applied and provide a patch soon. In the meantime, please keep global.seleniumGrid.defaultNodeStartupProbe: httpGet in your chart.

VietND96 commented 8 months ago

Chart 0.28.2 is out with the fix for this. RCA: $.Files.Glob inside a range didn't work properly when the chart is imported as a sub-chart. Added template tests for the sub-chart case to guard against regressions.

bschreder commented 8 months ago

@VietND96 thanks for the fix.

CameronWard301 commented 8 months ago

I still see the same error when using 0.28.2 and 0.28.3

VietND96 commented 8 months ago

@CameronWard301, may I know which values you used? Do you override nodeConfigMap with your own? I ask because we have a template test that confirms it works.

kanthasamyraja commented 8 months ago

Hi,

I get the error below after installing version 4.18.1-20240224 on my Kubernetes cluster (v1.23.1).

The chrome-node, edge-node, and firefox-node pods never become ready; they show 0/1.

Error Message

  Warning  Unhealthy  16m                 kubelet            Startup probe failed:
  Warning  Unhealthy  119s (x7 over 14m)  kubelet            Startup probe failed: command "bash -c /opt/selenium/nodeProbe.sh >> /proc/1/fd/1" timed out

I get the error below when I exec into the pod and run the probe command by hand:

kubectl exec -it selenium-grid-selenium-chrome-node-559c45f49-lfrfl sh -n tool-selenium-np
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
$ ls -la /opt/selenium/nodeProbe.sh
-rwxr-xr-x 1 root root 1886 Mar  1 12:41 /opt/selenium/nodeProbe.sh
$ ls -la /proc/1/fd/1
l-wx------ 1 seluser seluser 64 Mar  1 12:41 /proc/1/fd/1 -> 'pipe:[654232714]'
$ bash -c /opt/selenium/nodeProbe.sh >> /proc/1/fd/1
jq: error: Could not open file /tmp/gridProbe23941: No such file or directory
$

Can anyone help? Am I missing anything here?

VietND96 commented 8 months ago

@kanthasamyraja, the idea of the script is to get the Grid status via the SE_NODE_GRID_URL env var set in the Node and check that the NodeId is registered successfully. What is its value in your deployment? If you exec into the pod, can you try a cURL command to see what the response is? curl -sfk "${SE_NODE_GRID_URL}/status"

kanthasamyraja commented 8 months ago

Thanks @VietND96 for your response.

No response. Waiting..

$ hostname -f
selenium-grid-selenium-chrome-node-764df6d85f-ncbrz
$ env | grep SE_NODE_GRID_URL
SE_NODE_GRID_URL=http://admin:admin@selenium-grid-selenium-router.tool-selenium-np1
$ curl -sfk "${SE_NODE_GRID_URL}/status"

Pod status

NAME                                                    READY   STATUS    RESTARTS   AGE
grid-selenium-event-bus-84bbdb7b96-k99gw                1/1     Running   0          3m41s
selenium-grid-selenium-chrome-node-764df6d85f-ncbrz     0/1     Running   0          3m41s
selenium-grid-selenium-distributor-5ff8fb8dd9-ffcv4     1/1     Running   0          3m41s
selenium-grid-selenium-edge-node-76c4596df6-vmpmn       0/1     Running   0          3m41s
selenium-grid-selenium-firefox-node-7cb4bcf79d-cr9df    0/1     Running   0          3m41s
selenium-grid-selenium-router-77bcf5bbb8-dmzc5          1/1     Running   0          3m41s
selenium-grid-selenium-session-map-5dfd6b6ff9-9s6jr     1/1     Running   0          3m41s
selenium-grid-selenium-session-queue-7fc94584f6-nbwnj   1/1     Running   0          3m41s

Logs

kubectl logs selenium-grid-selenium-router-77bcf5bbb8-dmzc5 -n tool-selenium-np1
2024-03-01 14:05:20,962 INFO Included extra file "/etc/supervisor/conf.d/selenium-grid-router.conf" during parsing
2024-03-01 14:05:20,964 INFO RPC interface 'supervisor' initialized
2024-03-01 14:05:20,964 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-03-01 14:05:20,965 INFO supervisord started with pid 7
2024-03-01 14:05:21,967 INFO spawned: 'selenium-grid-router' with pid 8
Starting Selenium Grid Router...
2024-03-01 14:05:21,972 INFO success: selenium-grid-router entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Using SE_ROUTER_HOST: selenium-grid-selenium-router.tool-selenium-np1
Using SE_ROUTER_PORT: 4444
Appending Selenium options: --log-level INFO
Tracing is disabled
14:05:22.301 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
14:05:22.306 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
14:05:22.885 INFO [RouterServer.createHandlers] - Requiring authentication to connect
14:05:23.067 INFO [RouterServer.execute] - Started Selenium Router 4.18.1 (revision b1d3319b48): http://selenium-grid-selenium-router.tool-selenium-np1:4444

CameronWard301 commented 8 months ago

> @CameronWard301, may I know which values you used? Do you override nodeConfigMap with your own? I ask because we have a template test that confirms it works.

No, I didn't override any values or config maps. I simply added it as a repository in my chart.yml and ran helm upgrade.

VietND96 commented 8 months ago

@kanthasamyraja, SE_NODE_GRID_URL in the node is using the default rendered by the chart, right? If so, let me check why port 4444 is missing from the URL. The router logs show Started Selenium Router 4.18.1 (revision b1d3319b48): http://selenium-grid-selenium-router.tool-selenium-np1:4444, but the value in the Node is only http://admin:admin@selenium-grid-selenium-router.tool-selenium-np1. Can you also try the command curl -sfk http://admin:admin@selenium-grid-selenium-router.tool-selenium-np1:4444 to see whether it responds?

VietND96 commented 8 months ago

@CameronWard301, so you are facing the same issue discussed above? The startup probe does not return the expected value, which leaves the pod at ready 0/1?

CameronWard301 commented 8 months ago

@VietND96 Yes same issue with the startup probe

AlexanderRousseeuw commented 8 months ago

I had the same issue on 0.28.3, I fixed it by setting the defaultNodeStartupProbe like @VietND96 suggested, but the syntax was quite confusing to me as I'm still learning how to use external charts. This is what I had to do:

selenium-grid:
  global:
    seleniumGrid:
      defaultNodeStartupProbe: httpGet

kanthasamyraja commented 8 months ago

Hi @VietND96

> SE_NODE_GRID_URL in the node is using the default rendered by chart, right?

Used default installation.

Command Output

$ hostname -f
selenium-grid-selenium-chrome-node-764df6d85f-ncbrz
$ curl -sfk http://admin:admin@selenium-grid-selenium-router.tool-selenium-np1:4444
$ curl -v http://admin:admin@selenium-grid-selenium-router.tool-selenium-np1:4444
*   Trying 10.99.2.140:4444...
* Connected to selenium-grid-selenium-router.tool-selenium-np1 (10.99.2.140) port 4444 (#0)
* Server auth using Basic with user 'admin'
> GET / HTTP/1.1
> Host: selenium-grid-selenium-router.tool-selenium-np1:4444
> Authorization: Basic YWRtaW46YWRtaW4=
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Found
< content-length: 0
< Location: /ui
<
* Connection #0 to host selenium-grid-selenium-router.tool-selenium-np1 left intact
$

VietND96 commented 8 months ago

@kanthasamyraja, thank you for your input. I am investigating this case and will provide a possible fix soon.

VietND96 commented 8 months ago

@kanthasamyraja and all: if you have a sandbox environment for trials, can you check the nightly chart to verify it works before I bump a new version?

helm install selenium-grid docker-selenium/selenium-grid --version 1.0.0-nightly

Updated: chart 0.28.4 is out
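
For anyone consuming selenium-grid as a subchart, picking up this release would mean bumping the dependency in the parent chart. The sketch below is abridged and modeled on the Chart.yaml from the original report; the apiVersion line and the parent chart's own version bump are assumptions, and the thread does not confirm that 0.28.4 resolves every sub-chart report above.

```yaml
# Chart.yaml of the umbrella chart (abridged; other fields omitted).
apiVersion: v2            # assumed; not shown in the original report
name: selenium-e2e
version: 1.0.2            # hypothetical bump of the parent chart's own version
dependencies:
  - name: selenium-grid
    version: 0.28.4       # chart release announced in this comment
    repository: https://www.selenium.dev/docker-selenium
```

After changing the version, the dependency has to be fetched again, for example with helm dependency update or the --dependency-update flag used in the original upgrade command.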

kanthasamyraja commented 8 months ago

Hi @VietND96

Working.

selenium-grid-selenium-chrome-node-5c4c64d498-6s2tb   1/1     Running   0               4m36s
selenium-grid-selenium-edge-node-776bdd8f59-69gzt     1/1     Running   1 (2m56s ago)   4m36s
selenium-grid-selenium-firefox-node-7ff98f797-99bxv   1/1     Running   0               4m36s
selenium-grid-selenium-hub-546fc7f864-984dj           1/1     Running   0               4m36s

k describe po selenium-grid-selenium-edge-node-776bdd8f59-69gzt

Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  4m27s                   default-scheduler  Successfully assigned default/selenium-grid-selenium-edge-node-776bdd8f59-69gzt to zdesk-devops14
  Normal   Pulling    4m26s                   kubelet            Pulling image "selenium/node-edge:nightly"
  Normal   Pulled     3m49s                   kubelet            Successfully pulled image "selenium/node-edge:nightly" in 36.982069453s
  Warning  Unhealthy  3m35s                   kubelet            Startup probe failed: jq: error (at /tmp/gridProbe20868:1): Cannot iterate over null (null)
  Normal   Killing    2m52s                   kubelet            Container selenium-grid-selenium-edge-node failed startup probe, will be restarted
  Normal   Pulled     2m47s                   kubelet            Container image "selenium/node-edge:nightly" already present on machine
  Normal   Created    2m46s (x2 over 3m48s)   kubelet            Created container selenium-grid-selenium-edge-node
  Normal   Started    2m46s (x2 over 3m48s)   kubelet            Started container selenium-grid-selenium-edge-node
  Warning  Unhealthy  2m42s (x12 over 3m47s)  kubelet            Startup probe failed: