SeleniumHQ / docker-selenium

Provides a simple way to run Selenium Grid with Chrome, Firefox, and Edge using Docker, making it easier to perform browser automation
http://www.selenium.dev/docker-selenium/

[🐛 Bug]: Startup probe failed: bash: line 1: /opt/selenium/nodeProbe.sh: No such file or directory #2207

Closed AndreasPetersen closed 4 months ago

AndreasPetersen commented 5 months ago

What happened?

Chrome node fails to start due to:

Startup probe failed: bash: line 1: /opt/selenium/nodeProbe.sh: No such file or directory

When I check the /opt/selenium directory on the node before Kubernetes kills it, I can see that there is indeed no nodeProbe.sh.

I can see that this was marked as fixed in https://github.com/SeleniumHQ/docker-selenium/issues/2141, but I'm still getting this with 0.29.1.

Command used to start Selenium Grid with Docker (or Kubernetes)

I'm installing the Selenium Grid Helm Chart version 0.29.1 with the following values-file:

global:
  seleniumGrid:
    # Image registry for all selenium components
    imageRegistry: my-corp-proxy/selenium
    # Image tag for all selenium components
    imageTag: 4.18.1-20240224
    # Image tag for browser's nodes
    nodesImageTag: 4.18.1-20240224
    # Image tag for browser's video recorder
    videoImageTag: ffmpeg-6.1-20240224
    # Pull secret for all components, can be overridden individually
    imagePullSecret: ""
    # Log level for all components. Possible values describe here: https://www.selenium.dev/documentation/grid/configuration/cli_options/#logging
    logLevel: INFO

tls:
  enabled: false
  ingress:
    generateTLS: false
    defaultName: "SeleniumHQ"
    defaultDays: 3650
    defaultCN: "www.selenium.dev"
    # or *.domain.com
    defaultSANList: []
    #  - domain.com
    #  - production.domain.com
    defaultIPList: []
    #  - 10.10.10.10
  defaultFile:
    certificate: "certs/selenium.pem"
    privateKey: "certs/selenium.pkcs8.base64"
    trustStore: "certs/selenium.jks"
  certificate:
  privateKey:
  trustStore:
  trustStorePassword: "changeit"
  registrationSecret:
    enabled: false
    value: "HappyTesting"

# Basic auth settings for Selenium Grid
basicAuth:
  # Enable or disable basic auth
  enabled: false
  # Username for basic auth
  username: admin
  # Password for basic auth
  password: admin

# Deploy Router, Distributor, EventBus, SessionMap and Nodes separately
isolateComponents: false

# Service Account for all components
serviceAccount:
  create: true
  # nameOverride:
  annotations: {}
  #  eks.amazonaws.com/role-arn: "arn:aws:iam::12345678:role/video-bucket-permissions"

# ConfigMap that contains SE_EVENT_BUS_HOST, SE_EVENT_BUS_PUBLISH_PORT and SE_EVENT_BUS_SUBSCRIBE_PORT variables
busConfigMap:
  # Name of the configmap
  # nameOverride:
  # Custom annotations for configmap
  annotations: {}

# ConfigMap that contains common environment variables for browser nodes
nodeConfigMap:
  # nameOverride:
  # Default mode for ConfigMap is mounted as file
  defaultMode: 0755
  # Directory where the extra scripts are imported to ConfigMap by default (if given a relative path, it should be in chart's directory)
  extraScriptsImportFrom: "configs/node/**"
  # Directory where the extra scripts are mounted to
  extraScriptsDirectory: "/opt/selenium"
  extraScripts:
    nodePreStop.sh:
    nodeProbe.sh:
  # Name of volume mount is used to mount scripts in the ConfigMap
  scriptVolumeMountName:
  # Custom annotations for configmap
  annotations: {}

recorderConfigMap:
  # nameOverride:
  # Default mode for ConfigMap is mounted as file
  defaultMode: 0755
  # Directory where the extra scripts are imported to ConfigMap by default (if given a relative path, it should be in chart's directory)
  extraScriptsImportFrom: "configs/recorder/**"
  # Directory where the extra scripts are mounted to
  extraScriptsDirectory: "/opt/bin"
  # List of extra scripts to be mounted to the container. Format as `filename: content`
  extraScripts:
  #  video.sh:
  #  video_graphQLQuery.sh:
  # Name of volume mount is used to mount scripts in the ConfigMap
  scriptVolumeMountName:
  videoVolumeMountName: videos
  # Custom annotations for configmap
  annotations: {}

uploaderConfigMap:
  # nameOverride:
  # Default mode for ConfigMap is mounted as file
  defaultMode: 0755
  # Directory where the extra scripts are imported to ConfigMap by default (if given a relative path, it should be in chart's directory)
  extraScriptsImportFrom: "configs/uploader/**"
  # Directory where the extra scripts are mounted to
  extraScriptsDirectory: "/opt/bin"
  # List of extra scripts to be mounted to the container. Format as `filename: content`
  extraScripts:
    upload.sh:
  # Extra files stored in Secret to be mounted to the container.
  secretFiles:
    upload.conf: "[sample]"
  # Name of volume mount is used to mount scripts in the ConfigMap
  scriptVolumeMountName:
  # Name of Secret is used to store the `secretFiles`
  secretVolumeMountName:
  # Custom annotations for configmap
  annotations: {}

# Secrets for all components. Component environment variables contain sensitive data should be stored in secrets.
secrets:
  create: true
  # nameOverride:
  env:
    SE_VNC_PASSWORD: "secret"
  annotations: {}

# Configuration for selenium hub deployment (applied only if `isolateComponents: false`)
hub:
  # imageRegistry: selenium
  # Selenium Hub image name
  imageName: hub
  # Selenium Hub image tag (this overwrites global.seleniumGrid.imageTag parameter)
  # imageTag: 4.18.1-20240224
  # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
  imagePullPolicy: IfNotPresent
  # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
  imagePullSecret: ""

  # Custom environment variables for selenium-hub
  extraEnvironmentVariables:
    - name: SE_SESSION_REQUEST_TIMEOUT
      value: "1800"

tracing:
  enabled: false
  enabledWithExistingEndpoint: true
  exporter: otlp
  exporterEndpoint: "http://open-telemetry-collector:4317"
  globalAutoConfigure: true
  ingress:
    enabled: true
    annotations:
    paths:
      - backend:
          service:
            name: "{{ .Release.Name }}-jaeger-query"
            port:
              number: 16686
        path: &jaegerBasePath "/jaeger"
        pathType: Prefix

monitoring:
  enabled: false

# Keda scaled object configuration
autoscaling:
  # Enable autoscaling. Implies installing KEDA
  enabled: false

# Configuration for chrome nodes
chromeNode:
  # Enable chrome nodes
  enabled: true

  resources:
    requests:
      memory: "4Gi"
      cpu: "800m"
    limits:
      memory: "4Gi"
      cpu: "1"

  # NOTE: Only used when autoscaling.enabled is false
  # Enable creation of Deployment
  # true (default) - if you want long-living pods
  # false - for provisioning your own custom type such as Jobs
  deploymentEnabled: true

  # Controlled by Selenium Grid Scaler
  replicas: 1
  imageRegistry: my-corp-proxy
  # Image of chrome nodes.
  # We extend the node-chrome with our company certificate
  imageName: node-chrome-my-corp-cert
  # Image of chrome nodes (this overwrites global.seleniumGrid.nodesImageTag)
  imageTag: 4.18.1-20240224
  # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
  imagePullPolicy: IfNotPresent
  # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
  imagePullSecret: ""
  extraEnvironmentVariables:
    - name: SE_NODE_SESSION_TIMEOUT
      value: "600"
    - name: SCREEN_WIDTH
      value: "1360"
    - name: SCREEN_HEIGHT
      value: "1080"
    - name: http_proxy
      value: "http://my-corp-proxy:8080"
    - name: HTTP_PROXY
      value: "$(http_proxy)"
    - name: https_proxy
      value: "$(http_proxy)"
    - name: HTTPS_PROXY
      value: "$(http_proxy)"
    - name: no_proxy
      value: "localhost"
    - name: NO_PROXY
      value: "$(no_proxy)"
    - name: LANG
      value: da_DK
    - name: LANGUAGE
      value: da_DK
  # Size limit for DSH volume mounted in container (if not set, default is "1Gi")
  dshmVolumeSizeLimit: 4Gi

# Configuration for firefox nodes
firefoxNode:
  # Enable firefox nodes
  enabled: false

# Configuration for edge nodes
edgeNode:
  # Enable edge nodes
  enabled: false

videoRecorder:
  enabled: false
  # imageRegistry: selenium
  # Image of video recorder
  imageName: video
  # Image of video recorder
  # imageTag: ffmpeg-6.1-20240224
  # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
  imagePullPolicy: IfNotPresent
  # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
  targetFolder: "/videos"
  uploader:
    enabled: false
    # Where to upload the video file e.g. remoteName://bucketName/path. Refer to destination syntax of rclone https://rclone.org/docs/
    destinationPrefix:
    # What uploader to use. See .videRecorder.rclone for how to create a new one.
    name:
    configFileName: upload.conf
    entryPointFileName: upload.sh
    # For environment variables used in uploader which contains sensitive information, store in secret and refer envFrom
    # Set config for rclone via ENV var with format: RCLONE_CONFIG_ + name of remote + _ + name of config file option (make it all uppercase)
    secrets:
    #  RCLONE_CONFIG_S3_TYPE: "s3"
    #  RCLONE_CONFIG_S3_PROVIDER: "AWS"
    #  RCLONE_CONFIG_S3_ENV_AUTH: "true"
    #  RCLONE_CONFIG_S3_REGION: "ap-southeast-1"
    #  RCLONE_CONFIG_S3_LOCATION_CONSTRAINT: "ap-southeast-1"
    #  RCLONE_CONFIG_S3_ACL: "private"
    #  RCLONE_CONFIG_S3_ACCESS_KEY_ID: "xxx"
    #  RCLONE_CONFIG_S3_SECRET_ACCESS_KEY: "xxx"
    #  RCLONE_CONFIG_S3_NO_CHECK_BUCKET: "true"
    #  RCLONE_CONFIG_GS_TYPE: "s3"
    #  RCLONE_CONFIG_GS_PROVIDER: "GCS"
    #  RCLONE_CONFIG_GS_ENV_AUTH: "true"
    #  RCLONE_CONFIG_GS_REGION: "asia-southeast1"
    #  RCLONE_CONFIG_GS_LOCATION_CONSTRAINT: "asia-southeast1"
    #  RCLONE_CONFIG_GS_ACL: "private"
    #  RCLONE_CONFIG_GS_ACCESS_KEY_ID: "xxx"
    #  RCLONE_CONFIG_GS_SECRET_ACCESS_KEY: "xxx"
    #  RCLONE_CONFIG_GS_ENDPOINT: "https://storage.googleapis.com"
    #  RCLONE_CONFIG_GS_NO_CHECK_BUCKET: "true"
  ports:
    - 9000
  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "1Gi"
      cpu: "1"
  # SecurityContext for recorder container
  securityContext:
  extraEnvironmentVariables:
  # - name: SE_VIDEO_FOLDER
  #   value: /videos
  # Custom environment variables by sourcing entire configMap, Secret, etc. for video recorder.
  extraEnvFrom:
  # - configMapRef:
  #   name: proxy-settings
  # - secretRef:
  #   name: mysecret

I can see that the ConfigMap containing nodeProbe.sh is not mounted to the pod. The following is a YAML snippet of the installed node:

spec:
(...)
  containers:
    - resources:
        limits:
          cpu: '1'
          memory: 4Gi
        requests:
          cpu: 800m
          memory: 4Gi
      terminationMessagePath: /dev/termination-log
      lifecycle:
        preStop:
          exec:
            command:
              - bash
              - '-c'
              - '/opt/selenium/nodePreStop.sh '
      name: selenium-grid-selenium-chrome-node
      (...)
      securityContext:
        capabilities:
          drop:
            - ALL
        runAsUser: 1000720000
        runAsNonRoot: true
        allowPrivilegeEscalation: false
      ports:
        - containerPort: 5555
          protocol: TCP
      imagePullPolicy: IfNotPresent
      startupProbe:
        exec:
          command:
            - bash
            - '-c'
            - '/opt/selenium/nodeProbe.sh Startup '
        timeoutSeconds: 60
        periodSeconds: 5
        successThreshold: 1
        failureThreshold: 12
      volumeMounts:
        # Missing mount of selenium-grid-selenium-node-config
        - name: dshm
          mountPath: /dev/shm
        - name: kube-api-access-d9w7p
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePolicy: File
      envFrom:
        - configMapRef:
            name: selenium-grid-selenium-event-bus
        - configMapRef:
            name: selenium-grid-selenium-node-config
        - configMapRef:
            name: selenium-grid-selenium-logging-config
        - configMapRef:
            name: selenium-grid-selenium-server-config
        - secretRef:
            name: selenium-grid-selenium-secrets
      image: >-
        artifactory.tools.bdpnet.dk/wea-docker-release-local/node-chrome-bankdata-cert:4.18.1-20240224
  serviceAccount: selenium-grid-selenium-serviceaccount
  volumes:
    - name: selenium-grid-selenium-node-config
      configMap:
        name: selenium-grid-selenium-node-config
        defaultMode: 493
(...)

Relevant log output

Log of node-chrome:

2024-04-15 10:43:50,877 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing
2024-04-15 10:43:50,880 INFO RPC interface 'supervisor' initialized
2024-04-15 10:43:50,880 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-04-15 10:43:50,880 INFO supervisord started with pid 8
2024-04-15 10:43:51,882 INFO spawned: 'xvfb' with pid 9
2024-04-15 10:43:51,884 INFO spawned: 'vnc' with pid 10
2024-04-15 10:43:51,886 INFO spawned: 'novnc' with pid 11
2024-04-15 10:43:51,888 INFO spawned: 'selenium-node' with pid 12
2024-04-15 10:43:51,892 INFO success: selenium-node entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
E: [pulseaudio] main.c: Daemon startup failed.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Appending Selenium options: --session-timeout 600
Appending Selenium options: --register-period 60
Appending Selenium options: --register-cycle 5
Appending Selenium options: --heartbeat-period 30
Appending Selenium options: --log-level INFO
Generating Selenium Config
Setting up SE_NODE_HOST...
Tracing is enabled
Classpath will be enriched with these external jars :  --ext /external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-exporter-otlp/1.34.1/opentelemetry-exporter-otlp-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/grpc/grpc-netty/1.61.0/grpc-netty-1.61.0.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-sdk-trace/1.34.1/opentelemetry-sdk-trace-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-sdk-metrics/1.34.1/opentelemetry-sdk-metrics-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-sdk-logs/1.34.1/opentelemetry-sdk-logs-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-exporter-otlp-common/1.34.1/opentelemetry-exporter-otlp-common-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-exporter-sender-okhttp/1.34.1/opentelemetry-exporter-sender-okhttp-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/o...
List arguments for OpenTelemetry:  -Dotel.resource.attributes=service.name=selenium-grid-selenium-chrome-node -Dotel.traces.exporter=otlp -Dotel.exporter.otlp.endpoint=http://open-telemetry-collector:4317 -Dotel.java.global-autoconfigure.enabled=true
Selenium Grid Node configuration:
[events]
publish = "tcp://selenium-grid-selenium-hub.staging:4442"
subscribe = "tcp://selenium-grid-selenium-hub.staging:4443"
[server]
port = "5555"
[node]
grid-url = "http://selenium-grid-selenium-hub.staging"
session-timeout = "600"
override-max-sessions = false
detect-drivers = false
drain-after-session-count = 0
max-sessions = 1
[[node.driver-configuration]]

OpenShift/Kubernetes reports:

Startup probe failed: bash: line 1: /opt/selenium/nodeProbe.sh: No such file or directory

and

PreStopHook failed


### Operating System

OpenShift with Kubernetes v1.26.13+8f85140

### Docker Selenium version (image tag)

4.18.1-20240224

### Selenium Grid chart version (chart version)

0.29.1
github-actions[bot] commented 5 months ago

@AndreasPetersen, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

AndreasPetersen commented 5 months ago

If I edit the chrome-node Deployment on OpenShift directly and add volume mounts for the nodeConfigMap.extraScripts:

          volumeMounts:
            - name: dshm
              mountPath: /dev/shm
            - name: selenium-grid-selenium-node-config
              mountPath: /opt/selenium/nodePreStop.sh
              subPath: nodePreStop.sh
            - name: selenium-grid-selenium-node-config
              mountPath: /opt/selenium/nodeProbe.sh
              subPath: nodeProbe.sh

The files are now available. However, the pod still fails to start:

2024-04-15 11:41:39,056 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing
2024-04-15 11:41:39,059 INFO RPC interface 'supervisor' initialized
2024-04-15 11:41:39,059 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-04-15 11:41:39,059 INFO supervisord started with pid 8
2024-04-15 11:41:40,062 INFO spawned: 'xvfb' with pid 21
2024-04-15 11:41:40,064 INFO spawned: 'vnc' with pid 22
2024-04-15 11:41:40,066 INFO spawned: 'novnc' with pid 23
2024-04-15 11:41:40,068 INFO spawned: 'selenium-node' with pid 24
2024-04-15 11:41:40,073 INFO success: selenium-node entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
E: [pulseaudio] main.c: Daemon startup failed.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Appending Selenium options: --session-timeout 600
Appending Selenium options: --register-period 60
Appending Selenium options: --register-cycle 5
Appending Selenium options: --heartbeat-period 30
Appending Selenium options: --log-level INFO
Generating Selenium Config
Setting up SE_NODE_HOST...
Tracing is enabled
Classpath will be enriched with these external jars :  --ext /external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-exporter-otlp/1.34.1/opentelemetry-exporter-otlp-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/grpc/grpc-netty/1.61.0/grpc-netty-1.61.0.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-sdk-trace/1.34.1/opentelemetry-sdk-trace-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-sdk-metrics/1.34.1/opentelemetry-sdk-metrics-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-sdk-logs/1.34.1/opentelemetry-sdk-logs-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-exporter-otlp-common/1.34.1/opentelemetry-exporter-otlp-common-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/opentelemetry/opentelemetry-exporter-sender-okhttp/1.34.1/opentelemetry-exporter-sender-okhttp-1.34.1.jar:/external_jars/https/repo1.maven.org/maven2/io/o...
List arguments for OpenTelemetry:  -Dotel.resource.attributes=service.name=selenium-grid-selenium-chrome-node -Dotel.traces.exporter=otlp -Dotel.exporter.otlp.endpoint=http://open-telemetry-collector:4317 -Dotel.java.global-autoconfigure.enabled=true
Selenium Grid Node configuration:
[events]
publish = "tcp://selenium-grid-selenium-hub.staging:4442"
subscribe = "tcp://selenium-grid-selenium-hub.staging:4443"
[server]
port = "5555"
[node]
grid-url = "http://selenium-grid-selenium-hub.staging"
session-timeout = "600"
override-max-sessions = false
detect-drivers = false
drain-after-session-count = 0
max-sessions = 1
[[node.driver-configuration]]

Kubernetes reports:

Startup probe failed: 2024-04-15T11:39:09UTC [Probe.Startup] - Wait for the Node to report its status
VietND96 commented 5 months ago

Hello, I see you are using chart 0.29.1, but you are passing some old values via your own override YAML

https://github.com/SeleniumHQ/docker-selenium/blob/ab3f8b8546f30da7ae88a308f63bc014718b6355/charts/selenium-grid/values.yaml#L115-L117

The default value of these 2 config keys is "", but your shared values.yaml has

  extraScripts:
    nodePreStop.sh:
    nodeProbe.sh:

Can you update your input YAML and retry? Also, as a best practice, the override YAML should contain only the config keys whose values you changed; don't clone the whole values.yaml from a previous chart version

VietND96 commented 5 months ago

Okay, looking at the chrome node's extraEnvironmentVariables list, there is something related to the proxy

    - name: http_proxy
      value: "http://my-corp-proxy:8080"
    - name: HTTP_PROXY
      value: "$(http_proxy)"
    - name: https_proxy
      value: "$(http_proxy)"
    - name: HTTPS_PROXY
      value: "$(http_proxy)"
    - name: no_proxy
      value: "localhost"
    - name: NO_PROXY
      value: "$(no_proxy)"

I guess the image is rebuilt with some extra functionality, so the default startup probe script may not work properly behind the proxy in the node. You can either change the value to the httpGet method in global.seleniumGrid.defaultNodeStartupProbe, modify nodeProbe.sh in the chart directory configs/node, or describe your proxy scenario so I can update the default script to better support different cases
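As a sketch of the first option (the key name defaultNodeStartupProbe is taken from the comment above; verify the exact schema against the chart's values.yaml before use), the override could look like:

```yaml
# Hypothetical override: switch the node startup probe to httpGet so the
# kubelet performs the check itself instead of running nodeProbe.sh via exec.
global:
  seleniumGrid:
    defaultNodeStartupProbe: httpGet
```

An httpGet probe is executed by the kubelet from outside the container, so it is not affected by the proxy environment variables set inside the node.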

VietND96 commented 5 months ago

Ok, I think the cURL command in nodeProbe.sh needs to handle the case where a proxy is set. I will check and update
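For illustration only (this is not the chart's actual script; probe_node is a hypothetical helper, and the /status endpoint and port 5555 follow Selenium Grid conventions), a probe check that bypasses the proxy for the in-cluster call could look like:

```shell
#!/usr/bin/env bash
# Sketch: query the Node status endpoint without going through any proxy.
probe_node() {
  local host="${1:-localhost}" port="${2:-5555}"
  # --noproxy '*' makes curl ignore http_proxy/https_proxy entirely,
  # so the in-cluster status check never hits the corporate proxy.
  curl --noproxy '*' -sf "http://${host}:${port}/status" \
    | grep -q '"ready": *true'
}
```

Alternatively, the in-cluster hostnames can be added to no_proxy instead of forcing curl to bypass the proxy entirely.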

AndreasPetersen commented 5 months ago

Hi @VietND96 . Thanks for the quick reply!

I think there are two issues:

  1. The nodeProbe.sh and nodePreStop.sh are not mounted to the chrome-node using the default configuration, despite the fact that the default configuration expects these files to be present.

  2. Once I add the nodeProbe.sh, the script fails, possibly due to proxy as you mention.

We build an image with our corporate certificate using this Dockerfile:

FROM selenium/node-chrome:4.18.1-20240224

USER root

# Makes apt-get fetch through our corporate Artifactory
COPY ./sources.list /etc/apt

# Add certificate, install NSS Shared Database tools
COPY ./corp.crt /usr/local/share/ca-certificates
RUN update-ca-certificates && \
    apt-get update && \
    apt-get -y install libnss3-tools

USER seluser
## Create NSS Shared Database that Chrome uses for certificates to trust
## and add the certificate
#RUN mkdir -p $HOME/.pki/nssdb && \
#    certutil -d sql:$HOME/.pki/nssdb -N --empty-password && \
RUN certutil -d sql:$HOME/.pki/nssdb -A -t "CPTu,CPTu,CPTu" -n my-corp -i /usr/local/share/ca-certificates/corp.crt

USER root

# Give permissions to nssdb so that Chrome can actually use it
RUN chmod -R 777 /home/seluser/.pki/nssdb

USER seluser

We then also set the proxy environment variables as you noted. We need this proxy, since some of the websites we test can only be accessed through our corporate proxy. Others can only be accessed on our internal network, and these should not be requested through the proxy. Apart from localhost, there are a number of additional hosts in our NO_PROXY env var that I omitted from my example.

Should I create a separate issue for the second issue?

VietND96 commented 5 months ago

I have updated the cURL command in the probe scripts to ignore the proxy. On the issue of the probe scripts not being mounted correctly: did you update your override YAML to the following?

  extraScripts:
    nodePreStop.sh: ""
    nodeProbe.sh: ""

If it doesn't work, please share the Helm command that you used

AndreasPetersen commented 5 months ago

I install the chart using:

helm upgrade selenium-grid selenium-grid --install --repo my-corp-proxy-repo --version 0.29.1 --namespace staging -f values.yaml --wait

With the values.yaml being what I wrote in my original post. I must have copied an older version of the Helm chart's values.yaml, because I can see that in it extraScripts is missing the empty quotes:

extraScripts:
  nodePreStop.sh:
  nodeProbe.sh:
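The distinction matters because YAML parses a bare key as null, not as an empty string, so the chart's templating can treat the two cases differently (my reading of why the scripts were not mounted; verify against the chart templates):

```yaml
extraScripts:
  nodeProbe.sh:       # bare key -> null; the template may skip this entry
  nodePreStop.sh: ""  # empty string -> entry is rendered and mounted
```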

I updated my values.yaml to:

global:
  seleniumGrid:
    # Image registry for all selenium components
    imageRegistry: my-corp-proxy-repo/selenium
    # Image tag for all selenium components
    imageTag: 4.19.1-20240402
    # Image tag for browser's nodes
    nodesImageTag: 4.19.1-20240402
    # Image tag for browser's video recorder
    videoImageTag: ffmpeg-6.1-20240224
    # Pull secret for all components, can be overridden individually
    imagePullSecret: ""
    # Log level for all components. Possible values describe here: https://www.selenium.dev/documentation/grid/configuration/cli_options/#logging
    logLevel: INFO

# Basic auth settings for Selenium Grid
basicAuth:
  # Enable or disable basic auth
  enabled: false
  # Username for basic auth
  username: admin
  # Password for basic auth
  password: admin

# Deploy Router, Distributor, EventBus, SessionMap and Nodes separately
isolateComponents: false

# Service Account for all components
serviceAccount:
  create: true
  # nameOverride:
  annotations: {}
  #  eks.amazonaws.com/role-arn: "arn:aws:iam::12345678:role/video-bucket-permissions"

# Configuration for selenium hub deployment (applied only if `isolateComponents: false`)
hub:
  # imageRegistry: selenium
  # Selenium Hub image name
  imageName: hub
  # Selenium Hub image tag (this overwrites global.seleniumGrid.imageTag parameter)
  # imageTag: 4.18.1-20240224
  # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
  imagePullPolicy: IfNotPresent
  # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
  imagePullSecret: ""

  # Custom environment variables for selenium-hub
  extraEnvironmentVariables:
    - name: SE_SESSION_REQUEST_TIMEOUT
      value: "1800"

tracing:
  enabled: false
  enabledWithExistingEndpoint: true
  exporter: otlp
  exporterEndpoint: "http://open-telemetry-collector:4317"
  globalAutoConfigure: true
  ingress:
    enabled: true
    annotations:
    paths:
      - backend:
          service:
            name: "{{ .Release.Name }}-jaeger-query"
            port:
              number: 16686
        path: &jaegerBasePath "/jaeger"
        pathType: Prefix

monitoring:
  enabled: false

# Keda scaled object configuration
autoscaling:
  # Enable autoscaling. Implies installing KEDA
  enabled: false

# Configuration for chrome nodes
chromeNode:
  # Enable chrome nodes
  enabled: true

  resources:
    requests:
      memory: "4Gi"
      cpu: "800m"
    limits:
      memory: "4Gi"
      cpu: "1"

  # NOTE: Only used when autoscaling.enabled is false
  # Enable creation of Deployment
  # true (default) - if you want long-living pods
  # false - for provisioning your own custom type such as Jobs
  deploymentEnabled: true

  replicas: 1
  imageRegistry: my-corp-proxy-repo
  # Image of chrome nodes
  imageName: node-chrome-bankdata-cert
  # Image of chrome nodes (this overwrites global.seleniumGrid.nodesImageTag)
  imageTag: 4.19.1-20240402
  # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
  imagePullPolicy: IfNotPresent
  extraEnvironmentVariables:
    - name: SE_NODE_SESSION_TIMEOUT
      value: "600"
    - name: SCREEN_WIDTH
      value: "1360"
    - name: SCREEN_HEIGHT
      value: "1080"
    - name: http_proxy
      value: "http://my-corp-proxy:8080"
    - name: HTTP_PROXY
      value: "$(http_proxy)"
    - name: https_proxy
      value: "$(http_proxy)"
    - name: HTTPS_PROXY
      value: "$(http_proxy)"
    - name: no_proxy
      value: "localhost,my-corp.dk"
    - name: NO_PROXY
      value: "$(no_proxy)"
    - name: LANG
      value: da_DK
    - name: LANGUAGE
      value: da_DK
  # Size limit for DSH volume mounted in container (if not set, default is "1Gi")
  dshmVolumeSizeLimit: 4Gi

# Configuration for firefox nodes
firefoxNode:
  # Enable firefox nodes
  enabled: false

# Configuration for edge nodes
edgeNode:
  # Enable edge nodes
  enabled: false

The extra scripts are now present. It of course still fails because of the proxy issue. Looking forward to the proxy fix. Thanks for the quick help @VietND96 !

AndreasPetersen commented 5 months ago

Until a fix is released, I've worked around the issue by adding 127.0.0.1,$SE_HUB_HOST to my no_proxy list.
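Expressed in the chromeNode.extraEnvironmentVariables list, that workaround looks roughly like this (a sketch: it assumes SE_HUB_HOST is available to Kubernetes $(VAR) expansion at this point in the container spec; otherwise hard-code the hub hostname):

```yaml
    - name: no_proxy
      value: "localhost,127.0.0.1,$(SE_HUB_HOST),my-corp.dk"
    - name: NO_PROXY
      value: "$(no_proxy)"
```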

VietND96 commented 5 months ago

I will bump chart version 0.29.2 soon today, since one more issue also needs to be confirmed.

VietND96 commented 4 months ago

I could not do the 0.29.2 patch as planned. Chart 0.30.0 will be released together with Selenium 4.20.0 as its base.

github-actions[bot] commented 3 months ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.