SeleniumHQ / docker-selenium

Provides a simple way to run Selenium Grid with Chrome, Firefox, and Edge using Docker, making it easier to perform browser automation
http://www.selenium.dev/docker-selenium/
Other
8k stars 2.51k forks source link

[🐛 Bug]: Tests are just getting queued up though there are available sessions. #2169

Open tv2022 opened 8 months ago

tv2022 commented 8 months ago

What happened?

Suddenly when we run some automation jobs the grid hub queue size is growing and not allocating sessions though we have enough sessions. We had to restart the grid to overcome this issue. This happens very frequently. We are using podman for configuring our grid. This seems relevant to this issue https://github.com/SeleniumHQ/docker-selenium/issues/1865. Tests will run for some time, and suddenly this issue is popping up.

image

Capabilities:


browserName: 'chrome',
    acceptInsecureCerts: true,
    'goog:chromeOptions': {
      prefs: {
        intl: {accept_languages: "en-US"},
      },
      w3c: true,
      args: [
        'incognito',
        '--disable-browser-side-navigation',
        '--ignore-certificate-errors',
        '--no-sandbox', '--start-fullscreen',
        '--disable-infobars',
        '--disable-dev-shm-usage'
      ]
    }

Command used to start Selenium Grid with Docker (or Kubernetes)

podman run -d --name selenium-hub \
 --network myprivate \
 --cpus 2 \
 -p 4442:4442 \
 -p 4443:4443 \
 -p 4444:4444 \
 --restart always \
 --memory 20000M \
 --memory-reservation 19000M \
 -e GRID_BROWSER_TIMEOUT=300 \
 -e GRID_TIMEOUT=300 \
 -e GRID_MAX_SESSION=60 \
 -e GRID_MAX_INSTANCES=3 \
 -e GRID_CLEAN_UP_CYCLE=5000 \
 -e GRID_UNREGISTER_IF_STILL_DOWN_AFTER=180000 \
 -e GRID_NEW_SESSION_WAIT_TIMEOUT=60 \
 docker.io/selenium/hub:4.14.1-20231025

# Set the number of nodes
NUM_NODES=150

NETWORK_NAME=private

MEMORY_LIMIT=4000M
MEMORY_RESERVATION=3900M

for ((i=1; i<=$NUM_NODES; i++)); do
 NODE_NAME="selenium-chrome-node-$i"

 podman run -d --name $NODE_NAME \
  --network $NETWORK_NAME \
  --cpus 1.5 \

Relevant log output

NA

Operating System

Red Hat Enterprise Linux release 8.9 (Ootpa)

Docker Selenium version (image tag)

4.14.1-20231025

Selenium Grid chart version (chart version)

NA

github-actions[bot] commented 8 months ago

@tv2022, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

VietND96 commented 8 months ago

Hi @tv2022, in your tests, is there any setCapability cause the mismatch? Or is there options that enable remote download? If yes, Node should set env var SE_NODE_ENABLE_MANAGED_DOWNLOADS=true

VietND96 commented 8 months ago

Ok, I missed this statement We had to restart the grid to overcome this issue. Can you enable the debug logs in Hub by setting SE_LOG_LEVEL=FINE. Once the issue happens again, before restarting the Hub, capture the Hub container logs. With logs, it will help us can understand what is the issue.

VietND96 commented 8 months ago

Also, I saw in the steps, docker.io/selenium/hub:4.14.1-20231025 image was used. Can you also have a try with the recent versions of Hub to see if the issue is reproducible?

tv2022 commented 8 months ago

@VietND96 Thanks for the quick response, don't have any download option enabled in capabilities but I still tried adding it and after few hours I saw this issue again. I also changed GRID_MAX_SESSION=60 to GRID_MAX_SESSION=150 as we may trigger multiple jobs at any point, and the concurrent session may go up beyond 60 but still I was able to reproduce the issue.

Sure I will add this SE_LOG_LEVEL=FINE and capture logs with the latest image. Will get back once I am able to reproduce the issue again with this env variable.

Thanks!

tv2022 commented 8 months ago

@VietND96 Thanks for the quick response, don't have any download option enabled in capabilities but I still tried adding it and after few hours I saw this issue again. I also changed GRID_MAX_SESSION=60 to GRID_MAX_SESSION=150 as we may trigger multiple jobs at any point, and the concurrent session may go up beyond 60 but still I was able to reproduce the issue.

Sure I will add this SE_LOG_LEVEL=FINE and capture logs with the latest image. Will get back once I am able to reproduce the issue again with this env variable.

Thanks!

@VietND96 I can see the issue again with the latest version 4.18.1-20240224 and SE_LOG_LEVEL=FINE. The log file is huge (8GB) so cannot upload the entire file, I am still trying to analyze the log. Do you have any specific inputs or exception to look for related to this issue?

I see lot of below exception messages in log.

18:25:14.088 DEBUG [ConnectionPool.cleanup] - [HttpClient-445-SelectorManager] [6219s 989ms] ConnectionPool(445) SocketTube(764) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.091 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-445-SelectorManager] [6219s 992ms] SocketTube(759) got read error: java.io.IOException: connection closed locally
18:25:14.091 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-445-SelectorManager] [6219s 992ms] SocketTube(759) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(759))
18:25:14.092 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-445-SelectorManager] [6219s 992ms] SocketTube(759) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.092 DEBUG [ConnectionPool.cleanup] - [HttpClient-445-SelectorManager] [6219s 992ms] ConnectionPool(445) SocketTube(759) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.521 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-466-SelectorManager] [6220s 422ms] SocketTube(812) got read error: java.io.IOException: connection closed locally
18:25:14.521 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-468-SelectorManager] [6220s 422ms] SocketTube(815) got read error: java.io.IOException: connection closed locally
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-463-SelectorManager] [6220s 424ms] SocketTube(805) got read error: java.io.IOException: connection closed locally
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-463-SelectorManager] [6220s 424ms] SocketTube(805) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(805))
18:25:14.521 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-468-SelectorManager] [6220s 422ms] SocketTube(815) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(815))
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-468-SelectorManager] [6220s 424ms] SocketTube(815) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-466-SelectorManager] [6220s 424ms] SocketTube(812) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(812))
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-466-SelectorManager] [6220s 424ms] SocketTube(812) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.525 DEBUG [ConnectionPool.cleanup] - [HttpClient-466-SelectorManager] [6220s 426ms] ConnectionPool(466) SocketTube(812) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.523 DEBUG [ConnectionPool.cleanup] - [HttpClient-468-SelectorManager] [6220s 424ms] ConnectionPool(468) SocketTube(815) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-463-SelectorManager] [6220s 424ms] SocketTube(805) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.525 DEBUG [ConnectionPool.cleanup] - [HttpClient-463-SelectorManager] [6220s 426ms] ConnectionPool(463) SocketTube(805) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.525 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-463-SelectorManager] [6220s 426ms] SocketTube(804) got read error: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-468-SelectorManager] [6220s 428ms] SocketTube(814) got read error: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-468-SelectorManager] [6220s 428ms] SocketTube(814) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(814))
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-466-SelectorManager] [6220s 428ms] SocketTube(811) got read error: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-466-SelectorManager] [6220s 428ms] SocketTube(811) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(811))
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-466-SelectorManager] [6220s 428ms] SocketTube(811) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [ConnectionPool.cleanup] - [HttpClient-466-SelectorManager] [6220s 428ms] ConnectionPool(466) SocketTube(811) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-468-SelectorManager] [6220s 428ms] SocketTube(814) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [ConnectionPool.cleanup] - [HttpClient-468-SelectorManager] [6220s 428ms] ConnectionPool(468) SocketTube(814) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.528 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-463-SelectorManager] [6220s 428ms] SocketTube(804) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(804))
18:25:14.528 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-463-SelectorManager] [6220s 428ms] SocketTube(804) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.528 DEBUG [ConnectionPool.cleanup] - [HttpClient-463-SelectorManager] [6220s 428ms] ConnectionPool(463) SocketTube(804) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.558 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-441-SelectorManager] [6220s 459ms] SocketTube(751) got read error: java.io.IOException: connection closed locally
18:25:14.621 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-441-SelectorManager] [6220s 521ms] SocketTube(751) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(751))
18:25:14.621 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-441-SelectorManager] [6220s 522ms] SocketTube(751) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.538 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-416-SelectorManager] [6220s 439ms] SocketTube(706) got read error: java.io.IOException: connection closed locally
18:25:14.626 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-463-SelectorManager] [6220s 527ms] SocketTube(816) forwarding error to subscriber: java.io.IOException: connection closed locally
VietND96 commented 8 months ago

Can you check if there are any logs related to the below string that appeared consecutively in Hub?

selenium.common.exceptions.SessionNotCreatedException Could not start a new session. Could not start a new session. Unable to create new session LocalDistributor.reserveSlot LocalDistributor.newSession Unable to find a free slot for request Ending request (POST) LocalNode.stopTimedOutSession Deleted session from local Session Map

tv2022 commented 8 months ago

@VietND96 No, i don't find any exception starting with either first 2 lines.

selenium.common.exceptions.SessionNotCreatedException
Could not start a new session. Could not start a new session. Unable to create new session

Also, no log was found with this exception LocalNode.stopTimedOutSession

Found the below log related to a few sessions.

18:31:57.053 DEBUG [LocalDistributor.reserveSlot] - No slots found for request 035bc19e-125c-4555-bb32-3202d2aa6a56 and capabilities Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}
18:31:57.056 INFO [LocalDistributor.newSession] - Unable to find a free slot for request 035bc19e-125c-4555-bb32-3202d2aa6a56. 
18:31:57.074 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=04b42b799c8c9faf, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=a42873d2fbbd5b30, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null, attributes={}}, name=distributor.new_session, kind=INTERNAL, startEpochNanos=1711477917035085033, endEpochNanos=1711477917073164451, attributes=AttributesMap{data={error=true}, capacity=128, totalAddedValues=1}, totalAttributeCount=1, events=[ImmutableEventData{name=Session request received by the Distributor, attributes={logger="org.openqa.selenium.grid.distributor.local.LocalDistributor", request.payload="[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}]"}, epochNanos=1711477917035106807, totalAttributeCount=2}, ImmutableEventData{name=exception, attributes={exception.message="Will retry session 035bc19e-125c-4555-bb32-3202d2aa6a56", logger="org.openqa.selenium.grid.distributor.local.LocalDistributor", request.payload="[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}]"}, epochNanos=1711477917073161864, totalAttributeCount=3}], totalRecordedEvents=2, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=ERROR, description=Kind: ABORTED Description:}, hasEnded=true}
18:31:57.075 WARN [SeleniumSpanExporter$1.lambda$export$1] - Will retry session 035bc19e-125c-4555-bb32-3202d2aa6a56
18:31:57.094 WARN [SeleniumSpanExporter$1.lambda$export$3] - {"traceId": "0052ee5b995e046800c34be87f386338","eventTime": 1711477917073161864,"eventName": "exception","attributes": {"exception.message": "Will retry session 035bc19e-125c-4555-bb32-3202d2aa6a56","logger": "org.openqa.selenium.grid.distributor.local.LocalDistributor","request.payload": "[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}]"}}
18:31:57.114 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=b8e96ce838da9ebb, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=a42873d2fbbd5b30, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null, attributes={}}, name=distributor.retry, kind=INTERNAL, startEpochNanos=1711477917095127649, endEpochNanos=1711477917113470452, attributes={}, totalAttributeCount=0, events=[ImmutableEventData{name=Retry adding to front of queue. No slot available., attributes={logger="org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable", request="SessionRequest[requestId=035bc19e-125c-4555-bb32-3202d2aa6a56, desiredCapabilities=[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}], downstreamDialects=[W3C], metadata={desiredCapabilities={browserName=chrome, acceptInsecureCerts=true, goog:chromeOptions={prefs={intl={accept_languages=en-US}}, w3c=true, args=[incognito, --disable-browser-side-navigation, --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage]}}}, traceHeaders={traceparent=00-0052ee5b995e046800c34be87f386338-6ea589eaa1851fd2-01}]", request.id="035bc19e-125c-4555-bb32-3202d2aa6a56", request.retry_add=true}, epochNanos=1711477917113468586, totalAttributeCount=4}], totalRecordedEvents=1, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}
18:31:57.115 DEBUG [SeleniumSpanExporter$1.lambda$export$3] - {"traceId": "0052ee5b995e046800c34be87f386338","eventTime": 1711477917113468586,"eventName": "**Retry adding to front of queue**. No slot available.","attributes": {"logger": "org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable","request": "SessionRequest[requestId=035bc19e-125c-4555-bb32-3202d2aa6a56, desiredCapabilities=[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}], downstreamDialects=[W3C], metadata={desiredCapabilities={browserName=chrome, acceptInsecureCerts=true, goog:chromeOptions={prefs={intl={accept_languages=en-US}}, w3c=true, args=[incognito, --disable-browser-side-navigation, --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage]}}}, traceHeaders={traceparent=00-0052ee5b995e046800c34be87f386338-6ea589eaa1851fd2-01}]","request.id": "035bc19e-125c-4555-bb32-3202d2aa6a56","request.retry_add": true}}
18:31:57.115 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=a42873d2fbbd5b30, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=6ea589eaa1851fd2, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null, attributes={}}, name=distributor.poll_queue, kind=INTERNAL, startEpochNanos=1711477917035000000, endEpochNanos=1711477917114692381, attributes=AttributesMap{data={request.id=035bc19e-125c-4555-bb32-3202d2aa6a56}, capacity=128, totalAddedValues=1}, totalAttributeCount=1, events=[], totalRecordedEvents=0, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}
18:32:01.320 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=36d1570e9bd5af94, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=6ea589eaa1851fd2, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null, attributes={}}, name=distributor.poll_queue, kind=INTERNAL, startEpochNanos=1711477920290000000, endEpochNanos=1711477921320458446, attributes=AttributesMap{data={request.id=035bc19e-125c-4555-bb32-3202d2aa6a56}, capacity=128, totalAddedValues=1}, totalAttributeCount=1, events=[], totalRecordedEvents=0, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}
adam-chaters commented 6 months ago

I don't currently have time to add much detail, but I did want to chime in that I'm seeing the same got read error: java.io.IOException: connection closed locally issues, where the Hub pod is online and seems to be staying that way, but it is struggling to communicate with the nodes as they register.

Edit: Resolved my inter-node errors by ensuring the chart dependencies were rebuilt and redeploying the SE 4.20 images to a clean environment. My pods are still dying off quickly after start, but this appears to be related to the startup probe; turning it off kept my containers running as expected.

VietND96 commented 1 month ago

With the issue, after some time, new requests are not accepted by hub/distributor anymore. If you are deploying Grid in K8s, you can refer to liveness probe settings, it will guard to restart once meet 0 sessions while queue getting up - https://github.com/SeleniumHQ/docker-selenium/tree/trunk/charts/selenium-grid#distributor-probes For docker compose, you can add a health check with the script follow as - https://github.com/SeleniumHQ/docker-selenium/blob/trunk/charts/selenium-grid/configs/distributor/distributorProbe.sh

VietND96 commented 1 month ago

Also, these env vars name looks like not support, mostly it starts with prefix SE_

 -e GRID_BROWSER_TIMEOUT=300 \
 -e GRID_TIMEOUT=300 \
 -e GRID_MAX_SESSION=60 \
 -e GRID_MAX_INSTANCES=3 \
 -e GRID_CLEAN_UP_CYCLE=5000 \
 -e GRID_UNREGISTER_IF_STILL_DOWN_AFTER=180000 \
 -e GRID_NEW_SESSION_WAIT_TIMEOUT=60 \