Open tv2022 opened 8 months ago
@tv2022, thank you for creating this issue. We will troubleshoot it as soon as we can.
Triage this issue by using labels.
If information is missing, add a helpful comment and then I-issue-template
label.
If the issue is a question, add the I-question
label.
If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted
label.
If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C),
add the applicable G-*
label, and it will provide the correct link and auto-close the
issue.
After troubleshooting the issue, please add the R-awaiting answer
label.
Thank you!
Hi @tv2022, in your tests, is there any setCapability
cause the mismatch?
Or is there options that enable remote download? If yes, Node should set env var SE_NODE_ENABLE_MANAGED_DOWNLOADS=true
Ok, I missed this statement We had to restart the grid to overcome this issue
.
Can you enable the debug logs in Hub by setting SE_LOG_LEVEL=FINE
. Once the issue happens again, before restarting the Hub, capture the Hub container logs. With logs, it will help us can understand what is the issue.
Also, I saw in the steps, docker.io/selenium/hub:4.14.1-20231025
image was used. Can you also have a try with the recent versions of Hub to see if the issue is reproducible?
@VietND96 Thanks for the quick response, don't have any download option enabled in capabilities but I still tried adding it and after few hours I saw this issue again. I also changed GRID_MAX_SESSION=60
to GRID_MAX_SESSION=150
as we may trigger multiple jobs at any point, and the concurrent session may go up beyond 60 but still I was able to reproduce the issue.
Sure I will add this SE_LOG_LEVEL=FINE
and capture logs with the latest image. Will get back once I am able to reproduce the issue again with this env variable.
Thanks!
@VietND96 Thanks for the quick response, don't have any download option enabled in capabilities but I still tried adding it and after few hours I saw this issue again. I also changed
GRID_MAX_SESSION=60
toGRID_MAX_SESSION=150
as we may trigger multiple jobs at any point, and the concurrent session may go up beyond 60 but still I was able to reproduce the issue.Sure I will add this
SE_LOG_LEVEL=FINE
and capture logs with the latest image. Will get back once I am able to reproduce the issue again with this env variable.Thanks!
@VietND96 I can see the issue again with the latest version 4.18.1-20240224 and SE_LOG_LEVEL=FINE. The log file is huge (8GB) so cannot upload the entire file, I am still trying to analyze the log. Do you have any specific inputs or exception to look for related to this issue?
I see lot of below exception messages in log.
18:25:14.088 DEBUG [ConnectionPool.cleanup] - [HttpClient-445-SelectorManager] [6219s 989ms] ConnectionPool(445) SocketTube(764) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.091 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-445-SelectorManager] [6219s 992ms] SocketTube(759) got read error: java.io.IOException: connection closed locally
18:25:14.091 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-445-SelectorManager] [6219s 992ms] SocketTube(759) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(759))
18:25:14.092 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-445-SelectorManager] [6219s 992ms] SocketTube(759) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.092 DEBUG [ConnectionPool.cleanup] - [HttpClient-445-SelectorManager] [6219s 992ms] ConnectionPool(445) SocketTube(759) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.521 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-466-SelectorManager] [6220s 422ms] SocketTube(812) got read error: java.io.IOException: connection closed locally
18:25:14.521 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-468-SelectorManager] [6220s 422ms] SocketTube(815) got read error: java.io.IOException: connection closed locally
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-463-SelectorManager] [6220s 424ms] SocketTube(805) got read error: java.io.IOException: connection closed locally
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-463-SelectorManager] [6220s 424ms] SocketTube(805) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(805))
18:25:14.521 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-468-SelectorManager] [6220s 422ms] SocketTube(815) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(815))
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-468-SelectorManager] [6220s 424ms] SocketTube(815) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-466-SelectorManager] [6220s 424ms] SocketTube(812) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(812))
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-466-SelectorManager] [6220s 424ms] SocketTube(812) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.525 DEBUG [ConnectionPool.cleanup] - [HttpClient-466-SelectorManager] [6220s 426ms] ConnectionPool(466) SocketTube(812) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.523 DEBUG [ConnectionPool.cleanup] - [HttpClient-468-SelectorManager] [6220s 424ms] ConnectionPool(468) SocketTube(815) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.523 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-463-SelectorManager] [6220s 424ms] SocketTube(805) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.525 DEBUG [ConnectionPool.cleanup] - [HttpClient-463-SelectorManager] [6220s 426ms] ConnectionPool(463) SocketTube(805) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.525 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-463-SelectorManager] [6220s 426ms] SocketTube(804) got read error: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-468-SelectorManager] [6220s 428ms] SocketTube(814) got read error: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-468-SelectorManager] [6220s 428ms] SocketTube(814) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(814))
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-466-SelectorManager] [6220s 428ms] SocketTube(811) got read error: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-466-SelectorManager] [6220s 428ms] SocketTube(811) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(811))
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-466-SelectorManager] [6220s 428ms] SocketTube(811) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [ConnectionPool.cleanup] - [HttpClient-466-SelectorManager] [6220s 428ms] ConnectionPool(466) SocketTube(811) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.527 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-468-SelectorManager] [6220s 428ms] SocketTube(814) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.527 DEBUG [ConnectionPool.cleanup] - [HttpClient-468-SelectorManager] [6220s 428ms] ConnectionPool(468) SocketTube(814) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.528 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-463-SelectorManager] [6220s 428ms] SocketTube(804) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(804))
18:25:14.528 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-463-SelectorManager] [6220s 428ms] SocketTube(804) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.528 DEBUG [ConnectionPool.cleanup] - [HttpClient-463-SelectorManager] [6220s 428ms] ConnectionPool(463) SocketTube(804) : ConnectionPool.cleanup(java.io.IOException: connection closed locally)
18:25:14.558 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-441-SelectorManager] [6220s 459ms] SocketTube(751) got read error: java.io.IOException: connection closed locally
18:25:14.621 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.read] - [HttpClient-441-SelectorManager] [6220s 521ms] SocketTube(751) Sending error java.io.IOException: connection closed locally to subscriber CleanupTrigger(SocketTube(751))
18:25:14.621 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-441-SelectorManager] [6220s 522ms] SocketTube(751) forwarding error to subscriber: java.io.IOException: connection closed locally
18:25:14.538 DEBUG [SocketTube$InternalReadPublisher$InternalReadSubscription.signalError] - [HttpClient-416-SelectorManager] [6220s 439ms] SocketTube(706) got read error: java.io.IOException: connection closed locally
18:25:14.626 DEBUG [SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion] - [HttpClient-463-SelectorManager] [6220s 527ms] SocketTube(816) forwarding error to subscriber: java.io.IOException: connection closed locally
Can you check if there are any logs related to the below string that appeared consecutively in Hub?
selenium.common.exceptions.SessionNotCreatedException
Could not start a new session. Could not start a new session. Unable to create new session
LocalDistributor.reserveSlot
LocalDistributor.newSession
Unable to find a free slot for request
Ending request (POST)
LocalNode.stopTimedOutSession
Deleted session from local Session Map
@VietND96 No, i don't find any exception starting with either first 2 lines.
selenium.common.exceptions.SessionNotCreatedException
Could not start a new session. Could not start a new session. Unable to create new session
Also, no log was found with this exception LocalNode.stopTimedOutSession
Found the below log related to a few sessions.
18:31:57.053 DEBUG [LocalDistributor.reserveSlot] - No slots found for request 035bc19e-125c-4555-bb32-3202d2aa6a56 and capabilities Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}
18:31:57.056 INFO [LocalDistributor.newSession] - Unable to find a free slot for request 035bc19e-125c-4555-bb32-3202d2aa6a56.
18:31:57.074 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=04b42b799c8c9faf, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=a42873d2fbbd5b30, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null, attributes={}}, name=distributor.new_session, kind=INTERNAL, startEpochNanos=1711477917035085033, endEpochNanos=1711477917073164451, attributes=AttributesMap{data={error=true}, capacity=128, totalAddedValues=1}, totalAttributeCount=1, events=[ImmutableEventData{name=Session request received by the Distributor, attributes={logger="org.openqa.selenium.grid.distributor.local.LocalDistributor", request.payload="[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}]"}, epochNanos=1711477917035106807, totalAttributeCount=2}, ImmutableEventData{name=exception, attributes={exception.message="Will retry session 035bc19e-125c-4555-bb32-3202d2aa6a56", logger="org.openqa.selenium.grid.distributor.local.LocalDistributor", request.payload="[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}]"}, epochNanos=1711477917073161864, totalAttributeCount=3}], totalRecordedEvents=2, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=ERROR, description=Kind: ABORTED Description:}, hasEnded=true}
18:31:57.075 WARN [SeleniumSpanExporter$1.lambda$export$1] - Will retry session 035bc19e-125c-4555-bb32-3202d2aa6a56
18:31:57.094 WARN [SeleniumSpanExporter$1.lambda$export$3] - {"traceId": "0052ee5b995e046800c34be87f386338","eventTime": 1711477917073161864,"eventName": "exception","attributes": {"exception.message": "Will retry session 035bc19e-125c-4555-bb32-3202d2aa6a56","logger": "org.openqa.selenium.grid.distributor.local.LocalDistributor","request.payload": "[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}]"}}
18:31:57.114 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=b8e96ce838da9ebb, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=a42873d2fbbd5b30, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null, attributes={}}, name=distributor.retry, kind=INTERNAL, startEpochNanos=1711477917095127649, endEpochNanos=1711477917113470452, attributes={}, totalAttributeCount=0, events=[ImmutableEventData{name=Retry adding to front of queue. No slot available., attributes={logger="org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable", request="SessionRequest[requestId=035bc19e-125c-4555-bb32-3202d2aa6a56, desiredCapabilities=[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}], downstreamDialects=[W3C], metadata={desiredCapabilities={browserName=chrome, acceptInsecureCerts=true, goog:chromeOptions={prefs={intl={accept_languages=en-US}}, w3c=true, args=[incognito, --disable-browser-side-navigation, --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage]}}}, traceHeaders={traceparent=00-0052ee5b995e046800c34be87f386338-6ea589eaa1851fd2-01}]", request.id="035bc19e-125c-4555-bb32-3202d2aa6a56", request.retry_add=true}, epochNanos=1711477917113468586, totalAttributeCount=4}], totalRecordedEvents=1, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}
18:31:57.115 DEBUG [SeleniumSpanExporter$1.lambda$export$3] - {"traceId": "0052ee5b995e046800c34be87f386338","eventTime": 1711477917113468586,"eventName": "**Retry adding to front of queue**. No slot available.","attributes": {"logger": "org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable","request": "SessionRequest[requestId=035bc19e-125c-4555-bb32-3202d2aa6a56, desiredCapabilities=[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [incognito, --disable-browser-side-navi..., --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage], prefs: {intl: {accept_languages: en-US}}, w3c: true}}], downstreamDialects=[W3C], metadata={desiredCapabilities={browserName=chrome, acceptInsecureCerts=true, goog:chromeOptions={prefs={intl={accept_languages=en-US}}, w3c=true, args=[incognito, --disable-browser-side-navigation, --ignore-certificate-errors, --no-sandbox, --start-fullscreen, --disable-infobars, --disable-dev-shm-usage]}}}, traceHeaders={traceparent=00-0052ee5b995e046800c34be87f386338-6ea589eaa1851fd2-01}]","request.id": "035bc19e-125c-4555-bb32-3202d2aa6a56","request.retry_add": true}}
18:31:57.115 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=a42873d2fbbd5b30, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=6ea589eaa1851fd2, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null, attributes={}}, name=distributor.poll_queue, kind=INTERNAL, startEpochNanos=1711477917035000000, endEpochNanos=1711477917114692381, attributes=AttributesMap{data={request.id=035bc19e-125c-4555-bb32-3202d2aa6a56}, capacity=128, totalAddedValues=1}, totalAttributeCount=1, events=[], totalRecordedEvents=0, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}
18:32:01.320 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=36d1570e9bd5af94, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=0052ee5b995e046800c34be87f386338, spanId=6ea589eaa1851fd2, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null, attributes={}}, name=distributor.poll_queue, kind=INTERNAL, startEpochNanos=1711477920290000000, endEpochNanos=1711477921320458446, attributes=AttributesMap{data={request.id=035bc19e-125c-4555-bb32-3202d2aa6a56}, capacity=128, totalAddedValues=1}, totalAttributeCount=1, events=[], totalRecordedEvents=0, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}
I don't currently have time to add much detail, but I did want to chime in that I'm seeing the same got read error: java.io.IOException: connection closed locally
issues, where the Hub pod is online and seems to be staying that way, but it is struggling to communicate with the nodes as they register.
Edit: Resolved my inter-node errors by ensuring the chart dependencies were rebuilt and redeploying the SE 4.20 images to a clean environment. My pods are still dying off quickly after start, but this appears to be related to the startup probe; turning it off kept my containers running as expected.
With the issue, after some time, new requests are not accepted by hub/distributor anymore. If you are deploying Grid in K8s, you can refer to liveness probe settings, it will guard to restart once meet 0 sessions while queue getting up - https://github.com/SeleniumHQ/docker-selenium/tree/trunk/charts/selenium-grid#distributor-probes For docker compose, you can add a health check with the script follow as - https://github.com/SeleniumHQ/docker-selenium/blob/trunk/charts/selenium-grid/configs/distributor/distributorProbe.sh
Also, these env vars name looks like not support, mostly it starts with prefix SE_
-e GRID_BROWSER_TIMEOUT=300 \
-e GRID_TIMEOUT=300 \
-e GRID_MAX_SESSION=60 \
-e GRID_MAX_INSTANCES=3 \
-e GRID_CLEAN_UP_CYCLE=5000 \
-e GRID_UNREGISTER_IF_STILL_DOWN_AFTER=180000 \
-e GRID_NEW_SESSION_WAIT_TIMEOUT=60 \
What happened?
Suddenly when we run some automation jobs the grid hub queue size is growing and not allocating sessions though we have enough sessions. We had to restart the grid to overcome this issue. This happens very frequently. We are using podman for configuring our grid. This seems relevant to this issue https://github.com/SeleniumHQ/docker-selenium/issues/1865. Tests will run for some time, and suddenly this issue is popping up.
Capabilities:
Command used to start Selenium Grid with Docker (or Kubernetes)
Relevant log output
Operating System
Red Hat Enterprise Linux release 8.9 (Ootpa)
Docker Selenium version (image tag)
4.14.1-20231025
Selenium Grid chart version (chart version)
NA