[🐛 Bug]: Deleting the node via cURL command respawns a new node - Kubernetes

kherath17 commented 5 months ago

What happened?

Context:

I'm trying to delete a node attached to the Grid with the cURL command below:

curl --request DELETE 'http://<Grid Endpoint>/se/grid/distributor/node/<Node-ID>' --header 'X-REGISTRATION-SECRET;'

The command executes successfully, and for a few seconds the node disappears from both the Grid UI and the response of the request below, but it is created again after a while.

curl --request GET 'https://<Grid Endpoint>/sandbox_qlabv2/status'

When I check the pod log, it shows a new "Node has been added" line each time a node deletion is triggered (see "Relevant log output" below).

Alternatively

Is there any way to identify which Selenium Grid node ID a given browser pod is mapped to?
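One possible way to approach this mapping, offered as a sketch rather than a confirmed answer: the node log below contains "Reporting self as: http://100.68.38.40:5555", which suggests the node registers under the pod's IP. Assuming the Grid /status response lists each node with `id` and `uri` fields (an assumption about the response shape, not something shown in this thread), the pod IP (for example from `kubectl get pod -o wide`) could be matched against each node's URI. The function name `node_id_for_pod_ip` and the sample payload are hypothetical.

```python
from urllib.parse import urlparse


def node_id_for_pod_ip(status, pod_ip):
    """Return the ID of the node whose reported URI host equals pod_ip.

    `status` is the parsed JSON body of the Grid /status endpoint.
    Assumed shape: {"value": {"nodes": [{"id": ..., "uri": ...}, ...]}}.
    Returns None when no node matches.
    """
    for node in status.get("value", {}).get("nodes", []):
        # Compare only the host part, ignoring scheme and port.
        if urlparse(node.get("uri", "")).hostname == pod_ip:
            return node.get("id")
    return None


# Hypothetical payload, trimmed to the fields this sketch relies on.
sample_status = {
    "value": {
        "nodes": [
            {"id": "node-a", "uri": "http://100.68.38.40:5555"},
            {"id": "node-b", "uri": "http://100.68.38.41:5555"},
        ]
    }
}
```

With the sample payload, `node_id_for_pod_ip(sample_status, "100.68.38.40")` selects the first entry. If nodes are configured to report a hostname instead of an IP, the comparison would need to use that hostname instead.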

Command used to start Selenium Grid with Docker (or Kubernetes)

Configured on Kubernetes

Relevant log output

Starting Selenium Grid Node...
2024-05-06 05:10:34,263 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-06 05:10:34,263 INFO success: vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-06 05:10:34,264 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
05:10:35.857 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
05:10:35.952 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
05:10:36.648 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://selenium-hub:4442 and tcp://selenium-hub:4443
05:10:36.961 INFO [UnboundZmqEventBus.<init>] - Sockets created
05:10:38.043 INFO [UnboundZmqEventBus.<init>] - Event bus ready
05:10:38.569 INFO [NodeServer.createHandlers] - Reporting self as: http://100.68.38.40:5555
05:10:38.681 INFO [NodeOptions.getSessionFactories] - Detected 1 available processors
05:10:38.983 INFO [NodeOptions.report] - Adding chrome for {"browserName": "chrome","browserVersion": "124.0","goog:chromeOptions": {"binary": "\u002fusr\u002fbin\u002fgoogle-chrome"},"platformName": "linux","se:noVncPort": 7900,"se:vncEnabled": true} 1 times
05:10:39.149 INFO [Node.<init>] - Binding additional locator mechanisms: relative
05:10:40.051 INFO [NodeServer$1.start] - Starting registration process for Node http://100.68.38.40:5555
05:10:40.056 INFO [NodeServer.execute] - Started Selenium node 4.20.0 (revision 866c76ca80): http://100.68.38.40:5555
05:10:40.157 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
05:10:40.867 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
05:13:04.360 INFO [LocalNode.newSession] - Session created by the Node. Id: 0eac8cd13d41a67fc21c75afa7c120ca, Caps: Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 124.0.6367.118, chrome: {chromedriverVersion: 124.0.6367.91 (51df0e5e17a8..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:44019}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: wss://perfplatform.cloud.sy..., se:cdpVersion: 124.0.6367.118, se:name: testQLabV2_2024/05/06 10:42..., se:vnc: wss://perfplatform.cloud.sy..., se:vncEnabled: true, se:vncLocalAddress: ws://100.68.38.40:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
05:13:17.070 INFO [SessionSlot.stop] - Stopping session 0eac8cd13d41a67fc21c75afa7c120ca
05:40:39.370 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
05:41:09.378 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
05:44:09.365 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
05:45:39.367 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
05:47:39.365 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
05:48:39.364 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
05:58:39.364 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
06:25:39.367 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
06:26:09.364 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
06:27:39.363 INFO [NodeServer.lambda$createHandlers$2] - Node has been added

Operating System

Kubernetes - EKS

Docker Selenium version (image tag)

4.17.0

Selenium Grid chart version (chart version)

NA

github-actions[bot] commented 5 months ago

@kherath17, thank you for creating this issue. We will troubleshoot it as soon as we can.

Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

VietND96 commented 5 months ago

Hi @kherath17, in this case, can you add the env var SE_LOG_LEVEL=FINE to both the Hub and the Node? Then execute the cURL command to drain the node and capture the logs to see which events trigger the respawn.

kherath17 commented 5 months ago

@VietND96 Log files attached (FYI, it's node deletion I'm triggering, not node drain).

Chrome_Pod_log.txt

Hub_Pod_log.txt

By the way, do you think triggering node deletion through a curl command would not cause the K8s pod to be deleted, since the pod exists at the Kubernetes level while the node is something we interact with at the Selenium Grid level?

Expectation: What I am trying to achieve is a custom script that hits the /status endpoint and checks whether session is null for each element in the nodes array. If it is null, the script concludes that the node is not currently active and triggers node deletion straight away, without any drain command, to speed up scaling down.
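The script described above could look roughly like the sketch below. It is not a tested implementation: it assumes the /status response has the shape `{"value": {"nodes": [{"id": ..., "slots": [{"session": ...}]}]}}`, and the function names (`idle_node_ids`, `fetch_status`, `delete_node`) are hypothetical. The DELETE path and the X-REGISTRATION-SECRET header are taken from the cURL command at the top of this issue.

```python
import json
import urllib.request


def idle_node_ids(status):
    """Return IDs of nodes whose slots all report session == null.

    A node with an empty slot list is also treated as idle here.
    """
    idle = []
    for node in status.get("value", {}).get("nodes", []):
        slots = node.get("slots", [])
        if all(slot.get("session") is None for slot in slots):
            idle.append(node["id"])
    return idle


def fetch_status(grid_url):
    """GET <grid_url>/status and return the parsed JSON body."""
    with urllib.request.urlopen(f"{grid_url}/status") as resp:
        return json.load(resp)


def delete_node(grid_url, node_id, secret=""):
    """Send the same DELETE request as the cURL command in this issue."""
    req = urllib.request.Request(
        f"{grid_url}/se/grid/distributor/node/{node_id}",
        method="DELETE",
        headers={"X-REGISTRATION-SECRET": secret},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def delete_idle_nodes(grid_url, secret=""):
    """Delete every node that currently has no session; return their IDs."""
    node_ids = idle_node_ids(fetch_status(grid_url))
    for node_id in node_ids:
        delete_node(grid_url, node_id, secret)
    return node_ids
```

Note that there is a window between reading /status and sending the DELETE in which a new session could start on the node, which is one reason a drain step is sometimes preferred. As reported in this issue, removing the node from the Grid this way does not by itself stop the node process inside the pod.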

VietND96 commented 5 months ago

By the way, do you think triggering node deletion through a curl command would not cause the K8s pod to be deleted, since the pod exists at the Kubernetes level while the node is something we interact with at the Selenium Grid level?

If the Node is deployed as a Deployment, I guess it cannot be deleted through the Selenium API endpoint, because a K8s Deployment uses restartPolicy: Always and guards the number of replicas. Whenever the process in the container stops (pod 0/1), K8s restarts the container and waits for it to come up again. Whenever a node comes up, it sends the registration event to the Hub again. Even if you tried to set restartPolicy: Never, K8s would raise an error saying that a Deployment does not support that restart policy. I think if you want to send a delete signal to the Hub and have the Node terminated, the Node should be deployed as a Job.

kherath17 commented 5 months ago

@VietND96 Actually, the nodes are plain Pods with no replica count specified or maintained. Deleting the pod via a kubectl command works as expected. The only issue is deleting the node via the curl command above, which deletes the node but then creates a new node within the same browser pod, resulting in the log lines below:

05:58:39.364 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
06:25:39.367 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
06:26:09.364 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
06:27:39.363 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
