Abhijith2893 closed this issue 2 years ago.
We've spent a lot of time digging into this issue, and fixes have landed. But I guess the tool keeps updating and tries more things that can bring this up. To be honest, I do not understand the need to scan the Grid all the time, as it just adds more load to it.
This is just a way of saying that we are happy to look at this if the exact request that reproduces the issue is shared. Otherwise, we won't be spending time on this. Given the price of security tools, I assume you can spend some time and resources to research, find the HTTP request causing this, and help us fix it.
Thank you for the quick response. Nexpose is a company-wide vulnerability scanning tool that the security team uses to scan all hosts daily. This disconnect is currently preventing us from migrating from Selenium Grid 3 to Selenium Grid 4. I understand it's difficult to fix without proper repro steps, and I will try to narrow down the HTTP call.
I think I have found such an HTTP request! 🥳 Here is a Postman collection with the corresponding request to recreate it: Grid-breaking-Request.postman_collection.json.zip
I had to zip the file because GitHub wants me to. Just unpack and import the JSON file in Postman.
I took a closer look at the problematic request. Nexpose seems to use it to check whether the server is attackable for this vulnerability. How the attack works is well explained here.
I have now simplified the request as much as possible and updated the Postman collection. Grid-breaking-Request.postman_collection.json.zip
@diemol Does this meet your expectations, or can I contribute more information so you can fix this?
Thank you for the details. After going through this issue and the related issues, and trying something basic locally, I am not fully sure how to use the information shared to reproduce this. @Trigtrig Have you used the shared request without the scanning software to reproduce this locally? I would really appreciate something we can use to reproduce this easily. @diemol Any pointers, based on your experience with the previous fixes, about reproducing such issues? At the core, it seems like a Selenium Grid bug. I am happy to help on this front.
@pujagani Yes, I can reproduce this without the scanning software. These are the steps I perform locally:

1. Start the Grid with Docker Compose:

```yaml
version: "3.7"
services:
  selenium-hub:
    image: selenium/hub:4
    container_name: selenium-hub
    ports:
      # …
    environment:
      - SE_SESSION_RETRY_INTERVAL=5
  node-docker:
    image: selenium/node-docker:4
    container_name: selenium-node
    volumes:
      # …
```

`config.toml`:

```toml
[docker]
configs = [
    "selenium/standalone-chrome:latest", "{\"browserName\": \"chrome\"}",
    "selenium/standalone-edge:latest", "{\"browserName\": \"MicrosoftEdge\"}",
    "selenium/standalone-firefox:latest", "{\"browserName\": \"firefox\"}"
]
url = "http://host.docker.internal:2375"
```
2. Opening the UI in the browser should show something like this:
<img width="641" alt="grid_before" src="https://user-images.githubusercontent.com/7973740/198296970-b72d28ad-8584-46a7-a91b-b9a5564ec26a.png">
3. Trigger the problematic request with Postman:
<img width="751" alt="postman" src="https://user-images.githubusercontent.com/7973740/198297696-18002785-8de1-4fe8-bc63-017cd8771cc6.png">
4. Opening the UI again shows this (you have to refresh the page):
<img width="686" alt="grid_after" src="https://user-images.githubusercontent.com/7973740/198298087-e1c5aa28-fe9e-4abf-8ca7-bf538a213b2e.png">
The ``/graphql`` endpoint is now stuck. Any calls to the endpoint with Postman/curl/... now fail.
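To check for the hang from a script instead of refreshing the UI, a small probe can POST a query to `/graphql` with a timeout. This is a minimal Python sketch: `graphql_responds` is a hypothetical helper, and the query `{ grid { nodeCount } }` is an assumption about the Grid's GraphQL schema; any query the Grid accepts would serve the same purpose.

```python
import http.client
import json
import urllib.request

# Assumption: the Grid answers GraphQL queries such as grid { nodeCount } at /graphql.
QUERY = {"query": "{ grid { nodeCount } }"}


def graphql_responds(base_url="http://localhost:4444", timeout=5.0):
    """Return True if POST <base_url>/graphql answers HTTP 200 within `timeout` seconds."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/graphql",
        data=json.dumps(QUERY).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused or reset connections and timeouts are all
        # OSError subclasses; treat any of them as "the endpoint is not answering".
        return False
```

On a healthy Grid at `localhost:4444`, `graphql_responds()` should return `True`; once the endpoint is stuck, the request times out and it returns `False`.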
@Trigtrig I appreciate the details and the quick response. Pardon my lack of knowledge in this area: which server should the problematic HTTP request from the shared Postman export point to? That was the main point I was not clear about. Is there something I need to set up?
@pujagani Just replace `{{url}}` with `localhost` and `{{port}}` with `4444`. I use Postman variables to switch between different Selenium Grids, but that is not needed here.
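Outside Postman (for example when replaying the request with curl), the same substitution is plain string templating. A minimal Python sketch, assuming only `{{name}}`-style placeholders; `resolve` is a hypothetical helper, and the request's actual path and body still come from the shared collection:

```python
import re

# Postman-style placeholder such as {{url}} or {{port}}
PLACEHOLDER = re.compile(r"\{\{([^{}\s]+)\}\}")


def resolve(template, variables):
    """Replace {{name}} placeholders using `variables`; unknown names stay as-is."""

    def substitute(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


print(resolve("{{url}}:{{port}}", {"url": "localhost", "port": 4444}))  # localhost:4444
```

Leaving unknown placeholders untouched makes a missing variable easy to spot in the resulting URL instead of silently producing an empty string.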
I did that with a local Grid, but without Docker, and I only saw the same message shown in the screenshot ("Unable to find handler"); it did not affect the Grid or the Grid UI. I will try with Docker tomorrow and update my findings here. Thank you!
Okay, that's interesting. I just tried it with the current selenium-server-4.5.3.jar as described here and I see exactly the same problem as with Docker.
I tried with both selenium-server-4.5.3.jar and Docker (using the same files you shared). I ran around 200 sessions while constantly sending the problematic HTTP request via Postman, and the Grid UI worked perfectly in both cases. I am unable to reproduce the issue on my end, so I cannot identify the problem. My machine runs macOS Big Sur 11.4. I am not sure whether the issue is OS-specific (I do not have a Windows machine to check) or whether having the security software installed on the machine affects the Grid in any way.
I was able to reproduce the issue with the postman collection provided, thank you for that.
Since the resource is not found, the Grid replies with a 404, but the request body is never fully read. That leaves an input pipe waiting for content, which eventually causes a lock. This is what makes the Grid hang and stop processing new requests (such as the GraphQL requests from the Grid UI).
The fix closes the input pipe when responding with a 404.
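The same failure mode can be illustrated outside Selenium. The sketch below is a Python analogue, not Selenium's Java code: it assumes a POSIX system where `os.write` blocks once a pipe's buffer is full, and the names `feed_body`, `handle_unknown_route` and `simulate` are hypothetical. A producer thread streams a large request body into a pipe while the handler answers 404 without reading it; unless the handler also closes its end of the pipe, the producer stays blocked.

```python
import os
import threading

BODY_SIZE = 4 * 1024 * 1024  # 4 MiB body, far larger than a typical OS pipe buffer


def feed_body(write_fd, size, done):
    """Network side: stream the request body into the pipe."""
    chunk = b"x" * 65536
    sent = 0
    try:
        while sent < size:
            # Blocks while the pipe buffer is full and nobody is reading
            sent += os.write(write_fd, chunk[: size - sent])
    except BrokenPipeError:
        pass  # the read end was closed, so stop instead of blocking forever
    finally:
        os.close(write_fd)
        done.set()


def handle_unknown_route(read_fd, close_input):
    """Hypothetical fallback handler: no route matches, reply 404 without reading."""
    if close_input:
        os.close(read_fd)  # release the input pipe when answering early
    return 404


def simulate(close_input, timeout=1.0):
    """Return (status, finished): did the body producer finish within `timeout`?"""
    read_fd, write_fd = os.pipe()
    done = threading.Event()
    producer = threading.Thread(
        target=feed_body, args=(write_fd, BODY_SIZE, done), daemon=True
    )
    producer.start()
    status = handle_unknown_route(read_fd, close_input)
    finished = done.wait(timeout)
    if not close_input:
        os.close(read_fd)  # demo cleanup only: unblocks the stuck producer
        producer.join(timeout)
    return status, finished
```

Here `simulate(close_input=False)` returns `(404, False)` after the timeout, while `simulate(close_input=True)` returns `(404, True)`: closing the read end makes the blocked write fail with `BrokenPipeError` instead of waiting forever.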
This will be part of 4.6.0, which should come out in the next few days.
That said, other requests from a vulnerability scan might still trigger this again. If that is the case, please open a new issue.
I'm glad you were able to reproduce it after all. I am really looking forward to the next Selenium version. Thanks for the fix.
@diemol A similar issue is observed in version 3.141.59. Is it possible to add a patch?
@krishtoautomate we do not release 3.x anymore.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
What happened?
Observing a recurrence of #1434 and #1497 in a distributed Selenium Grid 4.3.0-20220706.
Observing disconnects in a distributed Selenium Grid 4 after a security vulnerability scanner (Nexpose) scans the host. This causes the `/graphql` endpoint to fail: no nodes are displayed in the UI, and the Grid does not accept any new tests. Restarting the selenium-router Docker container brings the Grid back to normal.
Command used to start Selenium Grid with Docker
Relevant log output
Operating System
Linux
Docker Selenium version (tag)
4.3.0-20220706