SeleniumHQ / docker-selenium

Provides a simple way to run Selenium Grid with Chrome, Firefox, and Edge using Docker, making it easier to perform browser automation
http://www.selenium.dev/docker-selenium/

[🐛 Bug]: Dockerized Selenium grid 4 stops responding when scanned by vulnerability scanning software (Nexpose) #1646

Closed Abhijith2893 closed 2 years ago

Abhijith2893 commented 2 years ago

What happened?

Observing a recurrence of #1434 and #1497 in a distributed Selenium Grid 4.3.0-20220706.

A distributed Selenium Grid 4 shows disconnects after security vulnerability scanning software (Nexpose) scans the host. The /graphql endpoint fails, no nodes are displayed in the UI, and the Grid does not accept any new tests. Restarting the selenium-router Docker container brings the Grid back to normal.

Command used to start Selenium Grid with Docker

It looks like Nexpose scans all the open ports on the host, but I am unsure which exact request causes the issue. I am unable to reproduce it with any specific request, but I can reproduce it consistently by kicking off a Nexpose scan on the host.

No error logs are generated in any of the Selenium Grid 4 components when the Grid goes down, but the `/graphql` endpoint times out.
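Since the only visible symptom is a `/graphql` timeout, a small probe can detect the hung state without opening the UI. The following is a minimal sketch, not part of Selenium: the function name `graphql_is_responsive` is hypothetical, and the query `{ grid { sessionCount } }` is assumed to be valid for the Grid version in use.

```python
import http.client
import json
import urllib.error
import urllib.request


def graphql_is_responsive(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if POST <base_url>/graphql answers with HTTP 200 within `timeout` seconds."""
    # Assumption: this query is accepted by the Grid's GraphQL schema.
    payload = json.dumps({"query": "{ grid { sessionCount } }"}).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/graphql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, HTTP error statuses, and socket timeouts all count as "not responsive".
        return False
```

Calling `graphql_is_responsive("http://localhost:4444")` before and after a scan would show whether the router has stopped answering. The host and port here are assumptions based on the default Grid port seen in the log.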

Relevant log output

Possible problematic request from the scan
2022-08-01T20:40:28 [INFO] [Thread: 10.240.9.14:4444/TCP] [Site: ENG] Wrote 508 byte(s).
   16 03 03 01 F7 01 00 01  F3 03 03 E1 57 F7 EE DB    ............W...
   DC 69 AE 6D 0F C6 38 CF  97 90 C9 89 1E B7 DE 2F    .i.m..8......../
   64 AE 7D 64 E6 24 D5 68  E1 1B 4D 20 45 BA 04 BF    d.}d.$.h..M E...
   B9 CD 07 64 52 70 10 D2  3C 81 07 2A 91 94 C0 D9    ...dRp..<..*....
   02 84 14 B3 D9 B1 64 BC  FC F6 AB B9 00 B6 13 02    ......d.........
   13 01 13 03 C0 2C C0 2B  CC A9 C0 30 CC A8 C0 2F    .....,.+...0.../
   00 9F CC AA 00 A3 00 9E  00 A2 C0 24 C0 28 C0 23    ...........$.(.#
   C0 27 00 6B 00 6A 00 67  00 40 C0 2E C0 32 C0 2D    .'.k.j.g.@...2.-
   C0 31 C0 26 C0 2A C0 25  C0 29 C0 0A C0 14 C0 09    .1.&.*.%.)......
   C0 13 00 39 00 38 00 33  00 32 C0 05 C0 0F C0 04    ...9.8.3.2......
   C0 0E 00 9D 00 9C 00 3D  00 3C 00 35 00 2F C0 08    .......=.<.5./..
   C0 12 00 13 C0 03 C0 0D  00 0A 00 FF 00 A7 00 A6    ................
   00 6D C0 19 00 3A 00 6C  C0 18 00 34 C0 17 00 1B    .m...:.l...4....
   C0 07 C0 11 00 05 C0 02  C0 0C 00 04 C0 16 00 18    ................
   00 09 00 15 00 12 00 1A  00 08 00 14 00 11 00 19    ................
   00 03 00 17 00 3B C0 06  C0 10 00 02 C0 01 C0 0B    .....;..........
   C0 15 00 01 01 00 00 F4  00 05 00 05 01 00 00 00    ................
   00 00 0A 00 12 00 10 00  17 00 18 00 19 01 00 01    ................
   01 01 02 01 03 01 04 00  0B 00 02 01 00 00 0D 00    ................
   2A 00 28 04 03 05 03 06  03 08 04 08 05 08 06 08    *.(.............
   09 08 0A 08 0B 04 01 05  01 06 01 04 02 03 03 03    ................
   01 03 02 02 03 02 01 02  02 01 01 00 32 00 2A 00    ............2.*.
   28 04 03 05 03 06 03 08  04 08 05 08 06 08 09 08    (...............
   0A 08 0B 04 01 05 01 06  01 04 02 03 03 03 01 03    ................
   02 02 03 02 01 02 02 01  01 00 11 00 09 00 07 02    ................
   00 04 00 00 00 00 00 17  00 00 00 2B 00 0D 0C 03    ...........+....
   04 03 03 03 02 03 01 03  00 00 02 00 2D 00 02 01    ............-...
   01 00 33 00 47 00 45 00  17 00 41 04 14 DD 41 B6    ..3.G.E...A...A.
   5B B8 58 7E 93 82 18 D1  FD 59 CD 0E 83 39 5B B7    [.X~.....Y...9[.
   E3 BD 35 0B 07 D1 26 3E  2B 69 8E 34 EE A9 5E 23    ..5...&>+i.4..^#
   4E C9 96 48 42 4F 8D 0B  0C EE 0E D8 25 E4 EF 18    N..HBO......%...
   BB 8F B5 D7 F3 CA 7F DA  8D 56 9B 8C                .........V..
2022-08-01T20:40:28 [INFO] [Thread: 10.240.9.14:4444/TCP] [Site: ENG] Read 82 byte(s).
   48 54 54 50 2F 31 2E 31  20 34 30 30 20 42 61 64    HTTP/1.1 400 Bad
   20 52 65 71 75 65 73 74  0D 0A 63 6F 6E 74 65 6E     Request..conten
   74 2D 6C 65 6E 67 74 68  3A 20 31 35 0D 0A 63 6F    t-length: 15..co
   6E 6E 65 63 74 69 6F 6E  3A 20 63 6C 6F 73 65 0D    nnection: close.
   0A 0D 0A 34 30 30 20 42  61 64 20 52 65 71 75 65    ...400 Bad Reque
   73 74                                               st
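For readers unfamiliar with the dump: the payload begins with `16 03 03`, which is the start of a TLS handshake record, so the scanner appears to have sent a TLS ClientHello to the plain-HTTP port 4444 and received a `400 Bad Request`. The sketch below decodes only the first nine bytes copied from the log; the helper name `parse_tls_prefix` is made up for illustration, and this decoding alone does not show which request hangs the Grid.

```python
import struct

# First nine bytes of the 508-byte payload shown in the scan log.
PAYLOAD_PREFIX = bytes.fromhex("16 03 03 01 F7 01 00 01 F3")

# Small lookup tables covering only the values needed for this illustration.
CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert", 22: "handshake", 23: "application_data"}
HANDSHAKE_TYPES = {1: "client_hello", 2: "server_hello"}


def parse_tls_prefix(data: bytes) -> dict:
    """Decode the 5-byte TLS record header and the 4-byte handshake header."""
    content_type, major, minor, record_length = struct.unpack("!BBBH", data[:5])
    handshake_type = data[5]
    handshake_length = int.from_bytes(data[6:9], "big")
    return {
        "content_type": CONTENT_TYPES.get(content_type, "unknown"),
        "record_version": (major, minor),
        "record_length": record_length,
        "handshake_type": HANDSHAKE_TYPES.get(handshake_type, "unknown"),
        "handshake_length": handshake_length,
    }


info = parse_tls_prefix(PAYLOAD_PREFIX)
print(info)
```

The decoded record length is 503 bytes, which together with the 5-byte record header accounts for the 508 bytes the scanner reports writing. The record version `(3, 3)` is the value used by TLS 1.2 records.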

Operating System

Linux

Docker Selenium version (tag)

4.3.0-20220706

diemol commented 2 years ago

We've spent a lot of time digging into this issue, and fixes have landed. But I guess the tool gets updated and tries more things that can bring this up. To be honest, I do not understand the need to scan the Grid the whole time, as it just adds more load to it.

This is just a way of saying that we are happy to look at this if the exact request to reproduce the issue is shared. Otherwise we won't be spending time on this. With the prices of security tools, I assume you can spend some time and resources to research and find the HTTP request that causes this and help us fix it.

Abhijith2893 commented 2 years ago

Thank you for the quick response. Nexpose is a company-wide vulnerability scanning tool used by the security team for daily scans of all hosts. This disconnect is currently preventing us from migrating from Selenium Grid 3 to Selenium Grid 4. I understand it's difficult to fix without proper repro steps and will try to narrow down the HTTP call.

Trigtrig commented 2 years ago

I think I have found such an HTTP request! 🥳 Here is a Postman collection with the corresponding request to reproduce the issue: Grid-breaking-Request.postman_collection.json.zip

I had to zip the file because GitHub wants me to. Just unpack and import the JSON file in Postman.

Trigtrig commented 2 years ago

I took a closer look at the problematic request. Nexpose seems to use it to check whether the server is susceptible to this vulnerability. How the attack works is well explained here.

I have now simplified the request as much as possible and updated the postman collection. Grid-breaking-Request.postman_collection.json.zip

@diemol Does this meet your expectation or can I contribute more information so you guys can fix this?

pujagani commented 2 years ago

Thank you for the details. After going through this issue and related issues and trying something basic locally, I am not fully sure how to leverage the information shared to reproduce this. @Trigtrig Have you used the request shared without the scanning software to reproduce this locally? Will really appreciate something we can use and reproduce easily. @diemol Any pointers based on your experience from the previous fixes about reproducing such issues? At the core, it seems like a Selenium Grid-related bug. I am happy to help on this front.

Trigtrig commented 2 years ago

@pujagani Yes, I can reproduce this without the scanning software. These are the steps I perform locally:

  1. Start the Selenium Grid. docker-compose.yaml:

    version: "3.7"
    services:
      selenium-hub:
        image: selenium/hub:4
        container_name: selenium-hub
        ports:

    # URL for connecting to the docker daemon
    # Windows: make sure Docker Desktop exposes the daemon via tcp, and use http://host.docker.internal:2375.
    url = "http://host.docker.internal:2375"


  2. Open the Grid UI in the browser. It should look something like this:
<img width="641" alt="grid_before" src="https://user-images.githubusercontent.com/7973740/198296970-b72d28ad-8584-46a7-a91b-b9a5564ec26a.png">
  3. Trigger the problematic request with Postman:
<img width="751" alt="postman" src="https://user-images.githubusercontent.com/7973740/198297696-18002785-8de1-4fe8-bc63-017cd8771cc6.png">
  4. Open the UI again. It now looks like this (you have to refresh the page):
<img width="686" alt="grid_after" src="https://user-images.githubusercontent.com/7973740/198298087-e1c5aa28-fe9e-4abf-8ca7-bf538a213b2e.png">

The `/graphql` endpoint is now stuck. Any calls to the endpoint with Postman/curl/... now fail.

pujagani commented 2 years ago

@Trigtrig Appreciate the details and the quick response. Pardon my lack of knowledge in this area: which server should the problematic HTTP request from the Postman export point to? That was the main point I was not clear about. Is there something I need to set up?

Trigtrig commented 2 years ago

@pujagani Just replace {{url}} with localhost and {{port}} with 4444. I use Postman Variables to switch between different Selenium Grids, but it is not needed here.

pujagani commented 2 years ago

I did that with a local Grid, but without Docker. I only saw the same message shared in the screenshot, "Unable to find handler", and it did not affect the Grid or the Grid UI. I will try with Docker tomorrow and update my findings here. Thank you!

Trigtrig commented 2 years ago

Okay, that's interesting. I just tried it with the current selenium-server-4.5.3.jar as described here and I see exactly the same problem as with Docker.

pujagani commented 2 years ago

I tried with both selenium-server-4.5.3.jar and Docker (using the same files you shared). I ran around 200 sessions while constantly sending the problematic HTTP request via Postman, and the Grid UI worked perfectly in both cases. I am unable to reproduce the issue on my end to identify the problem. My machine runs macOS Big Sur, version 11.4. I am not sure whether the issue is OS-specific (I do not have a Windows machine to check) or whether having the security software installed on the machine affects the Grid in any way.

diemol commented 2 years ago

I was able to reproduce the issue with the postman collection provided, thank you for that.

Since the resource was not found, the Grid replies with a 404, but as a result the full content of the request is never read. This leaves an input pipe waiting for content and eventually causes a lock. That is what makes the Grid hang and stop processing new requests (like serving GraphQL requests from the Grid UI).

The fix closes the input pipeline when a 404 is returned.
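To illustrate the general idea described above, here is a toy HTTP server in Python. It is not Selenium's code and does not reproduce the Grid's internals. It only sketches the pattern of discarding the unread request body and closing the connection when answering 404. The names `DrainingHandler` and `start_demo_server` are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DrainingHandler(BaseHTTPRequestHandler):
    """Toy handler: every path is unknown, so every request gets a 404."""

    protocol_version = "HTTP/1.1"  # keep-alive by default, so closing is an explicit choice

    def _reply_not_found(self):
        # Discard any declared request body so nothing is left pending on the socket.
        remaining = int(self.headers.get("Content-Length") or 0)
        while remaining > 0:
            chunk = self.rfile.read(min(remaining, 65536))
            if not chunk:
                break
            remaining -= len(chunk)
        body = b"Unable to find handler"
        self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        # Sending "Connection: close" also tells http.server to close the socket afterwards.
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    do_GET = do_POST = do_PUT = do_DELETE = _reply_not_found

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_demo_server():
    """Start the toy server on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), DrainingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After `server = start_demo_server()`, a POST with a body to any path on `server.server_address` returns the 404 promptly and the connection is released, which is the behavior the fix aims for.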

diemol commented 2 years ago

This will be part of 4.6.0, which should come out in the next few days.

diemol commented 2 years ago

That said, some other request from a vulnerability scan might trigger this again. If that is the case, please open a new issue.

Trigtrig commented 2 years ago

I'm glad you were able to reproduce it after all. I am really looking forward to the next Selenium version. Thanks for the fix.

krishtoautomate commented 1 year ago

@diemol A similar issue is observed in version 3.141.59. Is it possible to add a patch?

diemol commented 1 year ago

@krishtoautomate we do not release 3.x anymore.

github-actions[bot] commented 12 months ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.