cryzed / Selenium-Requests

Extends Selenium WebDriver classes to include the request function from the Requests library, while doing all the needed cookie and request headers handling.
MIT License

Deadlock when using remote browser #62

Open kennedyjosh opened 8 months ago

kennedyjosh commented 8 months ago

OS: Ubuntu 22.04 (jammy)
Python: 3.10
Selenium-Requests: 2.0.3

I have the following Docker container running a standalone Chrome browser: selenium/standalone-chrome (GitHub: https://github.com/SeleniumHQ/docker-selenium)

Example code to reproduce:

from seleniumrequests import Remote
from selenium.webdriver.chrome.options import Options as ChromeOptions

driver = Remote('http://localhost:4444/wd/hub', options=ChromeOptions())
response = driver.request("GET", "https://substack.com/sign-in")

This code will hang on the last line. Here is the output when forcing the program to end using Ctrl + C:

^CTraceback (most recent call last):
  File "/home/ubuntu/news.py", line 140, in <module>
    main()
  File "/home/ubuntu/news.py", line 125, in main
    stage_module = importlib.import_module("stages." + stage + "." + argv[stage])
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/ubuntu/stages/delivery/substack2.py", line 5, in <module>
    response = driver.request("GET", "https://substack.com/sign-in")
  File "/home/ubuntu/venv/lib/python3.10/site-packages/seleniumrequests/request.py", line 160, in request
    self.requests_session.headers = get_webdriver_request_headers(self, proxy_host=self.__proxy_host)
  File "/home/ubuntu/venv/lib/python3.10/site-packages/seleniumrequests/request.py", line 68, in get_webdriver_request_headers
    UPDATER_HEADERS_MUTEX.acquire()
  File "/usr/lib/python3.10/threading.py", line 467, in acquire
    self._cond.wait(timeout)
  File "/usr/lib/python3.10/threading.py", line 320, in wait
    waiter.acquire()
KeyboardInterrupt

Looks like some sort of deadlock when trying to acquire that mutex to update headers.

kennedyjosh commented 8 months ago

It looks like setting the headers depends on the HTTPRequestHandler class being instantiated at some point before the code block where the mutex is acquired, but that isn't happening in this case. I will continue to investigate.
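
To illustrate the kind of hang described above, here is a minimal, self-contained sketch. It is not the Selenium-Requests source: the names CaptureHandler, capture_headers, reachable_browser and unreachable_browser are hypothetical, and the use of threading.Semaphore is an assumption. The idea is that a local HTTP server records the headers of the first request it receives and then signals the waiting caller. If the browser never reaches that server, the handler never runs and the signal never arrives. The sketch uses a timeout so that it returns None instead of blocking forever.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CaptureHandler(BaseHTTPRequestHandler):
    """Records the headers of the first request and signals the waiter."""

    def do_GET(self):
        self.server.captured = dict(self.headers)
        self.send_response(200)
        self.end_headers()
        self.server.ready.release()  # wake up the caller waiting in capture_headers

    def log_message(self, *args):
        pass  # keep the example quiet


def capture_headers(open_url, timeout=1.0):
    """Start a one-shot local server, ask open_url to visit it, wait for the headers.

    Returns the captured headers, or None if no request arrived in time.
    Without the timeouts, the "no request" case would block indefinitely.
    """
    server = HTTPServer(("127.0.0.1", 0), CaptureHandler)  # port 0: pick a free port
    server.timeout = timeout          # let handle_request() give up eventually
    server.captured = None
    server.ready = threading.Semaphore(0)
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()
    try:
        open_url("http://127.0.0.1:%d/" % server.server_address[1])
        server.ready.acquire(timeout=timeout)  # blocks until the handler has run
        return server.captured
    finally:
        worker.join()
        server.server_close()


def reachable_browser(url):
    """Stand-in for a browser on the same machine as the local server."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5):
        pass


def unreachable_browser(url):
    """Stand-in for a remote browser that cannot reach 127.0.0.1 on this machine."""
    pass


if __name__ == "__main__":
    print(capture_headers(reachable_browser) is not None)
    print(capture_headers(unreachable_browser, timeout=0.5))
```

In the second call nothing ever connects to the server, which mirrors a browser inside a container looking for the server on its own localhost.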

srikalidindi commented 7 months ago

Hello,

Thanks for your work here. We are facing the same issue and are looking forward to a solution.

srikalidindi commented 7 months ago

@kennedyjosh did you check the lower part of the README? It is important to specify proxy_host='???', where the value is the IP of the machine on which you are actually running the code. The library serves a website on that machine. If you do not specify this option, Selenium will look for it on localhost, where there is surely nothing exposed (at least this is my interpretation).
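
A rough sketch of what that suggestion could look like, assuming Remote accepts proxy_host as a keyword argument as described in the comment above. The helpers guess_local_ip and make_driver are hypothetical names used for illustration only. The address returned by guess_local_ip is just a guess at an IP of this machine, and whether the browser container can actually reach it depends on the Docker network setup and firewall rules.

```python
import socket


def guess_local_ip(probe_addr=("192.0.2.1", 80)):
    """Return the local IP the OS would use to reach probe_addr.

    Connecting a UDP socket sends no packet; it only makes the OS pick a
    route. 192.0.2.1 is a documentation-only address (TEST-NET-1).
    Falls back to 127.0.0.1 if no route is available.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.connect(probe_addr)
            return sock.getsockname()[0]
        except OSError:
            return "127.0.0.1"


def make_driver(hub_url, proxy_host):
    """Create the remote driver with an explicit proxy_host (assumed keyword)."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium.webdriver.chrome.options import Options as ChromeOptions
    from seleniumrequests import Remote

    return Remote(hub_url, options=ChromeOptions(), proxy_host=proxy_host)


if __name__ == "__main__":
    host_ip = guess_local_ip()
    print("proxy_host candidate:", host_ip)
    # driver = make_driver("http://localhost:4444/wd/hub", host_ip)
    # response = driver.request("GET", "https://substack.com/sign-in")
```

If the guessed address turns out not to be reachable from the container, the IP of the Docker bridge interface on the host may be worth trying instead.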