SergeyPirogov / webdriver_manager

Apache License 2.0
Can not connect to the Service #633

Open cyberdyne-j opened 9 months ago

cyberdyne-j commented 9 months ago

I'm running webdriver manager with Chrome and Selenium 4.10.0 in a Python scraper that launches 2 consecutive tasks. The first runs fine:

====== WebDriver manager ======
[2023-10-03 16:49:35,155: INFO/MainProcess] Get LATEST chromedriver version for google-chrome
Driver [C:{user home}.wdm\drivers\chromedriver\win64\117.0.5938.92\chromedriver-win32/chromedriver.exe] found in cache

Works like a charm. However, when the first process launches the second task, after closing the browser, it fails consistently:

====== WebDriver manager ======
[2023-10-03 17:09:13,808: INFO/MainProcess] Get LATEST chromedriver version for google-chrome
[2023-10-03 17:09:14,333: INFO/MainProcess] Get LATEST chromedriver version for google-chrome
[2023-10-03 17:09:14,789: INFO/MainProcess] Driver [C:{user_home}.wdm\drivers\chromedriver\win64\117.0.5938.92\chromedriver-win32/chromedriver.exe] found in cache
[2023-10-03 17:09:44,937: INFO/MainProcess] Task scrape_portal_task[87b39738-69e9-4f26-86d5-b27fdf0d3696] succeeded in 1215.311999999918s: {'error_code': 1, 'message': 'Message: Can not connect to the Service C:{user_home}.wdm\drivers\chromedriver\win64\117.0.5938.92\chromedriver-win32/chromedriver.exe

Not sure what's going on, but if the first task takes a long time to complete, the second fails. If it runs for a short duration, it works.

Can you shed some light on what may be going on?

Here are my imports:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

and my init:

chrome_prefs = {
    'profile.default_content_settings.popups': 0,
    'download.default_directory': self.download_dir
}

self.chrome_options = webdriver.ChromeOptions()
self.chrome_options.add_experimental_option("prefs", chrome_prefs)
self.chrome_options.add_argument(f"--user-data-dir={self.basedir}Chrome{str(self.office_id)}\\")

and the methods:

first function:

BROWSER_WAIT = 180
chrome_service = ChromeService(ChromeDriverManager().install())
browser = webdriver.Chrome(service=chrome_service, options=self.chrome_options)
browser.implicitly_wait(BROWSER_WAIT)
wait = WebDriverWait(browser, BROWSER_WAIT)

second function:

BROWSER_WAIT = 180
chrome_service = ChromeService(ChromeDriverManager().install())
try:
    browser = webdriver.Chrome(service=chrome_service, options=self.chrome_options)
except Exception as e:
    print(str(e))
    browser = webdriver.Chrome(service=chrome_service, options=self.chrome_options)

browser.implicitly_wait(BROWSER_WAIT)
wait = WebDriverWait(browser, BROWSER_WAIT)

NOTE: The only reason I'm re-instantiating the browser in the except block is that it was a failed attempt at getting this to work.
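One way the two launches above could be structured, as a sketch rather than a confirmed fix: resolve the driver path once, then build a fresh Service object for every browser launch instead of reusing one after a failed start. `DriverPathCache` and `new_browser` are hypothetical helper names that do not come from selenium or webdriver_manager, and nothing in this thread establishes that this avoids the "Can not connect to the Service" error.

```python
from typing import Callable, Optional


class DriverPathCache:
    """Calls the installer once and remembers the path it returned (hypothetical helper)."""

    def __init__(self, installer: Callable[[], str]) -> None:
        self._installer = installer
        self._path: Optional[str] = None

    def get(self) -> str:
        if self._path is None:
            self._path = self._installer()
        return self._path


def new_browser(path_cache: DriverPathCache, options=None):
    """Start Chrome with a brand-new Service object for each launch."""
    # Imported here so the caching helper above can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service as ChromeService

    service = ChromeService(path_cache.get())
    return webdriver.Chrome(service=service, options=options)
```

Usage would look like `cache = DriverPathCache(lambda: ChromeDriverManager().install())` followed by `browser = new_browser(cache, chrome_options)` in each task, so the install step only runs once per process.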

cyberdyne-j commented 9 months ago

pretty sad that the devs on this project don't even look at the issues in a timely fashion.

david-engelmann commented 9 months ago

It seems like the first function's browser object is still open, i.e. even after closing the browser. I'm not sure where this occurs.

cyberdyne-j commented 9 months ago

I use browser.quit() in the exception handler before relaunching. The browser is definitely closed and exited properly. I've since merged the 2 functions into 1 and it still craps out if a retry is triggered. Perhaps it takes time for cleanup to occur. Not sure what's going on, but I don't have time to debug further. Either way, I can't seem to get more than 1 instance running with this implementation.

I've since reverted to using the basic webdriver implementation. I know it requires a manual change to the chromedriver when it's outdated but at least i know it works.

I may just fork this repo and rework it so that it actually saves me time and frustration, if the dev doesn't address this... but that'll have to wait.

david-engelmann commented 9 months ago

If things need to run in parallel, then this relates to https://github.com/SergeyPirogov/webdriver_manager/issues/631, where I left a comment. If they don't need to be parallel, I would suggest reusing the same browser/driver object after it's created. You should be able to run it as you described, though; sometimes I'll add a small sleep(.001) or something to make sure it's ready to move on to the next section.
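The "small sleep before moving on" idea can be sketched as a generic retry wrapper. This is only an illustration under assumptions: `launch_with_retry` is a made-up helper, the pause length is arbitrary, and `launch` stands for whatever callable creates the browser.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def launch_with_retry(
    launch: Callable[[], T],
    attempts: int = 3,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call launch() up to `attempts` times, pausing between failed tries."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return launch()
        except Exception as exc:  # broad on purpose: any startup failure triggers a retry
            last_error = exc
            if attempt < attempts:
                sleep(pause)  # give the previous driver process time to go away
    raise RuntimeError(f"launch failed after {attempts} attempts") from last_error
```

With selenium this might be called as `launch_with_retry(lambda: webdriver.Chrome(service=ChromeService(path), options=opts))`, so that each attempt gets its own Service object.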

cyberdyne-j commented 9 months ago

I hear you and read the issue in question... it's just more effort and causes me to have to rewrite my code to accommodate the shortcomings of this lib. I'll keep an eye on it, but for now, I've switched back to the old, working, and trusted way... no need to download multiple drivers, etc. 1 chromedriver in the path and can run multiple instances without issue. It's just not enough to warrant going down that rabbit hole, but thanks for the input!

david-engelmann commented 9 months ago

You are saying that a past version of the web driver / selenium let you access the same chrome binaries from different threads? If you have a way to reproduce, I'd def check it out!

cyberdyne-j commented 9 months ago

Using the old method (no webdriver service), you can run as many instances of the browser as you want, as long as you have a separate Chrome user directory for each instance, e.g. Chrome_1, Chrome_2. The webdriver manager service fails to launch if one is already running and doesn't restart if you .quit() and try again. You get "Cannot Connect to Service".

This is a simple method and it works: chrome_options.add_argument(f"--user-data-dir={some_dir}\\Chrome_{instance_num}")
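A minimal sketch of building that per-instance argument without hand-written backslashes. `user_data_dir_argument` is a hypothetical helper name, and the Chrome_<n> folder naming simply follows the example above.

```python
from pathlib import PureWindowsPath


def user_data_dir_argument(base_dir: str, instance_num: int) -> str:
    """Return a --user-data-dir flag pointing at a profile folder unique to one instance."""
    profile_dir = PureWindowsPath(base_dir) / f"Chrome_{instance_num}"
    return f"--user-data-dir={profile_dir}"
```

It would be passed along as `chrome_options.add_argument(user_data_dir_argument(base_dir, 1))`. PureWindowsPath is used because the paths in this thread are Windows paths; it only formats the string and never touches the filesystem.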

david-engelmann commented 9 months ago

I created a repo to try and reproduce the issue. You can find it here - https://github.com/david-engelmann/selenium_sadness. I'm currently seeing expected behavior. Please feel free to adjust it to reproduce the issue you are experiencing.

cyberdyne-j commented 9 months ago

Just checked it out, can you elaborate? I'm not seeing where you are using the Service... Anyhow, I forked selenium_sadness to use webdriver_manager: https://github.com/cyberdyne-j/selenium_sadness. Your version was using the non-wdm approach.

let me know how it goes.

p.s. remember that i noted that when the instance runs quickly, it seems to be ok, but if you have a long running scrape, the restart fails to connect to the service.

david-engelmann commented 9 months ago

@cyberdyne-j It seemed like you were setting the path to the chrome binaries here "download.default_directory": f"{self.base_dir}/chromedriver",... In general, if you already know the path beforehand you should pass it directly. I went ahead and merged those changes and I'm seeing the same results.

cyberdyne-j commented 9 months ago

@david-engelmann not at all, my path contains C:\inetpub\exes in which the chromedriver resides. The download.default_directory is where the scraper sends the downloads it fetches from click events that download excel files.

wdm uses the path where its download manager stores the chromedriver it fetches. In my case that's .wdm, as I use the flag os.environ['WDM_LOCAL'] = '1' to set it to my project root.
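For reference, the WDM_LOCAL flag mentioned above is a plain environment variable. A minimal sketch, assuming it should be set before ChromeDriverManager is constructed; the helper name `use_project_local_cache` is made up for illustration.

```python
import os


def use_project_local_cache() -> None:
    """Ask webdriver_manager to keep its .wdm cache under the project root."""
    # Assumption: setting this before creating ChromeDriverManager is the safe order.
    os.environ["WDM_LOCAL"] = "1"
```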

cyberdyne-j commented 9 months ago

@david-engelmann when you say you're seeing the same result, do you mean you get the "Cannot Connect to Service" or are you saying it works?

david-engelmann commented 9 months ago

I'm seeing it make the requests and return results, but it appears they are sequential. I'm still trying to get the Cannot Connect to Service error. I may need to split it into a pytest-xdist flow to reproduce it.

cyberdyne-j commented 9 months ago

Small requests don't seem to be a problem, but I have long-running tasks and it always fails. If you toss some errors into the mix and try to restart, see what you get.
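A rough sketch of what such a reproduction loop could look like: run several tasks back to back, hold each browser open for a while, and record whether the next launch succeeds. Everything here is hypothetical; `run_consecutive_tasks` is a made-up name, `start_browser` is whatever callable creates the driver, and the sleep is only a stand-in for a long scrape.

```python
import time
from typing import Callable, List


def run_consecutive_tasks(
    start_browser: Callable[[], object],
    task_seconds: float,
    rounds: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Launch, hold, and quit a browser `rounds` times, logging each outcome."""
    results: List[str] = []
    for _ in range(rounds):
        try:
            browser = start_browser()
        except Exception as exc:
            # Record the failure and keep going, to see whether a later launch recovers.
            results.append(f"launch failed: {exc}")
            continue
        try:
            sleep(task_seconds)  # stand-in for the real long-running scrape
            results.append("ok")
        finally:
            browser.quit()  # always shut the browser down before the next round
    return results
```

Raising `task_seconds` toward the length of a real job (the failing run in the first post took about 20 minutes) is the part that matters if the failure really depends on how long the first task runs.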

david-engelmann commented 9 months ago

@cyberdyne-j I added a GitHub workflow to run the tests asynchronously. Now we can add a large number of long requests and see if we can reproduce the error.

cyberdyne-j commented 9 months ago

> @cyberdyne-j I added a GitHub workflow to run the tests asynchronously. Now we can add a large number of long requests and see if we can reproduce the error.

Looking forward to your results.

david-engelmann commented 9 months ago

I've added the testing pipeline. So far it passes with 8 tests and 2 workers.

cyberdyne-j commented 9 months ago

> I've added the testing pipeline. So far it passes with 8 tests and 2 workers.

Perhaps it has something to do with the nature of the tasks. I'm not running this with asyncio; the app uses CherryPy as the backend and runs requests with an average of 100+ items per request, each averaging about a minute to process. I don't require it to be async as it's being piped through RabbitMQ in separate processes, but that doesn't convince me that there is not something inherently wrong with the repo for use in my case.

I just ran a huge task with the old method and had no issues at all... so, to be determined.

p.s. i'm using selenium 4.13.0

david-engelmann commented 9 months ago

Based on the situation you are describing, it sounds like the chrome binaries don't exist within the RabbitMQ queue and are being downloaded each time. It's very odd that the behavior would differ between the versions. What are the two you are toggling between?

cyberdyne-j commented 9 months ago

I mentioned in a previous post that I've since merged the methods, so there is no toggling. The chromedriver is in the path, so the problem isn't any method failing to find it. It just throws an error saying it can't connect to the service after it successfully downloads. RabbitMQ is on the same machine and has no problem launching the old method, neither does it have a problem launching using wdm. If it encounters an error, or if another job tries to run, I see wdm download the driver and attempt to run, but it inevitably fails with Cannot Connect to Service, even though it displays the entire path to the driver.

As far as I can tell, it just can't connect if a restart happens or another process tries to run, even if it's an unrelated task.

I don't see how I can work around this. I've been working with these tools for years and never ran into anything like this.

Thanks for trying though... not confident we'll arrive at a fix with this version. Like I said, I may fork wdm and see what I can do with it, because as I run through the code, I'm seeing stuff I'd like to change... Anyhow, that'll have to wait till I have time to do that. For now, I'm perfectly OK with reverting back to using chromedriver the old-fashioned way. :/

david-engelmann commented 9 months ago

@cyberdyne-j that's definitely odd behavior. I'm going to keep that repo for reproducing issues open. We can add tests to it to try and find a way to reproduce it.

cyberdyne-j commented 9 months ago

> @cyberdyne-j that's definitely odd behavior. I'm going to keep that repo for reproducing issues open. We can add tests to it to try and find a way to reproduce it.

Sounds like a plan. I'll keep trying stuff on my end and post back here if I uncover anything.

thanks!