Open Dunkhan opened 1 year ago
Hi @Dunkhan ,
The second issue (the chrome page crash) sometimes shows up when the docker container doesn't have enough memory available. How much memory are you assigning to your docker containers?
The first issue is less clear - there's not enough information in the error message you pasted to point to any flathunter code. But it looks like somewhere where the code expects a filename it has received a None object. Do you have the Chrome binary available in the environment where you are running docker as a service?
Thanks for the response. I am not sure how to increase the memory to the docker container (I am not terribly familiar with docker). My understanding was that the memory isn't limited by default and I have not taken any steps to limit it.
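One way to check whether a limit is actually in effect, without a GUI, is to read the cgroup files from inside the container. The following is a minimal sketch, assuming a Linux host where one of the two standard cgroup paths is mounted; the function names are made up for illustration and are not part of flathunter.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup locations; which one exists depends on the host's cgroup version.
CGROUP_V2 = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1 = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")

# cgroup v1 reports "no limit" as a huge number rather than a keyword.
V1_UNLIMITED_THRESHOLD = 1 << 60


def parse_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None if the value means 'unlimited'."""
    value = raw.strip()
    if value == "max":  # cgroup v2 keyword for "no limit"
        return None
    limit = int(value)
    return None if limit >= V1_UNLIMITED_THRESHOLD else limit


def container_memory_limit() -> Optional[int]:
    """Read the memory limit of the current cgroup, if one can be found."""
    for path in (CGROUP_V2, CGROUP_V1):
        if path.is_file():
            return parse_limit(path.read_text())
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    print("no memory limit found" if limit is None else f"{limit / 2**20:.0f} MiB")
```

If this reports no limit, the container can use whatever the virtual server itself has, so the server's total RAM becomes the number to look at.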
I checked whether Chrome was installed on the server for the service, and it seems it was not installed correctly. I think it is now, and the error output has changed:
patching driver executable /home/flathunter/.local/share/undetected_chromedriver/undetected_chromedriver
Traceback (most recent call last):
  File "flathunt.py", line 118, in <module>
    main()
  File "flathunt.py", line 114, in main
    launch_flat_hunt(config, heartbeat)
  File "flathunt.py", line 36, in launch_flat_hunt
    hunter.hunt_flats()
  File "/opt/flathunter/flathunter/hunter.py", line 56, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/opt/flathunter/flathunter/hunter.py", line 35, in crawl_for_exposes
    return chain(*[try_crawl(searcher, url, max_pages)
  File "/opt/flathunter/flathunter/hunter.py", line 35, in <listcomp>
    return chain(*[try_crawl(searcher, url, max_pages)
  File "/opt/flathunter/flathunter/hunter.py", line 27, in try_crawl
    return searcher.crawl(url, max_pages)
  File "/opt/flathunter/flathunter/abstract_crawler.py", line 150, in crawl
    return self.get_results(url, max_pages)
  File "/opt/flathunter/flathunter/crawler/immobilienscout.py", line 90, in get_results
    soup = self.get_page(search_url, self.get_driver(), page_no)
  File "/opt/flathunter/flathunter/crawler/immobilienscout.py", line 65, in get_driver
    self.driver = get_chrome_driver(driver_arguments)
  File "/opt/flathunter/flathunter/chrome_wrapper.py", line 47, in get_chrome_driver
    driver = uc.Chrome(version_main=chrome_version, options=chrome_options) # pylint: disable=no-member
  File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/undetected_chromedriver/__init__.py", line 441, in __init__
    super(Chrome, self).__init__(
  File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 80, in __init__
    super().__init__(
  File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/selenium/webdriver/chromium/webdriver.py", line 104, in __init__
    super().__init__(
  File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 286, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/undetected_chromedriver/__init__.py", line 704, in start_session
    super(selenium.webdriver.chrome.webdriver.WebDriver, self).start_session(
  File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 378, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 440, in execute
    self.error_handler.check_response(response)
  File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 245, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot connect to chrome at 127.0.0.1:45545
I read in another report a suggestion to check --version on google-chrome, chrome and chromium. In case this is relevant it only returns a version (112.0.5615.121) for google-chrome and nothing for the other two.
Hi @Dunkhan ,
Looking here, there does seem to be a system memory limit for Docker Mac: https://docs.docker.com/desktop/settings/mac/
It might be worth checking if increasing the memory allocation there helps with your issue. The error message could also be because of a mismatch between the version of undetected_chromedriver and the version of google-chrome installed on your machine.
There are some discussions on the undetected_chromedriver site about selenium connection issues.
You might find that hard-coding the version (driver = uc.Chrome(version_main=110)) helps as a temporary fix.
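Rather than hard-coding a number, the major version could be read from the installed binary and passed as version_main. This is only a sketch: the helper names are hypothetical, it assumes the binary is called google-chrome and prints something like "Google Chrome 112.0.5615.121", and it is not how flathunter itself detects the version.

```python
import re
import subprocess
from typing import Optional


def parse_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from text like 'Google Chrome 112.0.5615.121'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def installed_chrome_major(binary: str = "google-chrome") -> Optional[int]:
    """Run `<binary> --version` and return the major version, or None on failure."""
    try:
        result = subprocess.run(
            [binary, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_major_version(result.stdout)


def make_driver(fallback_major: int = 110):
    """Create a driver pinned to the installed Chrome major version."""
    import undetected_chromedriver as uc  # third-party; imported lazily

    # Fall back to a fixed version only if detection fails.
    major = installed_chrome_major() or fallback_major
    return uc.Chrome(version_main=major)
```

Printing the result of installed_chrome_major() before creating the driver is a quick way to confirm that the detected version matches what google-chrome --version reports on the server.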
I am using ssh to set this up on a linux virtual server. The guides on docs.docker cover how to change the memory limit on a GUI that I don't have access to. I tried a bunch of the suggestions from the uc discussion but nothing worked. I also tried hardcoding the version. I added some debug output to see what version was being detected and it was correct (112). I guess I should post my own discussion on the uc project maybe.
I was able to resolve the page crash issue using the following args:
driver_arguments:
- "--headless"
- "--disable-dev-shm-usage"
@kevincali Thanks, the - "--disable-dev-shm-usage" parameter solved this issue for me.
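For background on why that flag helps: Chrome keeps shared memory files in /dev/shm, and Docker gives a container only 64 MiB there unless --shm-size is set; --disable-dev-shm-usage makes Chrome use /tmp instead. The sketch below checks the size of /dev/shm and adds the flag only when it is small. The function names and the threshold are assumptions for illustration, not flathunter code.

```python
import shutil
from typing import List

# Docker's default /dev/shm size is 64 MiB unless --shm-size is set.
SMALL_SHM_BYTES = 64 * 2**20


def shm_is_small(path: str = "/dev/shm", threshold: int = SMALL_SHM_BYTES) -> bool:
    """True if the filesystem at `path` is at most `threshold` bytes, or is missing."""
    try:
        return shutil.disk_usage(path).total <= threshold
    except OSError:
        # No usable shared memory mount: treat it the same as a small one.
        return True


def chrome_arguments(path: str = "/dev/shm") -> List[str]:
    """Build the argument list, adding the shm workaround only when needed."""
    args = ["--headless"]
    if shm_is_small(path):
        args.append("--disable-dev-shm-usage")
    return args


if __name__ == "__main__":
    print(chrome_arguments())
```

Setting the flag unconditionally, as in the config above, is simpler and is what fixed the crash in this thread; the check is mainly useful for confirming that a small /dev/shm is the cause.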
Had the same problem. These driver_arguments seem to have solved it for me.
To clarify this for other readers: I set the following in the config.yaml:
captcha:
  driver_arguments:
    - "--headless"
    - "--disable-dev-shm-usage"
I found this confusing, but the arguments are not only used when solving captchas, but for all chrome instances in general.
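To illustrate the nesting, here is a rough sketch of how such a setting could be read and applied to every Chrome instance. The dictionary stands in for the parsed config.yaml, and driver_arguments() and build_options() are hypothetical helpers, not flathunter's actual functions.

```python
from typing import Any, Dict, List

# The structure the YAML snippet would produce once parsed into Python.
config: Dict[str, Any] = {
    "captcha": {
        "driver_arguments": ["--headless", "--disable-dev-shm-usage"],
    }
}


def driver_arguments(cfg: Dict[str, Any]) -> List[str]:
    """Return captcha.driver_arguments, or an empty list if the key is absent."""
    captcha = cfg.get("captcha") or {}
    return list(captcha.get("driver_arguments") or [])


def build_options(args: List[str]):
    """Apply each argument to a ChromeOptions object."""
    import undetected_chromedriver as uc  # third-party; imported lazily

    options = uc.ChromeOptions()
    for arg in args:
        options.add_argument(arg)
    return options
```

Note that if driver_arguments is not indented under captcha in the YAML, captcha parses as empty and a lookup like this returns no arguments at all, which is an easy mistake to make when copying the snippet.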
When I try to run this as a service I get the following output:
I am also having a different error when trying to run it in docker (not docker-compose).
I am running on a virtual server on strato.de running Linux (Ubuntu). Any advice is appreciated.