flathunters / flathunter

A bot to help people with their rental real-estate search. 🏠🤖
GNU Affero General Public License v3.0
845 stars 180 forks

Problem with chrome driver on Raspberry Pi #358

Closed mxfilerelatedcache closed 6 months ago

mxfilerelatedcache commented 1 year ago

I'm trying to set up flathunter on my Raspberry Pi 4 running Debian GNU/Linux 11 (bullseye), but I hit a problem when running flathunt.py. It seems to be related to the Chromium driver and #192. I have the newest Chromium driver (109.0.5414.112-rpt2). The error looks like this:

simon@simonspi:~/Documents/flathunter $ pipenv run python flathunt.py
[2023/04/09 13:42:28|config.py |INFO ]: Using config path /home/simon/Documents/flathunter/config.yaml
[2023/04/09 13:42:28|chrome_wrapper.py |INFO ]: Initializing Chrome WebDriver for crawler...
[2023/04/09 13:42:30|patcher.py |INFO ]: patching driver executable /home/simon/.local/share/undetected_chromedriver/undetected_chromedriver
Traceback (most recent call last):
  File "/home/simon/Documents/flathunter/flathunt.py", line 118, in <module>
    main()
  File "/home/simon/Documents/flathunter/flathunt.py", line 114, in main
    launch_flat_hunt(config, heartbeat)
  File "/home/simon/Documents/flathunter/flathunt.py", line 36, in launch_flat_hunt
    hunter.hunt_flats()
  File "/home/simon/Documents/flathunter/flathunter/hunter.py", line 56, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/home/simon/Documents/flathunter/flathunter/hunter.py", line 35, in crawl_for_exposes
    return chain(*[try_crawl(searcher, url, max_pages)
  File "/home/simon/Documents/flathunter/flathunter/hunter.py", line 35, in <listcomp>
    return chain(*[try_crawl(searcher, url, max_pages)
  File "/home/simon/Documents/flathunter/flathunter/hunter.py", line 27, in try_crawl
    return searcher.crawl(url, max_pages)
  File "/home/simon/Documents/flathunter/flathunter/abstract_crawler.py", line 150, in crawl
    return self.get_results(url, max_pages)
  File "/home/simon/Documents/flathunter/flathunter/crawler/immobilienscout.py", line 90, in get_results
    soup = self.get_page(search_url, self.get_driver(), page_no)
  File "/home/simon/Documents/flathunter/flathunter/crawler/immobilienscout.py", line 65, in get_driver
    self.driver = get_chrome_driver(driver_arguments)
  File "/home/simon/Documents/flathunter/flathunter/chrome_wrapper.py", line 47, in get_chrome_driver
    driver = uc.Chrome(version_main=chrome_version, options=chrome_options) # pylint: disable=no-member
  File "/home/simon/.local/share/virtualenvs/flathunter-QaHh8Mme/lib/python3.9/site-packages/undetected_chromedriver/__init__.py", line 441, in __init__
    super(Chrome, self).__init__(
  File "/home/simon/.local/share/virtualenvs/flathunter-QaHh8Mme/lib/python3.9/site-packages/selenium/webdriver/chrome/webdriver.py", line 80, in __init__
    super().__init__(
  File "/home/simon/.local/share/virtualenvs/flathunter-QaHh8Mme/lib/python3.9/site-packages/selenium/webdriver/chromium/webdriver.py", line 101, in __init__
    self.service.start()
  File "/home/simon/.local/share/virtualenvs/flathunter-QaHh8Mme/lib/python3.9/site-packages/selenium/webdriver/common/service.py", line 90, in start
    self._start_process(self.path)
  File "/home/simon/.local/share/virtualenvs/flathunter-QaHh8Mme/lib/python3.9/site-packages/selenium/webdriver/common/service.py", line 203, in _start_process
    self.process = subprocess.Popen(
  File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.9/subprocess.py", line 1823, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/home/simon/.local/share/undetected_chromedriver/undetected_chromedriver'
[2023/04/09 13:42:30|__init__.py |INFO ]: ensuring close

I tried the steps described by @Ralfons-06 in #192, but it seems the code changed so I'm unsure how to proceed. Any ideas? Thanks!

codders commented 1 year ago

Exec format error would imply you have the wrong architecture for your undetected_chromedriver binary. What is the output of file for the undetected_chromedriver binary, and what does uname -a say?

Arthur

mxfilerelatedcache commented 1 year ago

Hey, thanks for the quick reply. uname -a says Linux simonspi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux. I'm not sure what the output of file for the binary is or how to get it - but I did some research and it seems to be related to the fact that there is no chromedriver for ARM64. The workaround explained here seems promising but it will take a while to wrap my head around this.
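For reference, the `file` check asked about above can be approximated in Python by reading the ELF header of the driver binary directly. This is only a minimal sketch: the helper name `elf_architecture` and the small lookup table are made up for illustration, and only a handful of machine codes are covered.

```python
import struct

# A few e_machine values from the ELF specification (not exhaustive).
ELF_MACHINES = {3: "x86", 40: "arm (32-bit)", 62: "x86-64", 183: "aarch64"}


def elf_architecture(path):
    """Return a short architecture name for an ELF binary, or None if not ELF."""
    with open(path, "rb") as handle:
        header = handle.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # Offset 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 of the ELF header.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")
```

Calling `elf_architecture` on the `undetected_chromedriver` path from the traceback and comparing the result with `platform.machine()` should show whether the downloaded driver was built for a different architecture than the one the Pi is running.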

navels commented 1 year ago

The latest Pi OS has a 64-bit kernel and a 32-bit OS, which throws off any software installer using uname to determine host architecture. See https://forums.raspberrypi.com/viewtopic.php?t=344246 for some discussion (among other random problems with the 6.1 kernel).

Anyway, I did this to downgrade my kernel to the previous, 32-bit version:

sudo rpi-update bdb151a

While I cannot guarantee that will fix your issue, it is probably a good place to start to get your system in a more predictable state. After a reboot, uname -a should show

Linux pi 5.15.84-v7l+ #1613 SMP Thu Jan 5 12:01:26 GMT 2023 armv7l GNU/Linux

This did fix similar problems I was having getting chromedriver working after I had previously upgraded the kernel to 6.1.
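To check whether a system is in the mixed state described above (64-bit kernel, 32-bit userland), one option is to compare what `uname` reports with the pointer size of the running Python interpreter. This is a rough sketch: the function names are invented for illustration, and it assumes the Python interpreter was built for the same userland as the rest of the OS.

```python
import platform
import struct


def userland_bits():
    """Pointer size of the running Python interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def kernel_is_64bit(machine):
    """Heuristic: does a uname machine string name a 64-bit architecture?"""
    return machine in {"aarch64", "arm64", "x86_64", "amd64"}


def arch_mismatch(machine=None, bits=None):
    """True when a 64-bit kernel appears to be running a 32-bit userland."""
    machine = platform.machine() if machine is None else machine
    bits = userland_bits() if bits is None else bits
    return kernel_is_64bit(machine) and bits == 32
```

If `arch_mismatch()` returns `True`, any installer that picks a download based on `uname` alone is likely to fetch a binary the userland cannot execute.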

mxfilerelatedcache commented 6 months ago

Better late than never: I finally fixed the issue. It had to do with undetected-chromedriver not being built for ARM64 (which is what my Pi runs). I fixed it by manually changing code in the [chrome_wrapper.py](https://github.com/flathunters/flathunter/blob/main/flathunter/chrome_wrapper.py) file.

I followed the instructions described here to download and patch the unofficial ARM64 undetected-chromedriver from electron, and then manually set this patched driver in the chrome_wrapper.py file, similar to (but not exactly like) what is described in #192. This finally worked, thanks @navels!

If someone comes across the same issue, let me know and I can share the changed code in the chrome_wrapper.py file.

osharaki commented 4 months ago

Hey @mxfilerelatedcache 👋 Would be great if you could share the changes you made in chrome_wrapper.py 🙂

heckhoff commented 3 months ago

Sharing your changes would help me out as well, thanks in advance @mxfilerelatedcache :)

xeruun commented 3 months ago

I have the same problem. Would it be possible to share the modified file @mxfilerelatedcache? Thanks in advance 🙂

mxfilerelatedcache commented 3 months ago

Hey, sorry for the late response, was a bit down under. I'm at work right now but when I get home I'll try to look for my Pi and the respective chrome_wrapper.py!

valnurat commented 3 months ago

Looking forward to seeing this. Thank you

mxfilerelatedcache commented 3 months ago

It's been a while, so I'm not quite sure about all the places I changed; I'll provide the whole chrome_wrapper.py. I do remember, though, that I manually added the driver path. Here is the file:

"""Chrome needs some special handling to work out where the correct
binary is, to attach the correct selenium chromedriver, and to set
the correct version number"""
import re
import subprocess
from typing import List
from sys import platform
import undetected_chromedriver as uc
from selenium.webdriver.chrome.service import Service

from flathunter.logging import logger
from flathunter.exceptions import ChromeNotFound

CHROME_VERSION_REGEXP = re.compile(r'.* (\d+\.\d+\.\d+\.\d+)( .*)?')
WINDOWS_CHROME_REG_PATH = r'HKEY_CURRENT_USER\Software\Google\Chrome\BLBeacon'
WINDOWS_CHROME_REG_REGEXP = re.compile(r'\s*version\s*REG_SZ\s*(\d+)\..*')
CHROME_BINARY_NAMES = ['google-chrome', 'chromium', 'chrome', 'chromium-browser',
                       '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome']

def get_command_output(args) -> List[str]:
    """Run a command and return stdout"""
    try:
        with subprocess.Popen(args,
                    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                    universal_newlines=True) as process:
            if process.stdout is None:
                return []
            return process.stdout.readlines()
    except FileNotFoundError:
        return []

def get_chrome_version() -> int:
    """Determine the correct name for the chrome binary"""
    for binary_name in CHROME_BINARY_NAMES:
        try:
            version_output = get_command_output([binary_name, '--version'])
            if not version_output:
                continue
            match = CHROME_VERSION_REGEXP.match(version_output[0])
            if match is None:
                continue
            return int(match.group(1).split('.')[0])
        except FileNotFoundError:
            pass
    try:
        # on Windows, Chrome doesn't respond to --version, but we can find
        # the version in the registry
        output = get_command_output(
            ['reg', 'query', WINDOWS_CHROME_REG_PATH, '/v', 'version']
        )
        version_matches = (WINDOWS_CHROME_REG_REGEXP.match(l) for l in output)
        version_matches = [m for m in version_matches if m is not None]
        if version_matches:
            return int(version_matches[0].group(1))
    except FileNotFoundError:
        pass
    raise ChromeNotFound()

def get_chrome_driver(driver_arguments):
    """Configure Chrome WebDriver"""
    logger.info('Initializing Chrome WebDriver for crawler...')
    chrome_options = uc.ChromeOptions() # pylint: disable=no-member

    # manually configure browser
    chrome_options.binary_location = "/usr/bin/chromium-browser"
    driver_path = "/home/simon/chromedriver"

    if platform == "darwin":
        chrome_options.add_argument("--headless")
    if driver_arguments is not None:
        for driver_argument in driver_arguments:
            chrome_options.add_argument(driver_argument)
    chrome_version = get_chrome_version()
    chrome_options.add_argument("--headless=new")
    #driver = uc.Chrome(version_main=chrome_version, options=chrome_options) # pylint: disable=no-member

    # manually configure chromedriver
    driver = uc.Chrome(
        driver_executable_path=driver_path,
        options=chrome_options,
        version_main=chrome_version)

    driver.execute_cdp_cmd(
        "Network.setUserAgentOverride",
        {
            "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/120.0.0.0 Safari/537.36"
        },
    )

    driver.execute_cdp_cmd('Network.setBlockedURLs',
        {"urls": ["https://api.geetest.com/get.*"]})
    driver.execute_cdp_cmd('Network.enable', {})
    return driver

valnurat commented 3 months ago

Hi @mxfilerelatedcache

I'm not using flathunter, but I'm trying to write a scraper of my own. What I have so far works on Windows, but I have issues with UC on my Raspberry Pi. I'm running Raspberry Pi OS. Would your solution fix the issue I describe here: https://github.com/ultrafunkamsterdam/undetected-chromedriver/discussions/1925 If so, do you think you could explain from scratch how you got yours working? Br

osharaki commented 3 months ago

For anyone still interested, in chrome_wrapper.py, I needed to change this line from

driver = uc.Chrome(version_main=chrome_version, options=chrome_options) # pylint: disable=no-member

to

chrome_options.binary_location = "/usr/bin/chromium-browser"
driver = uc.Chrome(driver_executable_path='/usr/bin/chromedriver', options=chrome_options) # pylint: disable=no-member

Of course, make sure that driver_executable_path points to where your patched chromedriver is located. In my case, it's /usr/bin/.
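Since the driver location differs between setups in this thread (`/home/simon/chromedriver` in one comment, `/usr/bin/chromedriver` in another), one way to avoid hard-coding it is to look the path up at runtime. The sketch below is only an illustration: `resolve_driver_path` and the `CHROMEDRIVER_PATH` environment variable are hypothetical names, not something flathunter or undetected-chromedriver reads.

```python
import os
import shutil


def resolve_driver_path(env_var="CHROMEDRIVER_PATH", fallback_name="chromedriver"):
    """Return an executable chromedriver path, or None if none is found.

    CHROMEDRIVER_PATH is a hypothetical variable name chosen for this sketch.
    """
    candidate = os.environ.get(env_var)
    if candidate and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    # Fall back to whatever chromedriver is on PATH (e.g. /usr/bin/chromedriver).
    return shutil.which(fallback_name)
```

The result could then be passed as `driver_executable_path=resolve_driver_path()` in the `uc.Chrome(...)` call shown above, after checking that it is not `None`.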