flathunters / flathunter

A bot to help people with their rental real-estate search. 🏠🤖
GNU Affero General Public License v3.0

Immoscout get HTTP 405 #119

Closed choeffer closed 2 years ago

choeffer commented 3 years ago

I get several errors like this, even with "100% Recognition" enabled at 2captcha. Any ideas?

[2021/04/20 22:41:28|abstract_crawler.py|ERROR   ]: Got response (405): b'<!DOCTYPE html>\n<html>\n\n<head>\n    <script>\n        (function () {\n            try {\n                if (typeof sessionStorage !== \'undefined\') {\n                    sessionStorage.setItem(\'distil_referrer\', document.referrer); \n                }\n            } catch (e) {}\n        })()\n    </script>\n    <meta http-equiv="content-type" content="text/html; charset=UTF-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1" />\n    <meta http-equiv="X-UA-Compatible" content="IE=edge" />\n    <meta name="robots" content="noindex, nofollow">\n    <meta http-equiv="cache-control" content="no-cache, no-store, must-revalidate">\n    <meta http-equiv="pragma" content="no-cache">\n    <meta http-equiv="expires" content="0">\n    <title>Ich bin kein Roboter - ImmobilienScout24</title>\n    <link rel="icon" type="image/vnd.microsoft.icon" href="https://www.immobilienscout24.de/favicon.ico" />\n    <link rel="shortcut icon" type="image/vnd.microsoft.icon" href="https://www.immobilienscout24.de/favicon.ico" />\n    <style>\n        @font-face {\n            font-family: "Make It Sans IS24 Web";\n            font-style: normal;\n            font-weight: 400;\n            font-display: swap;\n            src: url("https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/make-it-sans/MakeItSansIS24WEB-Regular.woff2") format("woff2"), url("https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/make-it-sans/MakeItSansIS24WEB-Regular.woff") format("woff");\n        }\n\n        @font-face {\n            font-family: "Make It Sans IS24 Web";\n            font-style: normal;\n            font-weight: 700;\n            font-display: swap;\n            src: url("https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/make-it-sans/MakeItSansIS24WEB-Bold.woff2") format("woff2"), 
url("https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/make-it-sans/MakeItSansIS24WEB-Bold.woff") format("woff");\n        }\n\n        @font-face {\n            font-family: \'IS24Icons\';\n            src: url(\'https://www.static-immobilienscout24.de/fro/core/4.4.1/font/vendor/is24-icons/is24-icons.woff\') format(\'woff\');\n            font-weight: normal;\n            font-style: normal;\n        }\n\n        a,\n        abbr,\n        address,\n        article,\n        aside,\n        audio,\n        b,\n        blockquote,\n        body,\n        canvas,\n        caption,\n        cite,\n        code,\n        dd,\n        del,\n        details,\n        dfn,\n        div,\n        dl,\n        dt,\n        em,\n        fieldset,\n        figcaption,\n        figure,\n        footer,\n        form,\n        h1,\n        h2,\n        h3,\n        h4,\n        h5,\n        h6,\n        header,\n        html,\n        i,\n        iframe,\n        img,\n        input,\n        ins,\n        kbd,\n        label,\n        legend,\n        li,\n        main,\n        mark,\n        menu,\n        nav,\n        object,\n        ol,\n        p,\n        pre,\n        q,\n        samp,\n        section,\n        select,\n        small,\n        span,\n        strong,\n        sub,\n        summary,\n        sup,\n        table,\n        tbody,\n        td,\n        textarea,\n        tfoot,\n        th,\n        thead,\n        time,\n        tr,\n        ul,\n        var,\n        video {\n            -ms-box-sizing: border-box;\n            -o-box-sizing: border-box;\n            box-sizing: border-box;\n            margin: 0;\n            padding: 0;\n            border: 0;\n            outline: 0;\n        }\n\n        html {\n            font-size: 62.5%;\n        }\n\n        body {\n            background-color: #fff;\n            color: #333;\n            font-size: 1.4em;\n            line-height: 1.61;\n            font-family: "Make It 
Sans IS24 Web", Verdana, "DejaVu Sans", Arial, Helvetica, sans-serif;\n        }\n\n        .page-wrapper {\n            margin-left: auto;\n            margin-right: auto;\n            max-width: 1170px;\n            background-color: #fff;\n        }\n\n        .grid {\n            display: block;\n            margin-right: 0;\n        }\n\n        .grid:after {\n            display: table;\n            clear: both;\n            content: "";\n        }\n\n        .grid-item {\n            display: block;\n            float: left;\n            vertical-align: top;\n            text-align: left;\n        }\n\n        .header {\n            border-bottom: 1px solid #e0e0e0;\n        }\n\n        .header .grid {\n            padding-left: 70px;\n            padding-right: 70px;\n            padding-top: 14px;\n            padding-bottom: 14px;\n        }\n\n        .header .logo {\n            width: 50%;\n            float: left;\n        }\n\n        .header .logo img {\n            vertical-align: top;\n        }\n\n        .header .login-button {\n            width: 50%;\n            text-align: right;\n            float: left;\n        }\n\n        .header .login-button a {\n            padding-top: .35714286em;\n            padding-bottom: .35714286em;\n            min-width: 9.42857143em;\n            font-family: "Make It Sans IS24 Web", Verdana, "DejaVu Sans", Arial, Helvetica, sans-serif;\n            border-radius: 8px;\n            background-color: #fff;\n            display: inline-block;\n            border: 1px solid #333333;\n            padding: .64285714em 1.64285714em;\n            font-weight: 600;\n            font-size: 1.4rem;\n            text-align: center;\n            letter-spacing: .2px;\n            line-height: 1.42857143em;\n            white-space: nowrap;\n            cursor: pointer;\n            color: #333333;\n        }\n\n        .header .login-button a:link,\n        .header .login-button a:visited,\n        .header 
.login-button a:focus,\n        .header .login-button a:hover {\n            text-decoration: none;\n            color: #333333;\n        }\n\n        .header .login-button a:hover {\n            background-color: #eaeaea;\n        }\n\n        .main {\n            clear: both;\n            padding-top: 55px;\n            max-width: 583px;\n            margin-left: auto;\n            margin-right: auto;\n            text-align: center;\n        }\n\n        .main .headline {\n            font-size: 4.0rem;\n            font-weight: bold;\n            letter-spacing: 0px;\n            line-height: 4.8rem;\n            text-align: center;\n        }\n\n        .main .main__logo {\n            padding-top: 10px;\n            text-align: center;\n        }\n\n        .main .main__logo img {\n            height: 240px;\n            width: 240px;\n            vertical-align: top;\n        }\n\n        .main .main__part1 {\n            padding-top: 11px;\n            font-size: 1.4rem;\n            font-weight: bold;\n            letter-spacing: 0px;\n            line-height: 20px;\n        }\n\n        .main .main__captcha {\n            padding-top: 36px;\n            padding-bottom: 36px;\n        }\n\n        .main .main_part2_header1 {\n            font-weight: bold;\n        }\n\n        .main .main_part2_header2 {\n            font-weight: bold;\n            padding-top: 16px;\n        }\n\n        .main .main__list {\n            padding-top: 14px;\n            padding-bottom: 42px;\n        }\n\n        .main .main__list ul li {\n            list-style-position: inside;\n        }\n\n        .footer {\n            background: #f2f2f2;\n            text-align: center;\n        }\n\n        .footer .footer-content {\n            max-width: 583px;\n            margin-left: auto;\n            margin-right: auto;\n            padding-top: 15px;\n            padding-bottom: 6px;\n            color: #757575;\n            font-size: 1.2rem;\n            line-height: 
1.6rem;\n        }\n\n        .footer .footer-content div {\n            padding-top: 20px;\n        }\n\n        .footer .footer-content div:first-child {\n            padding-top: 0;\n        }\n\n        .footer .footer-content a,\n        .footer .footer-content a:visited,\n        .footer .footer-content a:link,\n        .footer .footer-content a:focus,\n        .footer .footer-content .legend {\n            color: #757575;\n            font-size: 1.2rem;\n            line-height: 1.6rem;\n            text-decoration: none;\n        }\n\n        .footer .footer-content a:hover {\n            color: #757575;\n            font-size: 1.2rem;\n            line-height: 1.6rem;\n            text-decoration: underline;\n        }\n\n        .g-recaptcha {\n            display: inline-block;\n        }\n        \n        .geetest_holder {\n            margin: 0 auto;\n        }\n\n        @media (max-width: 668px) {\n            .palm-hide {\n                display: none;\n            }\n\n            .header .grid {\n                padding-left: 16px;\n                padding-right: 16px;\n                padding-top: 8px;\n                padding-bottom: 8px;\n            }\n\n            .main {\n                padding-top: 32px;\n                padding-left: 16px;\n                padding-right: 16px;\n            }\n\n            .main .headline {\n                font-size: 3.2rem;\n                font-weight: normal;\n                line-height: 4.0rem;\n            }\n\n            .main .main__logo img {\n                height: 188px;\n                width: 188px;\n            }\n\n            .footer .footer-content {\n                padding-bottom: 32px;\n            }\n\n        }\n    </style>\n\n    <script>\n        function showBlockPage() {\n            console.log("showing block page");\n        }\n        setTimeout(showBlockPage, 10000);\n    </script>\n    <script type="text/javascript" src="/assets/immo-1-17" async defer></script>\n    
\n    <script>\n    window.captchaDescription = \'<p>Nachdem du das unten stehende CAPTCHA best\xc3\xa4tigt hast, wirst du sofort auf die von dir angefragte Seite weitergeleitet.</p>\';\n    window.geetestLang = \'de\';\n    </script>\n    \n    <script src=\'https://www.google.com/recaptcha/api.js?hl=de\'></script>\n    \n                    <script src="https://static.geetest.com/static/tools/gt.js"></script>\n                       <script>\n                          initGeetest({\n                            gt: "0fdbade8a0fe41cba0ff758456d23dfa",\n                            challenge: "8ceb4d705b1888572186821a13f88a1e",\n                            offline: false,\n                            new_captcha: true,\n                            lang: window.geetestLang || "en",\n                          }, function (captchaObj) {\n                            captchaObj.onSuccess(function () {\n                                var obj = captchaObj.getValidate();\n                                solvedCaptcha({\n                                    geetest_challenge: obj.geetest_challenge,\n                                    geetest_seccode: obj.geetest_seccode,\n                                    geetest_validate: obj.geetest_validate,\n                                    data: "3:jLlXJG3MOjTMyjWA1ZYXvA==:FhtcP9zcFCS+qL3P2GLawQzwnMmAKpetjy4tsCzNHd5V3l1qtQyG+5MqG8c2S54Q:SDN/BDvW0xDi1k/WZFZEObyBH0kUStOM6NC2jzo3uXg="\n                                });\n                            });\n                            captchaObj.appendTo(\'#captcha-box\');\n                          });\n                       </script>\n                    \n                    <script>\n                        function solvedCaptcha(payload) {\n                            const timeoutMs = 10000;\n                            protectionSubmitCaptcha("geetest", payload, timeoutMs, 
"3:zPx5s4tjWiqDMbvY6BbRwQ==:TWXQlUP5uV4ajSwdUVE+Kh3fC/392zbuZ1tfX/u5ugegl9dGOzeEy5Pyf5SMPX5M2luacrjXyHANvljEFpH3hSEHeIB31m1jbQygumf8/yFWjGYuLwDMG96mIXPeSP1Q0xzsQuti4M4FFBQpLniCZIyTppZ6jshKN8sKrrVKSOfNBEO3rcqevFLf3MGqdlmf:703aLNLcyoi3tzTkMgGfreQTnRyzS6Ouog6WE8nqckQ=")\n                                .then(\n                                    function() {\n                                        window.location.reload(true);\n                                    },\n                                    function(error) {\n                                        console.log(error);\n                                    },\n                                );\n                        }\n                    </script>\n                \n</head>\n\n<body>\n\n    <div class="header">\n        <div class="page-wrapper">\n            <div class="grid">\n                <div class="logo grid-item">\n                    <a href="https://www.immobilienscout24.de/">\n                        <img src="https://www.static-immobilienscout24.de/fro/imperva/0.0.1/is24-logo.svg"\n                            alt="ImmoScout24 Logo">\n                    </a>\n                </div>\n                <div class="login-button grid-item">\n                    <a\n                        href="https://www.immobilienscout24.de/geschlossenerbereich/start.html?source=meinkontodropdown-login">\n                        Anmelden <span class="palm-hide">/ Registrieren</span>\n                    </a>\n                </div>\n            </div>\n        </div>\n    </div>\n\n    <div class="page-wrapper">\n\n        <div class="main">\n            <div class="headline">\n                \n                \n                Ich bin kein Roboter\n            </div>\n            <div class="main__logo">\n                <img src="https://www.static-immobilienscout24.de/fro/imperva/0.0.1/robot-logo.svg" alt="Roboter Logo">\n            </div>\n            <div class="main__part1">\n                \n         
       \n                Du bist ein Mensch aus Fleisch und Blut? Entschuldige bitte, dann hat unser System dich\n                f\xc3\xa4lschlicherweise als Roboter identifiziert. Um unsere Services weiterhin zu nutzen, l\xc3\xb6se bitte diesen\n                kurzen Test.\n            </div>\n\n            <div class="main__captcha">\n                \n                <div id="explanation" class="container">\n                    \n                    <script>\n                    showBlockPage()\n                    document.writeln(window.captchaDescription || "<p>After completing the CAPTCHA below, you will immediately regain access to the site again.</p>");\n                    </script>\n                <div id="captcha-box"></div>\n                </div>\n            </div>\n\n            <script type="text/javascript" charset="UTF-8">\n                const translatedStrings = {\n                    toRegainAccess: {\n                        EN: "To regain access, please make sure that cookies and JavaScript are enabled before reloading the page",\n                        DE: "Um wieder Zugriff zu erhalten, stelle bitte sicher, dass Cookies und JavaScript aktiviert sind, bevor du die Seite neu l\xc3\xa4dst",\n                    },\n                };\n\n                function translateDoc(language, text) {\n                    let replacement = text;\n\n                    Object.entries(translatedStrings).forEach(([key, value]) => {\n                        // Checks English string is present and a translation for the selected language exists before attempting to replace\n                        if (value.EN && value[language]) {\n                            replacement = replacement.replace(value.EN, value[language]);\n                        }\n                    });\n\n                    return replacement;\n                }\n\n                document.addEventListener("DOMContentLoaded", function () {\n                    const impervaContent = 
document.getElementById("explanation")\n                        .outerHTML;\n\n                    const translatedContent = translateDoc("DE", impervaContent);\n\n                    document.body.innerHTML = document.body.innerHTML.replace(\n                        impervaContent,\n                        translatedContent\n                    );\n                });\n            </script>\n\n            <div class="main__part2">\n                <div class="main_part2_header1">Warum f\xc3\xbchren wir diese Sicherheitsma\xc3\x9fnahme durch?</div>\n                <div class="main_part2_text1">Mit der Captcha-Methode stellen wir fest, dass du kein\n                    Roboter oder eine sch\xc3\xa4dliche Spam-Software bist. Damit sch\xc3\xbctzen wir unsere Webseite und die Daten\n                    unserer Nutzerinnen und Nutzer vor betr\xc3\xbcgerischen Aktivit\xc3\xa4ten.</div>\n\n                <div class="main_part2_header2">Warum haben wir deine Suchanfragen blockiert?</div>\n                <div class="main_part2_text2">Es kann verschiedene Gr\xc3\xbcnde haben, warum wir dich f\xc3\xa4lschlicherweise als\n                    Roboter identifiziert haben. 
M\xc3\xb6glicherweise</div>\n\n            </div>\n            <div class="main__list">\n                <ul>\n                    <li>hast du die Cookies f\xc3\xbcr unsere Seite deaktiviert.</li>\n                    <li>hast du die Ausf\xc3\xbchrung von JavaScript deaktiviert.</li>\n                    <li>nutzt du ein Browser-Plugin eines Drittanbieters, beispielsweise einen Ad-Blocker.</li>\n                    <li>hast du in kurzer Zeit mehr Anfragen an unser System gestellt, als es\n                        \xc3\xbcblicherweise der Fall ist.</li>\n                </ul>\n            </div>\n\n\n        </div>\n\n    </div>\n\n    <div class="footer">\n        <div class="footer-content">\n\n\n            <div>\n                <a href="https://www.immobilienscout24.de/unternehmen.html">\xc3\x9cber uns</a> |\n                <a href="https://www.immobilienscout24.de/kontakt.html">Kontakt & Hilfe</a> |\n                <a href="https://www.immobilienscout24.de/unternehmen/karriere/">Karriere</a> |\n                <a href="https://www.immobilienscout24.de/sitemap.html">Sitemap</a> |\n                <a href="https://api.immobilienscout24.de">Developer</a> |\n                <a href="https://www.immobilienscout24.de/unternehmen/mediendienst.html">Presseservice</a> |\n                <a href="https://www.immobilienscout24.de/ratgeber/newsletter.html">Newsletter abonnieren</a> |\n                <a href="https://www.immobilienscout24.de/impressum.html">Impressum</a> |\n                <a href="https://www.immobilienscout24.de/agb.html">AGB\'s & Rechtliche Hinweise</a> |\n                <a\n                    href="https://www.immobilienscout24.de/agb/verbraucherinformationen.html">Verbraucherinformationen</a>\n                |\n                <a href="https://www.immobilienscout24.de/agb/datenschutz.html">Datenschutz</a> |\n                <a href="https://www.immobilienscout24.de/lp/Geodatenkodex.html">Datenschutz Kodex f\xc3\xbcr\n                    
Geodatendienste</a> |\n                <a href="https://sicherheit.immobilienscout24.de">Sicherheit</a>\n            </div>\n            <div>\n                <!--<a href="">Immobiliensuche</a> | -->\n                <a href="https://www.scout24media.com/">Werbung</a> |\n                <a href="https://blog.immobilienscout24.de">Blog</a>\n                <!--|\n            <a href="">Nachbarschaft</a> |\n            <a href="">Gratis! E-Mail-Adresse @t-online.de</a>-->\n            </div>\n            <div>\n                <a href="https://www.immobilienscout24.de/">www.ImmobilienScout24.de</a>\n            </div>\n            <div class="legend">\n                \xc2\xa9 Copyright 1999 - 2021 Immobilien Scout GmbH\n            </div>\n        </div>\n\n    </div>\n\n</body>\n\n</html>\n'
Traceback (most recent call last):
  File "/home/choeffer/Dokumente/flathunter/flathunt.py", line 89, in <module>
    main()
  File "/home/choeffer/Dokumente/flathunter/flathunt.py", line 86, in main
    launch_flat_hunt(config)
  File "/home/choeffer/Dokumente/flathunter/flathunt.py", line 46, in launch_flat_hunt
    hunter.hunt_flats()
  File "/home/choeffer/Dokumente/flathunter/flathunter/hunter.py", line 42, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/home/choeffer/Dokumente/flathunter/flathunter/hunter.py", line 21, in crawl_for_exposes
    return chain(*[searcher.crawl(url, max_pages)
  File "/home/choeffer/Dokumente/flathunter/flathunter/hunter.py", line 21, in <listcomp>
    return chain(*[searcher.crawl(url, max_pages)
  File "/home/choeffer/Dokumente/flathunter/flathunter/abstract_crawler.py", line 136, in crawl
    return self.get_results(url, max_pages)
  File "/home/choeffer/Dokumente/flathunter/flathunter/crawl_immobilienscout.py", line 60, in get_results
    soup = self.get_page(search_url, self.driver, page_no)
  File "/home/choeffer/Dokumente/flathunter/flathunter/crawl_immobilienscout.py", line 120, in get_page
    return self.get_soup_from_url(search_url.format(page_no), driver=driver, captcha_api_key=self.captcha_api_key, checkbox=self.checkbox, afterlogin_string=self.afterlogin_string)
  File "/home/choeffer/Dokumente/flathunter/flathunter/abstract_crawler.py", line 75, in get_soup_from_url
    self.resolvecaptcha(driver, checkbox, afterlogin_string, captcha_api_key)
  File "/home/choeffer/Dokumente/flathunter/flathunter/abstract_crawler.py", line 151, in resolvecaptcha
    iframe_present = self._check_if_iframe_visible(driver)
  File "/home/choeffer/Dokumente/flathunter/flathunter/abstract_crawler.py", line 207, in _check_if_iframe_visible
    iframe = WebDriverWait(driver, 10).until(EC.visibility_of_element_located(
  File "/home/choeffer/Dokumente/flathunter/venv/lib64/python3.9/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
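
One observation from the response body logged above: the block page calls `initGeetest` with a `gt` and a `challenge` value, so it seems to serve a GeeTest puzzle rather than only a reCAPTCHA iframe. That may be why the wait for a visible iframe times out. Below is a minimal sketch for checking which captcha a block page embeds before trying to solve it. The function name `detect_captcha` and the returned dictionary layout are made up for illustration and are not part of flathunter; the regular expression assumes the page keeps the `gt: "...", challenge: "..."` layout seen in this log.

```python
import re
from typing import Dict, Optional

# Assumption: the block page calls initGeetest({ gt: "...", challenge: "...", ... })
# exactly as in the logged response; a different layout would need a different pattern.
GEETEST_RE = re.compile(
    r'initGeetest\(\s*\{\s*gt:\s*"(?P<gt>[0-9a-f]+)",\s*challenge:\s*"(?P<challenge>[0-9a-f]+)"'
)


def detect_captcha(html: str) -> Dict[str, Optional[str]]:
    """Guess which captcha a block page embeds (hypothetical helper)."""
    match = GEETEST_RE.search(html)
    if match:
        # GeeTest is checked first because the logged page also loads the
        # reCAPTCHA script even though it renders a GeeTest challenge.
        return {
            "type": "geetest",
            "gt": match.group("gt"),
            "challenge": match.group("challenge"),
        }
    if "g-recaptcha" in html or "recaptcha/api.js" in html:
        return {"type": "recaptcha", "gt": None, "challenge": None}
    return {"type": None, "gt": None, "challenge": None}


if __name__ == "__main__":
    sample = (
        '<script>\n  initGeetest({\n'
        '    gt: "0fdbade8a0fe41cba0ff758456d23dfa",\n'
        '    challenge: "8ceb4d705b1888572186821a13f88a1e",\n'
        '    offline: false,\n  }, function (captchaObj) {});\n</script>'
    )
    print(detect_captcha(sample))
```

If the result is `geetest`, waiting for a reCAPTCHA iframe would not be expected to succeed, so the solver would need a GeeTest-specific path.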