flathunters / flathunter

A bot to help people with their rental real-estate search. 🏠🤖
GNU Affero General Public License v3.0
834 stars, 179 forks

Error 405 on Immobilienscout24 #91

Closed: jossiamsee closed this issue 3 years ago

jossiamsee commented 3 years ago

Hi there! First of all, thanks for the great tool, which runs perfectly fine despite this error. For Immoscout24 I get a 405 error every time (see below), but flathunter keeps running without problems. It currently runs on a Hetzner cloud server. When I test it at home with identical settings, I get no error message. Any idea how to solve this HTTP error?

[2020/10/30 07:32:32|abstract_crawler.py|ERROR ]: Got response (405): b'<!DOCTYPE html>\n<html>\n<head>\n ... <title>Ich bin kein Roboter - ImmobilienScout24</title> ... <div class="g-recaptcha" data-sitekey="6LeaILIZAAAAALTgLZV1AQXPc2dAsLItNYJ8jVvB" data-callback="solvedCaptcha"></div> ... </html>\n'

The returned body is ImmobilienScout24's "Ich bin kein Roboter" ("I am not a robot") reCAPTCHA page. It says the system may have mistakenly identified the visitor as a robot, asks them to solve a short test to keep using the service, and lists possible reasons: cookies are disabled, JavaScript is disabled, a third-party browser plugin such as an ad blocker is in use, or more requests were sent in a short time than is usual.
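To tell this block page apart from other 405 responses, a crawler could look for markers taken from the HTML above. A minimal sketch, assuming the status code and raw body bytes are available (the log prints the body as a bytes literal); `is_captcha_block_page` and the marker strings are hypothetical choices based on this one page, which ImmobilienScout24 may change at any time.

```python
# Hypothetical markers taken from the block page in the log above.
# They are assumptions about the current page and may stop matching.
BLOCK_PAGE_MARKERS = (
    b"Ich bin kein Roboter",  # page title / headline ("I am not a robot")
    b'class="g-recaptcha"',   # the embedded reCAPTCHA widget
)


def is_captcha_block_page(status_code: int, body: bytes) -> bool:
    """Return True if a 405 response carries the bot-protection page."""
    if status_code != 405:
        return False
    return any(marker in body for marker in BLOCK_PAGE_MARKERS)


if __name__ == "__main__":
    sample = b"<title>Ich bin kein Roboter - ImmobilienScout24</title>"
    print(is_captcha_block_page(405, sample))
    print(is_captcha_block_page(200, sample))
```

Matching on page content rather than the status code alone avoids treating an unrelated 405 (for example a genuinely wrong HTTP method) as a captcha.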

niphiwi commented 3 years ago

I occasionally get the same error. Do you have the 2captcha integration running? I had always assumed that the script crashes after this error, though.

jossiamsee commented 3 years ago

2captcha is running without problems. What I don't understand: on my local WSL I never get the error, but on the server I get it on every crawl. So the server must be sending something that makes the Immoscout website think it is a robot?

Edit: the script continues to run uninterrupted.

mordax7 commented 3 years ago

@jossiamsee I believe they do a lot of user session tracking; more precisely, they track the public IP you use to access their servers. If you also browse their site from your home computer, which shares the same gateway (and therefore the same public IP) as the local machine where you ran the crawler before, their bot detection is not triggered as quickly. On your Hetzner server, by contrast, their page is accessed only by the crawler. On top of that, the script always tries to crawl without JavaScript first, to check whether the page is accessible without the help of Selenium.
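The crawl order described here (a plain request first, Selenium only if needed) can be sketched as a simple fallback. This is a minimal illustration, not flathunter's actual code: it assumes a "fetcher" is any callable returning a `(status_code, body)` pair, and `fetch_with_fallback`, `plain_fetch` and `browser_fetch` are hypothetical names.

```python
from typing import Callable, Tuple

# Hypothetical type: a fetcher takes a URL and returns (status_code, body).
Fetcher = Callable[[str], Tuple[int, bytes]]


def fetch_with_fallback(
    url: str, plain_fetch: Fetcher, browser_fetch: Fetcher
) -> Tuple[int, bytes]:
    """Try a cheap request without JavaScript first; only if it is rejected
    (for example with the 405 block page) retry through a real browser."""
    status, body = plain_fetch(url)
    if status == 200:
        return status, body
    # Assumption: browser_fetch wraps something like Selenium, which runs the
    # page's JavaScript and so looks more like a human visitor.
    return browser_fetch(url)


if __name__ == "__main__":
    def blocked(url: str) -> Tuple[int, bytes]:
        return 405, b"Ich bin kein Roboter"

    def browser(url: str) -> Tuple[int, bytes]:
        return 200, b"<html>results</html>"

    print(fetch_with_fallback("https://www.immobilienscout24.de/", blocked, browser))
```

The fallback only helps if the browser-based request is not blocked as well; with IP-based tracking, both requests still come from the same server address.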

In the long run we want to migrate entirely to Selenium, but as a quick fix we could ignore the 405 from Immobilienscout, because it is not an error that breaks anything.
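One way to implement such a quick fix is to downgrade the log level for this specific case instead of silencing it completely, so repeated blocks stay visible; this is a swapped-in variant of "ignoring" the 405. A minimal sketch, assuming the crawler knows the URL and status code when it logs; `log_bad_response` and the logger name are hypothetical, and the real change would go wherever flathunter logs non-200 responses (the error above was emitted from abstract_crawler.py).

```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger("crawler")  # hypothetical logger name


def log_bad_response(url: str, status_code: int) -> int:
    """Log a non-200 response and return the log level that was used.

    A 405 from ImmobilienScout24 (the captcha block page) is logged as INFO
    instead of ERROR; every other failure is still logged as ERROR.
    """
    host = urlparse(url).hostname or ""
    is_immoscout = host == "immobilienscout24.de" or host.endswith(".immobilienscout24.de")
    level = logging.INFO if (status_code == 405 and is_immoscout) else logging.ERROR
    logger.log(level, "Got response (%d) from %s", status_code, url)
    return level


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_bad_response("https://www.immobilienscout24.de/", 405)  # logged as INFO
    log_bad_response("https://example.com/", 405)  # logged as ERROR
```

Checking the exact host (or a true subdomain) rather than a plain suffix keeps look-alike domains from being downgraded by accident.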

jossiamsee commented 3 years ago

Thanks for your explanation! I'll close this one.