christophebe / serp

Google Search SERP Scraper

proxy not working #7

Closed: gsiradze closed this issue 4 years ago

gsiradze commented 4 years ago

I've created a sample app:

const serp = require("serp");

(async () => {
  const options = {
    qs: {
      q: "silicon+valley",
      filter: 0,
      pws: 0
    },
    num: 100,
    proxy: "http://username:password@ip:port"
  };

  const links = await serp.search(options);
  console.log(links);
})();

Without a proxy it worked for a few minutes, but then I got this error:

UnhandledPromiseRejectionWarning: StatusCodeError: 429 - "
<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n
<html>\n

<head>
    <meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\">
    <meta name=\"viewport\" content=\"initial-scale=1\">
    <title>https://www.google.com/search?q=silicon%20valley&amp;filter=0&amp;pws=0</title>
</head>\n

<body style=\"font-family: arial, sans-serif; background-color: #fff; color: #000; padding:20px; font-size:18px;\" onload=\"e=document.getElementById('captcha');if(e){e.focus();}\">\n
    <div style=\"max-width:400px;\">\n
        <hr noshade size=\"1\" style=\"color:#ccc; background-color:#ccc;\">
        <br>\n
        <form id=\"captcha-form\" action=\"index\" method=\"post\">\n
            <script src=\"https://www.google.com/recaptcha/api.js\" async defer></script>\n
            <script>
                var submitCallback = function(response) {
                    document.getElementById('captcha-form').submit();
                };
            </script>\n
            <div id=\"recaptcha\" class=\"g-recaptcha\" data-sitekey=\"6LfwuyUTAAAAAOAmoS0fdqijC2PbbdH4kjq62Y1b\" data-callback=\"submitCallback\" data-s=\"RzBrezqy7Ocruy9AYYozK6BSB1mY3RCdv2dAWoem_xSFyKZqVwEJA8TWx-AedRQ5DWshAEpDf6v2b5Als9D-fC0MnE4rzOUq-mhiJm3yHLCqVgioWZPUSianWs7MLGX45BMm0WFmwBxtvMysrCEHlMVX1QX-Aju5C3qgWfHRbm4s9KovQljUG0QySUFMDsCLVaM6kFcqi7MQECgPSBKxZ6Za4AKqlHdnmkbVvr45N-nEGOpvt_YB4Hs\"></div>\n
            <input type='hidden' name='q' value='EgTEEFYzGLiQsO4FIhkA8aeDSy9YnDQaO0Qz94XxC9gfOeK6Q9VwMgFy'>
            <input type=\"hidden\" name=\"continue\" value=\"https://www.google.com/search?q=silicon%20valley&amp;filter=0&amp;pws=0\">\n</form>\n
        <hr noshade size=\"1\" style=\"color:#ccc; background-color:#ccc;\">\n\n
        <div style=\"font-size:13px;\">\n<b>About this page</b>
            <br>
            <br>\n\nOur systems have detected unusual traffic from your computer network. This page checks to see if it&#39;s really you sending the requests, and not a robot. <a href=\"#\" onclick=\"document.getElementById('infoDiv').style.display='block';\">Why did this happen?</a>
            <br>
            <br>\n\n
            <div id=\"infoDiv\" style=\"display:none; background-color:#eee; padding:10px; margin:0 0 15px 0; line-height:1.4em;\">\nThis page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the <a href=\"//www.google.com/policies/terms/\">Terms of Service</a>. The block will expire shortly after those requests stop. In the meantime, solving the above CAPTCHA will let you continue to use our services.
                <br>
                <br>This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests. If you share your network connection, ask your administrator for help &mdash; a different computer using the same IP address may be responsible. <a href=\"//support.google.com/websearch/answer/86640\">Learn more</a>
                <br>
                <br>Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly.\n</div>\n\nIP address: 196.xx.xx.xx
            <br>Time: 2019-11-13T13:42:18Z
            <br>URL: https://www.google.com/search?q=silicon%20valley&amp;filter=0&amp;pws=0
            <br>\n</div>\n</div>\n</body>\n

</html>\n"

So, according to the docs, I added this line: proxy: "http://username:password@ip:port"

But I'm still getting the same error.

P.S. I've tested my proxy, and it works.
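
In case it helps, this is a rough sketch of how I now catch the rejection instead of letting it crash with UnhandledPromiseRejectionWarning. The helper name and backoff delays are just my own choices, and it assumes the rejection is request-promise's StatusCodeError, which exposes a statusCode field:

const serp = require("serp");

// Illustrative helper, not part of the serp API: retry with backoff
// whenever Google answers 429 (rate limiting / CAPTCHA page).
async function searchWithRetry(options, retries = 3) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await serp.search(options);
    } catch (err) {
      if (err.statusCode !== 429 || attempt === retries) {
        throw err; // not a rate limit, or out of retries
      }
      // Back off before retrying; 60s per attempt is an arbitrary choice
      await new Promise((resolve) => setTimeout(resolve, 60000 * attempt));
    }
  }
}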

christophebe commented 4 years ago

Sorry for the delay in my response.

If you use the same proxy repeatedly, Google will quickly blacklist it. The best approach is to use a list of proxies or to increase the delay between requests.

In scenarios where I want to go fast without being blacklisted, I use 100 proxies.
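
Something like this rough sketch, using only the per-request proxy option shown in this thread; the proxy URLs, list size, delay, and helper name are all placeholders, not a built-in rotation feature:

const serp = require("serp");

// Placeholder proxies; in practice load a large list (e.g. ~100) from a file
const proxies = [
  "http://username:password@ip1:port",
  "http://username:password@ip2:port"
];

let next = 0;
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function searchRotating(qs) {
  // Round-robin over the list so no single proxy takes all the traffic
  const proxy = proxies[next++ % proxies.length];
  const links = await serp.search({ qs, num: 100, proxy });
  // Arbitrary pause between requests to stay under Google's rate limits
  await sleep(5000);
  return links;
}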

Soon, I will add support for a scrape API.