go-rod / rod

A Chrome DevTools Protocol driver for web automation and scraping.
https://go-rod.github.io
MIT License

Too many request 429 #1125

Closed 6thfdwp closed 3 weeks ago

6thfdwp commented 1 month ago

First, thanks for the tool; it's easy to get started. I'd appreciate any help.

Rod Version: v0.116.2

The code to demonstrate your question

    // Custom device profile passed to the browser as its default device.
    var MyDevice = devices.Device{
        Title:          "Chrome computer",
        Capabilities:   []string{"touch", "mobile"},
        UserAgent:      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        AcceptLanguage: "en",
        Screen: devices.Screen{
            DevicePixelRatio: 2,
            Horizontal: devices.ScreenSize{
                Width:  1500,
                Height: 900,
            },
            Vertical: devices.ScreenSize{
                Width:  1500,
                Height: 900,
            },
        },
    }

    // Launch a visible (non-headless) browser with the automation flag disabled.
    u := launcher.New().Headless(false).
        Set("--disable-blink-features", "AutomationControlled").
        Set("--user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36").
        Set("no-first-run").
        Set("disable-default-apps").
        MustLaunch()
    browser := rod.New().DefaultDevice(MyDevice).ControlURL(u).MustConnect()

    // Open a stealth page, load the target URL (url is defined elsewhere), and capture it.
    page := stealth.MustPage(browser)
    page.MustNavigate(url).MustWaitLoad()
    page.MustScreenshotFullPage("./image.png")

What you got

An empty screenshot. Chromium launches, but navigating to the URL returns a 429 error.


What you expect to see

Has anyone else encountered this issue? I suspect an anti-bot mechanism blocked the request. Is there a workaround?

What have you tried to solve the question

I tried using stealth and a suggestion from issue #922: https://github.com/go-rod/rod/issues/922#issuecomment-1689120360

github-actions[bot] commented 1 month ago

Please fix the format of your markdown:

41:1 MD033/no-inline-html Inline HTML [Element: img]
50:65 MD009/no-trailing-spaces Trailing spaces [Expected: 0 or 2; Actual: 1]

generated by check-issue

ysmood commented 1 month ago

429 Too Many Requests: "The user has sent too many requests in a given amount of time. Intended for use with rate-limiting schemes."

Usually it's caused by your gateway or proxy.

6thfdwp commented 1 month ago

Thanks for the link. I understand it's caused by the target site's rate limiting. @ysmood

But just to understand a bit better: even when I type the URL directly into Chromium (just as a normal user would visit the site), it still gives a 429. It seems their limiting checks whether the site is visited with Chromium rather than a regular browser? Typing the URL into Chrome works perfectly fine.

6thfdwp commented 3 weeks ago

Update: if we launch the browser in user mode (using real Chrome to visit the site instead of Chromium), it works. It seems the rate limiting is based purely on the browser.

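  // Look up a browser installed on the system and launch it in user mode.
  path, _ := launcher.LookPath()
  u := launcher.NewUserMode().Bin(path).MustLaunch()
  // Connect using the control URL returned by the launcher.
  browser := rod.New().ControlURL(u).MustConnect()
  ..
  page := browser.MustPage(url).MustWaitLoad()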