go-rod / rod

A Chrome DevTools Protocol driver for web automation and scraping.
https://go-rod.github.io
MIT License

Cannot start more than two instances using goroutines and when headless is off #889

Open NuLL3rr0r opened 1 year ago

NuLL3rr0r commented 1 year ago

Rod Version: v0.113.3

The code to demonstrate your question

package rod_test

import (
    "fmt"
    "sync"
    "testing"
    "time"

    "github.com/go-rod/rod"
    "github.com/go-rod/rod/lib/launcher"
    "github.com/go-rod/stealth"
)

func spawnInstances(instances []int, maxThreads int) {
    channel := make(chan int, maxThreads)

    var wg sync.WaitGroup

    wg.Add(maxThreads)
    for i := 0; i < maxThreads; i++ {
        go func() {
            defer wg.Done()
            // Each worker pulls instances until the channel is closed and drained.
            for instance := range channel {
                spawnInstance(instance)
            }
        }()
    }

    for _, instance := range instances {
        channel <- instance
    }

    close(channel)
    wg.Wait()
}

func spawnInstance(instance int) {
    fmt.Printf("Running bot #%d...\n", instance)

    userDataDir := fmt.Sprintf("/tmp/bot-%d", instance)

    // Named l so the variable does not shadow the launcher package.
    l := launcher.NewUserMode().
        Bin("/usr/bin/brave-bin").
        Leakless(true).
        UserDataDir(userDataDir).
        WorkingDir("").
        Devtools(false).
        Headless(false).
        Set("disable-default-apps").
        Set("no-first-run")

    controlURL, err := l.Launch()
    if err != nil {
        fmt.Println(err)
        return
    }

    browser := rod.New().
        ControlURL(controlURL).
        NoDefaultDevice()
    err = browser.Connect()
    if err != nil {
        fmt.Println(err)
        return
    }

    defer browser.Close()

    page, err := stealth.Page(browser)
    if err != nil {
        fmt.Println(err)
        return
    }

    err = page.Navigate(fmt.Sprintf("https://twitter.com/%d", instance))
    if err != nil {
        fmt.Println(err)
        return
    }

    err = page.WaitLoad()
    if err != nil {
        fmt.Println(err)
        return
    }

    time.Sleep(time.Second * 5)

    fmt.Printf("Shutting down bot #%d...\n", instance)
}

// This is the template to demonstrate how to test Rod.
func TestRod(t *testing.T) {
    g := setup(t)
    g.cancelTimeout() // Cancel timeout protection

    var instances []int

    for i := 0; i < 100; i++ {
        instances = append(instances, i)
    }

    spawnInstances(instances, 5)
}

const doc = `
<html>
  <body>ok</body>
</html>
`

What you got

As the example shows, I create 100 profiles inside /tmp named /tmp/bot-{X} and limit the number of goroutines to 5 at a time. It works fine with maxThreads set to 1 or 2, but with anything larger only bots 1 and 2 open a window, and some in between are skipped. For example, with maxThreads set to 5, I only see browser windows for bots 1 and 2; windows for 3, 4, and 5 never open. When 1 and 2 close, windows for 7 and 8 appear. I also append the bot number to the URL so you can see which bot is which.

What you expected to see

I expect all windows to open. The log lines confirming that each bot is running are printed to the console, but no browser window appears for the missing bots.

What have you tried to solve the question

Thinking my goroutine-limiting code might be wrong, I tried another implementation, but the results were the same.

ysmood commented 1 year ago

Could you remove all the unnecessary code and make the sample code simpler?

Why not just use BrowserPool? I test rod with 8 browsers locally without problems.

https://github.com/go-rod/rod/blob/d9cfe8ea56a61fadc7e9343570fc7fc33fd6b693/browser_test.go#L442-L450
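The idea behind a browser pool can be shown without rod. The sketch below is not rod's source: `Pool`, `NewPool`, and `fakeBrowser` are hypothetical names, written in the spirit of the `Get`/`Put`/`Cleanup` calls used later in this thread. A buffered channel holds `limit` slots, a nil slot means "not created yet", and `Get` blocks while every slot is taken, which is what caps the number of live browsers.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a hypothetical minimal pool: a buffered channel of slots,
// where a nil slot means the item has not been created yet.
type Pool[T any] chan *T

// NewPool fills every slot with nil so items are created lazily.
func NewPool[T any](limit int) Pool[T] {
	p := make(Pool[T], limit)
	for i := 0; i < limit; i++ {
		p <- nil
	}
	return p
}

// Get blocks until a slot is free, then reuses its item or creates one.
func (p Pool[T]) Get(create func() *T) *T {
	item := <-p
	if item == nil {
		item = create()
	}
	return item
}

// Put returns an item so another caller can reuse it.
func (p Pool[T]) Put(item *T) { p <- item }

// Cleanup drains the pool and calls fn on every item that was created.
func (p Pool[T]) Cleanup(fn func(*T)) {
	for i := 0; i < cap(p); i++ {
		if item := <-p; item != nil {
			fn(item)
		}
	}
}

type fakeBrowser struct{ id int }

func main() {
	pool := NewPool[fakeBrowser](3)
	var mu sync.Mutex
	created := 0
	create := func() *fakeBrowser {
		mu.Lock()
		defer mu.Unlock()
		created++
		return &fakeBrowser{id: created}
	}

	var wg sync.WaitGroup
	for i := 0; i < 9; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b := pool.Get(create)
			defer pool.Put(b) // runs before wg.Done (defers are LIFO)
		}()
	}
	wg.Wait()

	closed := 0
	pool.Cleanup(func(*fakeBrowser) { closed++ })
	fmt.Println(created <= 3, closed == created) // prints "true true"
}
```

Nine jobs share at most three items here, mirroring the 3-browser pool and 9 goroutines in the reproduction further down.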

NuLL3rr0r commented 1 year ago

I had never heard of it until you mentioned it. Thank you! I will try it out.

Sure, I can simplify it into one function; I split it up because I thought it would be more readable. Once done, I'll update the issue.

ysmood commented 1 year ago

It's in the tutorial: https://go-rod.github.io/#/browsers-pages?id=browser-pool

NuLL3rr0r commented 1 year ago

Sorry for the late response; I've been a bit busy. I think the problem lies elsewhere. It might be Brave.

Even if each Go program runs only one browser instance, when I start one program (let's say program A) and then two others (programs B and C), only the browser windows for A and B appear. Program C fails with the following error:

ERROR 2023/06/20 11:02:33 [launcher] Failed to get the debug url: Opening in existing browser session.
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)

ERROR 2023/06/20 11:02:33 Failed to launch the control URL: [launcher] Failed to get the debug url: Opening in existing browser session.
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)

ERROR 2023/06/20 11:02:33 Failed to ge the browser: Failed to launch the control URL: [launcher] Failed to get the debug url: Opening in existing browser session.
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)

INFO 2023/06/20 11:02:33 Initializing the browser for 'XXXX'...
INFO 2023/06/20 11:02:33 Initializing the middleman proxy for 'XXXX'...
ERROR 2023/06/20 11:02:33 [launcher] Failed to get the debug url: Opening in existing browser session.
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)

ERROR 2023/06/20 11:02:33 Failed to launch the control URL: [launcher] Failed to get the debug url: Opening in existing browser session.
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)

ERROR 2023/06/20 11:02:33 Failed to ge the browser: Failed to launch the control URL: [launcher] Failed to get the debug url: Opening in existing browser session.
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)

INFO 2023/06/20 11:02:33 Initializing the browser for 'XXXX'...
INFO 2023/06/20 11:02:33 Initializing the middleman proxy for 'XXXX'...
ERROR 2023/06/20 11:02:33 [launcher] Failed to get the debug url: Opening in existing browser session.
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)

ERROR 2023/06/20 11:02:33 Failed to launch the control URL: [launcher] Failed to get the debug url: Opening in existing browser session.
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)

ERROR 2023/06/20 11:02:33 Failed to ge the browser: Failed to launch the control URL: [launcher] Failed to get the debug url: Opening in existing browser session.
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)

I've tried --disable-gpu and running with --v=1, as some people suggested for Chrome and Brave, but neither works.

NuLL3rr0r commented 1 year ago

I also tried your suggestion using the latest version of rod_test. Here is the simplest reproduction using a browser pool. If I avoid launcher.NewUserMode for launching Brave and use the default browser instead, everything works. Otherwise, I get the same error, and all pages try to open as tabs in the current browser:

package rod_test

import (
    "fmt"
    "math/rand"
    "sync"
    "testing"
    "time"

    "github.com/go-rod/rod"
    "github.com/go-rod/rod/lib/launcher"
)

// This is the template to demonstrate how to test Rod.
func TestRod(t *testing.T) {
    g := setup(t)
    g.cancelTimeout() // Cancel timeout protection

    pool := rod.NewBrowserPool(3)

    create := func() *rod.Browser {
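        // Note: r1.Intn(100) can return the same number for concurrent calls,
        // so two browsers may end up sharing one profile directory.
        s1 := rand.NewSource(time.Now().UnixNano())
        r1 := rand.New(s1)
        userDataDir := fmt.Sprintf("/tmp/bot-%d", r1.Intn(100))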

        // Named l so the variable does not shadow the launcher package.
        l := launcher.NewUserMode().
            Bin("/usr/bin/brave-bin").
            Leakless(true).
            UserDataDir(userDataDir).
            WorkingDir("").
            Devtools(false).
            Headless(false).
            Set("disable-default-apps").
            Set("no-first-run")

        controlURL, err := l.Launch()
        if err != nil {
            fmt.Println(err)
        }

        browser := rod.New().
            ControlURL(controlURL).
            NoDefaultDevice()
        err = browser.Connect()
        if err != nil {
            fmt.Println(err)
        }

        return browser
    }

    yourJob := func(instance int) {
        browser := pool.Get(create)
        defer pool.Put(browser)

        page := browser.MustPage(g.blank())

        page.MustNavigate(g.html(doc)).MustWaitLoad()
        fmt.Println(page.MustInfo().Title)
    }

    wg := sync.WaitGroup{}
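    for i := 1; i < 10; i++ {
        wg.Add(1)
        // Pass i by value so each goroutine gets its own copy; before Go 1.22
        // every closure would otherwise share the same loop variable.
        go func(instance int) {
            defer wg.Done()
            yourJob(instance)
        }(i)
    }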
    wg.Wait()

    pool.Cleanup(func(browser *rod.Browser) {
        browser.MustClose()
    })

    //g.Eq(browser.MustVersion().ProtocolVersion, "1.3")
    //g.Has(page.MustElement("body").MustText(), "ok")
}

const doc = `
<html>
  <body>ok</body>
</html>
`

NuLL3rr0r commented 1 year ago

Output, then I hit Ctrl+C:

parallel test 16
127.0.0.1:35183
127.0.0.1:40243
127.0.0.1:37293
127.0.0.1:34667
127.0.0.1:36805
127.0.0.1:41213
127.0.0.1:37025
[launcher] Failed to get the debug url: Opening in existing browser session.
libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)

127.0.0.1:41233
^Csignal: interrupt
FAIL    github.com/go-rod/rod   17.916s

NuLL3rr0r commented 1 year ago

Also, I couldn't find a way to pass the instance parameter to:

create := func() *rod.Browser {

It results in build errors, which is why I use a random number for each profile directory.
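The build error is expected: the pool's Get takes a factory with no parameters (`func() *rod.Browser`), so `create` cannot declare an `instance` argument. A common Go workaround is to keep a helper that takes the instance and wrap it in a closure at the call site, passing the loop variable into the goroutine by value. The sketch below uses hypothetical stand-ins (`browser`, `get`, `newBrowser`) so it runs without rod; with rod the call would presumably look like `pool.Get(func() *rod.Browser { return create(instance) })`, given a `create(instance int) *rod.Browser` helper.

```go
package main

import (
	"fmt"
	"sync"
)

// browser is a hypothetical stand-in for *rod.Browser.
type browser struct{ profile string }

// get mimics a pool's Get: it only accepts a factory with no parameters.
func get(create func() *browser) *browser { return create() }

// newBrowser takes the instance number that get's factory type cannot carry.
func newBrowser(instance int) *browser {
	return &browser{profile: fmt.Sprintf("/tmp/bot-%d", instance)}
}

func main() {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		profiles []string
	)
	for i := 1; i < 4; i++ {
		wg.Add(1)
		// i is passed by value, so each goroutine sees its own instance number.
		go func(instance int) {
			defer wg.Done()
			// The closure captures instance and adapts newBrowser to func() *browser.
			b := get(func() *browser { return newBrowser(instance) })
			mu.Lock()
			profiles = append(profiles, b.profile)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	fmt.Println(len(profiles)) // prints 3
}
```

Because the instance number now reaches the factory, each profile path can be derived from it instead of from a random number that might repeat.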

ysmood commented 1 year ago

NewUserMode is not well tuned for the multiple-browser use case; why do you need it? It's designed to run the current user's default browser.
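If you assemble a multi-browser setup yourself instead of using the user-mode preset, each browser needs its own remote debugging port as well as its own profile directory. It may be worth checking the launcher source for your rod version to see whether the preset pins a fixed port; if it does, concurrent launches would contend for it. A stdlib sketch for picking a free loopback port to pass as Chromium's `--remote-debugging-port` flag; `freePort` is a hypothetical helper, and the port is only guaranteed free at the moment it is checked.

```go
package main

import (
	"fmt"
	"net"
)

// freePort asks the OS for an unused TCP port on the loopback interface.
// The listener is closed before returning, so another process could grab
// the port in the meantime; treat the result as best-effort.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freePort()
	if err != nil {
		panic(err)
	}
	// One possible use: a per-instance value for the browser's debugging flag.
	fmt.Printf("--remote-debugging-port=%d\n", port)
}
```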

NuLL3rr0r commented 1 year ago

Thank you! I don't know why I didn't suspect NewUserMode, but avoiding it does resolve the whole issue.

The only reason I needed it was this issue with the stealth plugin.

ysmood commented 1 year ago

Then you might need to adjust it yourself; it's outside the scope of what rod is designed to solve.