go-rod / rod

A Chrome DevTools Protocol driver for web automation and scraping.
https://go-rod.github.io
MIT License

Support connecting to BrightData Scraping Browsers #1092

Open apuigsech opened 3 months ago

apuigsech commented 3 months ago

I'd like to use scalable browser infrastructure services, such as the BrightData Scraping Browser, which integrate well with Puppeteer. However, I have run into some issues when using these services with go-rod, and I would like to request the following enhancements to improve compatibility:

1. WebSocket Authentication:

Connecting to these services' WebSocket endpoints requires authentication (e.g., wss://user:pass@host:9222). However, go-rod does not currently send the necessary authentication headers, which, as far as I can tell, are not defined by any WebSocket standard.

Through my research, I discovered that authentication is performed with an HTTP Basic Authorization header. I have a working implementation that injects this header, but I am unsure whether it injects it in the best place. If this solution aligns with the project's direction, I am willing to submit a PR with my implementation.
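For illustration, here is a minimal sketch of the header derivation described above, using only the Go standard library. The helper name basicAuthHeader is hypothetical and is not part of go-rod. The sketch only shows how the user:pass part of the URL could be turned into a Basic Authorization header, and it assumes the result is then passed to whichever WebSocket client performs the handshake.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
	"net/url"
)

// basicAuthHeader is a hypothetical helper (not a go-rod API). It takes the
// user:pass part of a WebSocket URL and returns an Authorization header plus
// the same URL with the credentials stripped.
func basicAuthHeader(wsURL string) (http.Header, string, error) {
	u, err := url.Parse(wsURL)
	if err != nil {
		return nil, "", err
	}
	h := http.Header{}
	if u.User != nil {
		pass, _ := u.User.Password()
		cred := u.User.Username() + ":" + pass
		token := base64.StdEncoding.EncodeToString([]byte(cred))
		h.Set("Authorization", "Basic "+token)
		u.User = nil // do not leak credentials in the dial URL
	}
	return h, u.String(), nil
}

func main() {
	h, clean, err := basicAuthHeader("wss://user:pass@host:9222")
	if err != nil {
		panic(err)
	}
	fmt.Println(clean)                  // wss://host:9222
	fmt.Println(h.Get("Authorization")) // Basic dXNlcjpwYXNz
}
```

The returned header and cleaned URL would be handed to the WebSocket dialer. Where exactly that hook belongs in go-rod is the open question raised here.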

2. Less Restrictive WebSocket Response Handling:

The services sometimes send responses that deviate from the expected go-rod Response structure, causing panics due to unmarshalling failures. Specifically, the Error struct expects an integer Code, but some responses include a string (e.g., "navigate_limit").

The Response structure is defined this way:

type Response struct {
  ID     int             `json:"id"`
  Result json.RawMessage `json:"result,omitempty"`
  Error  *Error          `json:"error,omitempty"`
}

type Error struct {
  Code    int    `json:"code"`
  Message string `json:"message"`
  Data    string `json:"data"`
}

BrightData is sending me responses that cause a panic, such as this one:

{
  "id": 27,
  "sessionId": "BRD_461626884EEF95862B6188C2DBB766D1",
  "error": {
    "message": "Page.navigate limit reached",
    "code": "navigate_limit" // This is expected as an Int.
  },
  "duration": 1.2261550000112038
}

To make go-rod more compatible, it may be necessary to relax how strictly the Error struct is unmarshalled. I would appreciate guidance on the best approach to achieve this flexibility. If you agree, I am happy to work on the implementation with a bit of guidance.

github-actions[bot] commented 3 months ago

Please add a valid Rod Version: v0.0.0 to your issue. Current version is v0.116.2

generated by check-issue

apuigsech commented 3 months ago

I am using the latest version of go-rod (v0.116.2).

ysmood commented 3 months ago

Have you checked this example file? You can use another WebSocket library to do any kind of auth you like:

https://github.com/go-rod/rod/blob/8ffcc0f42d59b43ffd08fd34a2662c7feb5f6272/lib/examples/custom-websocket/main.go#L26-L49

ysmood commented 3 months ago

As for the error string, you can also use your custom WebSocket to convert the error code to a number:

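// Read ...
func (w *WebSocket) Read() ([]byte, error) {
    b, err := wsutil.ReadServerText(w.conn)
    if err != nil {
        return nil, err
    }
    // Parse b, replace the string error code with an int, then re-encode it as JSON bytes.
    // normalize is a hypothetical helper; its body is left out here.
    return normalize(b)
}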

I think the string error code is a bug on BrightData's side, and we should raise an issue with them about it. Their responses should follow the CDP protocol definition.