gocolly / colly

Elegant Scraper and Crawler Framework for Golang
https://go-colly.org/
Apache License 2.0

Support for Brotli #751

Open todd-bush opened 1 year ago

todd-bush commented 1 year ago

I've run into a few sites that compress their responses with Brotli instead of gzip.

https://chromestatus.com/feature/5420797577396224

Native support for Brotli would be a nice enhancement.

Also, if there's a workaround to handle Brotli sites, please let me know.

vvo459 commented 8 months ago

For me, this workaround helped:

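import (
    "bytes"
    "io"
    "log"

    "github.com/andybalholm/brotli"
)

c.OnResponse(func(r *colly.Response) {
    // ....
    contentEncodingHeader := r.Headers.Get("Content-Encoding")
    decompressedBody := string(r.Body) // raw body unless the response is Brotli-encoded

    if contentEncodingHeader == "br" {
        // Wrap the raw bytes in a Brotli reader and read them out decompressed.
        brReader := brotli.NewReader(bytes.NewReader(r.Body))
        decompressed, err := io.ReadAll(brReader)
        if err != nil {
            log.Println("Error during Brotli decompression:", err)
            return // don't continue with a partially decoded body
        }
        decompressedBody = string(decompressed)
    }
    // ... use decompressedBody from here on ...
})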

truthtracer commented 2 months ago

I've added br support at https://github.com/truthtracer/colly if you want to try it.