gocolly / colly

Elegant Scraper and Crawler Framework for Golang
https://go-colly.org/
Apache License 2.0

Cannot send request with no Accept header #783

Closed by k4lizen 1 year ago

k4lizen commented 1 year ago

From colly.go: [image]

I need to send a request with no Accept header, but colly automatically inserts "Accept: */*" AFTER the OnRequest callbacks have run, so nothing I do inside OnRequest can remove or override it.

Example of why I need this:

// Assumed imports: "io", "net/http", and a zerolog-style logger as "log"
// (log.Debug().Msgf matches the API of github.com/rs/zerolog/log).

// thisworks sends the request without an Accept header.
func thisworks() {
    client := &http.Client{}
    req, _ := http.NewRequest("GET", "https://api.yep.com/fs/2/search?client=web&gl=US&no_correct=false&q=youtube&safeSearch=off&type=web", nil)
    //req.Header.Set("Accept", "*/*")
    resp, err := client.Do(req)
    if err != nil {
        log.Error().Err(err).Msg("request failed")
        return
    }
    defer resp.Body.Close()

    log.Debug().Msgf("WRequest Header: %v", resp.Request.Header)
    log.Debug().Msgf("WResponse Header: %v", resp.Header)
    body, _ := io.ReadAll(resp.Body)
    log.Debug().Msgf("WResponse Body: %v", string(body))
}

// thisdoesnt is identical except that it sets "Accept: */*".
func thisdoesnt() {
    client := &http.Client{}
    req, _ := http.NewRequest("GET", "https://api.yep.com/fs/2/search?client=web&gl=US&no_correct=false&q=youtube&safeSearch=off&type=web", nil)
    req.Header.Set("Accept", "*/*")
    resp, err := client.Do(req)
    if err != nil {
        log.Error().Err(err).Msg("request failed")
        return
    }
    defer resp.Body.Close()

    log.Debug().Msgf("Request Header: %v", resp.Request.Header)
    log.Debug().Msgf("Response Header: %v", resp.Header)
    body, _ := io.ReadAll(resp.Body)
    log.Debug().Msgf("Response Body: %v", string(body))
}

The two functions return different responses, and the only difference between them is the req.Header.Set("Accept", "*/*") line. I need the response I get when no Accept header is sent.

The good response: ["Ok",{"results":[{"url":"https://apps.apple.com/us/app/youtube-watch-listen-stream/id544007664","title":"YouTube: Watch, Liste...

The bad response: ["Ok",{"results":[{"url":"https://www.commonsensemedia.org/website-reviews/youtube","title":"YouTube Website Review | Co...
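For anyone who wants to confirm what actually goes over the wire without hitting the real API, the comparison can be sketched against a local test server. This is only an illustration: `acceptSeenByServer` is a hypothetical helper, and the server is a stand-in from `net/http/httptest`, not the yep.com endpoint. It shows that plain `net/http` does not add an Accept header on its own, so the header only appears when something sets it explicitly.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// acceptSeenByServer sends a GET to a local test server and returns the
// Accept header values the server received. setAccept mirrors the single
// line that differs between the two functions in the report.
func acceptSeenByServer(setAccept bool) ([]string, error) {
	// Buffered so the handler never blocks on the send.
	seen := make(chan []string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Values("Accept")
	}))
	defer srv.Close()

	req, err := http.NewRequest("GET", srv.URL, nil)
	if err != nil {
		return nil, err
	}
	if setAccept {
		req.Header.Set("Accept", "*/*")
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	resp.Body.Close()
	return <-seen, nil
}

func main() {
	without, err := acceptSeenByServer(false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	with, err := acceptSeenByServer(true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Accept values without Set: %d\n", len(without))
	fmt.Printf("Accept values with Set:    %d (%q)\n", len(with), with)
}
```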

k4lizen commented 1 year ago

I am currently using a fork (from the PR above) to work around this issue.