Open andystroz opened 5 years ago
Just saw this issue and I think it's related, because http://httpbin.org/ip works but https://httpbin.org/ip does not. The website I am looking to scrape uses https.
Hi Andrew, I ran into the same error. Did you figure it out?
Yeah! I did end up getting it working; hopefully this still helps.
c := colly.NewCollector()
c.WithTransport(s.HTTPTransport)
Where s.HTTPTransport is:
&http.Transport{
    Proxy: rp,
    DialContext: (&net.Dialer{
        Timeout:   30 * time.Second,
        KeepAlive: 30 * time.Second,
        DualStack: true,
    }).DialContext,
    MaxIdleConns:          100,
    IdleConnTimeout:       90 * time.Second,
    TLSHandshakeTimeout:   10 * time.Second,
    ExpectContinueTimeout: 1 * time.Second,
    TLSNextProto:          nil,
}
Where rp is a function that specifies a proxy to use.
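For readers unfamiliar with the shape of such a function, here is a minimal standard-library sketch of a rotating proxy function with the signature that http.Transport.Proxy expects. The helper name roundRobinProxy is hypothetical; it only approximates what a round robin switcher does and is not colly's implementation.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync/atomic"
)

// roundRobinProxy is a hypothetical helper: it returns a function with the
// signature http.Transport.Proxy expects, cycling through the given proxies.
func roundRobinProxy(rawURLs ...string) (func(*http.Request) (*url.URL, error), error) {
	if len(rawURLs) == 0 {
		return nil, fmt.Errorf("no proxy URLs given")
	}
	proxies := make([]*url.URL, 0, len(rawURLs))
	for _, raw := range rawURLs {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, err
		}
		proxies = append(proxies, u)
	}
	var next uint32
	return func(_ *http.Request) (*url.URL, error) {
		// AddUint32 returns the incremented value, so subtract 1 to start at index 0.
		i := atomic.AddUint32(&next, 1) - 1
		return proxies[i%uint32(len(proxies))], nil
	}, nil
}

func main() {
	rp, err := roundRobinProxy("socks5://127.0.0.1:1337", "socks5://127.0.0.1:1338")
	if err != nil {
		panic(err)
	}
	// Building a request does not touch the network.
	req, _ := http.NewRequest("GET", "https://httpbin.org/ip", nil)
	for i := 0; i < 3; i++ {
		u, _ := rp(req)
		fmt.Println(u.Host) // alternates between the two proxy addresses
	}
}
```

The atomic counter keeps the rotation safe when several requests ask for a proxy at the same time.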
// AuthenticatedRoundRobinProxyHTTP returns the proxy function used by http for an authenticated round robin proxy
func AuthenticatedRoundRobinProxyHTTP(proxyURLs []string, username string, password string) (func(*http.Request) (*url.URL, error), error) {
    roundRobinSwitcher, err := collyProxy.RoundRobinProxySwitcher(proxyURLs...)
    if err != nil {
        return nil, err
    }
    return (&authenticatedRoundRobinSwitcher{roundRobinSwitcher, username, password}).GetAuthenticatedProxy, nil
}
Where proxyURLs []string is a list of socks5 proxy URLs, e.g. socks5://127.0.0.1:1337 and socks5://127.0.0.1:1338, and (&authenticatedRoundRobinSwitcher{roundRobinSwitcher, username, password}).GetAuthenticatedProxy is a method that takes request *http.Request, adds the proxy headers to that request, and returns the proxy URL chosen by the colly round robin switcher.
type authenticatedRoundRobinSwitcher struct {
    roundRobinSwitcher colly.ProxyFunc
    username           string
    password           string
}

func (r *authenticatedRoundRobinSwitcher) GetAuthenticatedProxy(request *http.Request) (*url.URL, error) {
    // Add proxy authentication. Set (rather than Add) avoids duplicate
    // headers if this function is called more than once for the same request.
    auth := r.username + ":" + r.password
    basicAuth := "Basic " + base64.StdEncoding.EncodeToString([]byte(auth))
    request.Header.Set("Proxy-Authorization", basicAuth)
    request.Header.Set("Proxy-Connection", "Keep-Alive")
    // Delegate the choice of proxy URL to the colly round robin switcher.
    return r.roundRobinSwitcher(request)
}
This was a doozy to figure out so lmk if you got any questions :)
It seems we can specify the username/password in the proxy URL itself:
roundRobinSwitcher, err := collyProxy.RoundRobinProxySwitcher("socks5://username:password@127.0.0.1:1337")
and the http package will handle the SOCKS5 username/password authentication for us (no header needed); see: https://github.com/golang/go/blob/master/src/net/http/transport.go#L1624
case cm.proxyURL.Scheme == "socks5":
    conn := pconn.conn
    d := socksNewDialer("tcp", conn.RemoteAddr().String())
    if u := cm.proxyURL.User; u != nil {
        auth := &socksUsernamePassword{
            Username: u.Username(),
        }
        auth.Password, _ = u.Password()
        d.AuthMethods = []socksAuthMethod{
            socksAuthMethodNotRequired,
            socksAuthMethodUsernamePassword,
        }
        d.Authenticate = auth.Authenticate
    }
    if _, err := d.DialWithConn(ctx, conn, "tcp", cm.targetAddr); err != nil {
        conn.Close()
        return nil, err
    }
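The excerpt above reads the credentials from the User field of the parsed proxy URL. A small sketch with only net/url shows what those fields contain for a URL like the one passed to the switcher. The helpers proxyCredentials and buildProxyURL are hypothetical names for illustration, and the credentials are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// proxyCredentials (hypothetical helper) extracts the userinfo part of a proxy
// URL, i.e. the same data the transport excerpt reads via cm.proxyURL.User.
func proxyCredentials(rawURL string) (username, password string, ok bool, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", false, err
	}
	if u.User == nil {
		// No credentials embedded in the URL.
		return "", "", false, nil
	}
	password, _ = u.User.Password()
	return u.User.Username(), password, true, nil
}

// buildProxyURL (hypothetical helper) assembles a proxy URL from its parts, so
// that special characters in the password are percent-encoded correctly.
func buildProxyURL(scheme, host, username, password string) string {
	u := url.URL{Scheme: scheme, Host: host, User: url.UserPassword(username, password)}
	return u.String()
}

func main() {
	user, pass, ok, err := proxyCredentials("socks5://username:password@127.0.0.1:1337")
	fmt.Println(user, pass, ok, err)

	// A password containing '@' or ':' survives the round trip when the URL
	// is built with url.UserPassword instead of string concatenation.
	raw := buildProxyURL("socks5", "127.0.0.1:1337", "username", "p@ss:word")
	_, pass, _, _ = proxyCredentials(raw)
	fmt.Println(raw, pass)
}
```

Building the URL with url.UserPassword matters when a password contains characters such as @ or :, which would otherwise break parsing of the proxy URL.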
I was wondering how to use the proxy functionality with Basic Auth? I've modified the callback function to insert the base64-encoded username and password with the Proxy-Authorization header. I have also verified that the same HTTP request works when using https://httpbin.org/ip in cURL, but Colly returns Error: Proxy Authentication Required.