apify / got-scraping

HTTP client made for scraping based on got.

Possible to support alternative agents? #112

Open tonybruess opened 8 months ago

tonybruess commented 8 months ago

I'd like to use hpagent because it supports reusable sockets.

I tried passing in agent: { https: httpsAgent } but it was overridden / didn't work.

I'm wondering if this is easily feasible or if there's some limitation I'm overlooking. If there are no gotchas, I'm happy to make a PR.

B4nan commented 8 months ago

Feel free to PR this. I guess we just didn't think about such a use case.

vladfrangu commented 8 months ago

Hey, I'm looking into this issue right now, but I can't seem to figure out how to test it. It also shouldn't be overriding your agent unless you also provided a proxyUrl in your got options.

Do you have a reproduction sample we could use? 🙏

Edoardopacino commented 4 months ago

I've run into this problem too. If I send multiple requests to the same host, they should reuse the same socket instead of establishing a new connection to the proxy server every time. (hpagent doesn't seem to support http2.) (Update: http2-wrapper's agent supports reusable sockets.)

tonybruess commented 4 months ago

I took a look at this again and I think there are two separate issues that I didn't articulate well in my original report.

  1. agent is overridden if proxyUrl is specified. I understand now that this is expected behavior, but I didn't realize it originally. Should we raise an exception or print a warning if both are specified?
  2. There is no built-in support for reusable sockets. Is this a feature the team would consider supporting? In the code I see the following comment: "Sockets must not be reused, the proxy server may rotate upstream proxies as well." As a workaround, I'm just manually specifying agent and not using proxyUrl.

Probably worth splitting this single issue into two separate issues depending on how you'd like to proceed.
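
To make the first point concrete, here is a minimal sketch of the kind of check that could raise an exception when both options are present. The helper name assertNoAgentConflict is hypothetical and is not part of got-scraping; it only assumes that the options object carries the agent and proxyUrl keys discussed above.

```javascript
// Hypothetical helper (not part of got-scraping): fail fast when a custom
// agent is supplied together with proxyUrl, since the thread reports that
// the agent is overridden in that case.
function assertNoAgentConflict(options) {
  const hasAgent = options.agent !== undefined && options.agent !== null;
  const hasProxyUrl =
    typeof options.proxyUrl === 'string' && options.proxyUrl.length > 0;

  if (hasAgent && hasProxyUrl) {
    throw new TypeError(
      'Both `agent` and `proxyUrl` were provided; `agent` would be overridden.'
    );
  }

  // Return the options unchanged so the call can be chained.
  return options;
}
```

A warning instead of an exception would be the gentler variant: replace the throw with a console.warn call and keep returning the options.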

fduman commented 2 months ago

@tonybruess I set "http2: false", and then I was able to use hpagent.
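
Putting the workarounds from this thread together, the options might look roughly like the sketch below: a custom agent under agent.https, http2 set to false, and no proxyUrl. The helper name buildOptions is hypothetical. To stay self-contained, the sketch uses Node's built-in https.Agent with keepAlive as a stand-in; with hpagent, the agent would instead be created with new HttpsProxyAgent({ keepAlive: true, proxy: '...' }), where the proxy address is whatever your setup uses.

```javascript
const https = require('node:https');

// Stand-in agent that keeps sockets open for reuse. This one does not proxy
// anything; swap in hpagent's HttpsProxyAgent to route through a proxy.
const keepAliveAgent = new https.Agent({ keepAlive: true, maxSockets: 16 });

// Hypothetical helper: build request options that follow the workarounds
// reported in this thread.
function buildOptions(url, httpsAgent) {
  return {
    url,
    // Reported above: the custom agent only took effect with HTTP/2 disabled.
    http2: false,
    // Custom agent for HTTPS requests.
    agent: { https: httpsAgent },
    // proxyUrl is deliberately left out, because specifying it overrides `agent`.
  };
}

const options = buildOptions('https://example.com', keepAliveAgent);
```

The resulting object would then be passed to the client, for example gotScraping(options), assuming got-scraping is installed and imported.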