apify / got-scraping

HTTP client made for scraping based on got.

What's up with _unixOptions? #75

Closed. pimterry closed this issue 1 year ago.

pimterry commented 1 year ago

I'm looking at ways to improve my request fingerprints to avoid blocks, and I ran into this project. Lots of interesting tricks here! Excellent work.

One trick I found interesting was setting _unixOptions with additional TLS settings: https://github.com/apify/got-scraping/blob/8d9d4f1e6f0144cfd7993c958fcc71e6c61c00ff/src/hooks/tls.ts#L219-L224. I can't find any info about this anywhere, not even in Node's source, and it's not mentioned in your docs. Can you explain what this does?

szmarczak commented 1 year ago

It's a Got internal option. Got parses the UNIX socket path into options._unixOptions, which is just a plain object that is later merged with the other options. Here it's used to enable the insecure HTTP parser. For example, Insomnia was sending broken (non-spec) headers to check whether the parser is lenient or not. Node.js's parser is not lenient by default, but browsers allow broken headers. I don't think they do that anymore.

https://github.com/sindresorhus/got/blob/e032b60ff3285403b8f9627dcdd32aae39867d7a/source/core/options.ts#L1442
https://github.com/sindresorhus/got/blob/e032b60ff3285403b8f9627dcdd32aae39867d7a/source/core/options.ts#L2428
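
To make the merge described above concrete, here is a minimal TypeScript sketch. It is not Got's code: `SketchOptions`, `parseUnixTarget`, and `toNativeOptions` are hypothetical names, the `unix:<socketPath>:<path>` target shape is assumed for illustration, and spreading `_unixOptions` last is an assumption of this sketch. The only real Node.js pieces are the option names `socketPath`, `insecureHTTPParser` (which makes Node's parser accept invalid HTTP headers), and `ciphers`, all of which `http.request()`/`https.request()` accept.

```typescript
// Subset of the options Node's http.request()/https.request() accept.
interface NativeRequestOptions {
  hostname?: string;
  host?: string;
  path?: string;
  socketPath?: string;
  insecureHTTPParser?: boolean;
  ciphers?: string; // TLS option, meaningful for https.request()
}

// Hypothetical stand-in for Got's internal options; only the
// _unixOptions field name comes from the thread.
interface SketchOptions {
  hostname: string;
  path: string;
  _unixOptions?: NativeRequestOptions;
}

// Hypothetical parser for targets shaped like "unix:<socketPath>:<path>".
// Returns undefined when the target is not a UNIX-socket target.
function parseUnixTarget(target: string): NativeRequestOptions | undefined {
  // Lazy match stops the socket path at the first ":" followed by "/".
  const match = /^unix:(.+?):(\/.*)$/.exec(target);
  if (match === null) return undefined;
  return { socketPath: match[1], path: match[2], host: "" };
}

// Hypothetical: build the object handed to Node. _unixOptions is spread
// last in this sketch, so every key in it reaches Node and wins over
// the defaults listed before it.
function toNativeOptions(options: SketchOptions): NativeRequestOptions {
  return {
    hostname: options.hostname,
    path: options.path,
    ...options._unixOptions,
  };
}

// Usage: a UNIX-socket target, then a plain object carrying a non-socket option.
const viaSocket = toNativeOptions({
  hostname: "localhost",
  path: "/",
  _unixOptions: parseUnixTarget("unix:/var/run/app.sock:/api/v1"),
});
// viaSocket.socketPath === "/var/run/app.sock", viaSocket.path === "/api/v1"

const lenient = toNativeOptions({
  hostname: "example.com",
  path: "/",
  _unixOptions: { insecureHTTPParser: true },
});
// lenient.insecureHTTPParser === true; hostname and path are unchanged
```

In this sketch the merge is unconditional: whatever keys end up in the plain object are passed through to Node, so setting it directly can inject options such as `insecureHTTPParser` even when no socket is involved.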

pimterry commented 1 year ago

Super helpful, thanks! :+1: