apify / got-scraping

HTTP client made for scraping based on got.

Got-scraping is broken with NodeJS 18.17 #97

Closed: Cooya closed this issue 11 months ago.

Cooya commented 11 months ago

When using a proxy, got-scraping does not work with the latest Node.js LTS release. To reproduce the problem, install Node.js 18.17 and run the following snippet:

const { gotScraping } = require('got-scraping');

(async () => {
    try {
        const res = await gotScraping.get({
            url: 'https://ipinfo.io/',
            proxyUrl: 'YOUR_PROXY_URL'
        });
        console.log(res.body);
    } catch(e) {
        console.error(e.message);
    }
})();

You should get the error "Proxy responded with 503 Service Unavailable: 3702 bytes". The error does not occur with NodeJS 18.16.

barjin commented 11 months ago

Hi @Cooya and thank you for submitting this issue!

The problem is most likely Ada, the new URL parser introduced in Node v18.17. It processes URLs slightly differently, which causes got-scraping to send malformed requests. I'll try to figure out the best place to fix this and make a PR.
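The thread does not say which parsing difference is responsible. A minimal way to look for one yourself, assuming you can run Node 18.16 and 18.17 side by side, is to print how the built-in `URL` class splits your proxy URL under each version and diff the output. This only shows what Node's own parser produces; whether got-scraping's request path depends on any field that changed is an assumption you would still need to check. `describeUrl` and the example proxy URL are hypothetical.

```javascript
// Sketch: print how this Node.js version's built-in WHATWG URL parser splits a URL.
// Run the same file under Node 18.16 and 18.17 and diff the two outputs.
function describeUrl(input) {
    const u = new URL(input);
    return {
        node: process.version,
        href: u.href,
        protocol: u.protocol,
        username: u.username,
        password: u.password,
        hostname: u.hostname,
        port: u.port,
        pathname: u.pathname,
        search: u.search,
    };
}

// Placeholder proxy URL: substitute the value you pass as `proxyUrl`.
console.log(JSON.stringify(describeUrl('http://user:pass@proxy.example.com:8000'), null, 2));
```

Only the `node` field is expected to differ for a plain URL like this one; any other field that changes between versions is a candidate for the malformed request.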

Again, thanks for letting us know (and sorry for the inconvenience)!

JaredBett commented 11 months ago

@barjin I'm still having a problem using got-scraping with a proxy in Node.js 18.17, but only when using an HTTPS proxy and only with HTTP/1.1 servers (HTTP/2 servers or a plain HTTP proxy seem to work fine).

I'm using a Smokescreen proxy with TLS enabled (the --tls-server-bundle-file option). This used to work fine on Node.js 18.16 but is now broken on 18.17.

const { gotScraping } = require('got-scraping');

(async () => {
    try {
        const res = await gotScraping.get({
            url: 'https://ipinfo.io/',
            proxyUrl: 'https://localhost:4750'
        }, {
            // Force HTTP/1.1; the failure only shows up with HTTP/1.1 servers
            http2: false
        });
        console.log(res.body);
    } catch(e) {
        console.error(e.message);
    }
})();

This results in the error "The proxy responded with 407 Request rejected by proxy".

barjin commented 11 months ago

Hi @JaredBett, unfortunately I wasn't able to reproduce this. I set up a local Smokescreen server with a self-signed certificate (which I installed on my system) and used your example code, and I just got the response body with no errors at all (with Node 16, 18.16, and 18.17). Did you do anything different in your setup?

JaredBett commented 11 months ago

Thanks for looking at this @barjin! Since you couldn't reproduce it, I thought it might be something with my Smokescreen build. I just tried the latest version of Smokescreen and the problem went away. There must have been a bug fix affecting Node 18.17 in Smokescreen or one of its dependencies. Thanks again!
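Since the 407 turned out to come from the proxy build, one way to separate proxy problems from client problems is to send a bare CONNECT request to the proxy using only Node's core `https` module, with got-scraping out of the loop. This is a sketch under assumptions, not something from the thread: `buildConnectOptions` and `probeProxy` are hypothetical helpers, the default target `ipinfo.io:443` is just the URL from the reproduction, and a self-signed proxy certificate has to be trusted first (for example via the `NODE_EXTRA_CA_CERTS` environment variable). Node's documentation says the `connect` event fires each time a server responds to a CONNECT request, so a rejection status is reported rather than thrown.

```javascript
const https = require('https');

// Hypothetical helper: request options for a raw CONNECT through an HTTPS proxy.
function buildConnectOptions(proxyUrl, targetHost, targetPort) {
    const proxy = new URL(proxyUrl);
    return {
        host: proxy.hostname,
        // An https: URL with no explicit port means the default 443.
        port: Number(proxy.port) || 443,
        method: 'CONNECT',
        // CONNECT uses "host:port" as the request target instead of a path.
        path: `${targetHost}:${targetPort}`,
    };
}

// Hypothetical helper: resolves with the proxy's status code for the CONNECT request.
function probeProxy(proxyUrl, targetHost = 'ipinfo.io', targetPort = 443) {
    return new Promise((resolve, reject) => {
        const req = https.request(buildConnectOptions(proxyUrl, targetHost, targetPort));
        req.on('connect', (res, socket) => {
            socket.destroy(); // only the status code is needed, not the tunnel
            resolve(res.statusCode);
        });
        req.on('error', reject); // TLS or connection failures end up here
        req.end();
    });
}
```

Usage: run `probeProxy('https://localhost:4750').then(console.log, console.error)` under both Node versions. If the proxy answers 407 on 18.17 even without got-scraping involved, the proxy side is the more likely culprit.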