apify / got-scraping

HTTP client made for scraping based on got.

Some proxies return 502 after moving to Node 20 #93

Closed 3ldar closed 10 months ago

3ldar commented 1 year ago

We use several kinds of proxies, and we wanted to upgrade Node.js from 18 to the latest version. After upgrading, we observed that requests through certain proxy providers fail with a 502 status code (ERR_GOT_REQUEST_ERROR). Here is the stack trace:

RequestError: Proxy responded with 502 Bad Gateway: 141 bytes
    at Request._beforeError (C:\Users\ibrahim.koymen\basic projects\got20\node_modules\got-cjs\dist\source\core\index.js:324:21)
    at Request.flush (C:\Users\ibrahim.koymen\basic projects\got20\node_modules\got-cjs\dist\source\core\index.js:313:18)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at ClientRequest.<anonymous> (C:\Users\ibrahim.koymen\basic projects\got20\node_modules\got-scraping\dist\resolve-protocol.js:37:28)
    at Object.onceWrapper (node:events:626:26)
    at ClientRequest.emit (node:events:511:28)
    at Socket.socketOnData (node:_http_client:575:11)
    at Socket.emit (node:events:511:28)
    at addChunk (node:internal/streams/readable:332:12)
    at readableAddChunk (node:internal/streams/readable:305:9)
    at Readable.push (node:internal/streams/readable:242:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17)

Here is the code that reproduces the issue:

const {gotScraping} = require("got-scraping");

(async () => {
    try {
        const response = await gotScraping({
            url: 'https://api.apify.com/v2/browser-info',
            proxyUrl: 'http://108.59.14.200:13402'
        });
        console.log(response.body);
    } catch (e) {
        console.log(e);
    }
})();

The proxy URL 'http://108.59.14.200:13402' is provided by Storm Proxies. While this fails, the same request succeeds when using got with hpagent, as below:

import {HttpsProxyAgent} from 'hpagent';
import {got} from "got";

const response = await got('https://api.apify.com/v2/browser-info', {
    agent: {
        https: new HttpsProxyAgent({
            keepAlive: true,
            keepAliveMsecs: 1000,
            maxSockets: 256,
            maxFreeSockets: 256,
            scheduling: 'lifo',
            proxy: 'http://108.59.14.200:13402'
        })
    }
});

console.log(response.body);

As I said, if I downgrade Node.js to 18, the same got-scraping code just works. Some of my other proxy providers also work with this code. What might be the reason?
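One way to narrow this down is to send the CONNECT request to the proxy with Node's built-in http module, bypassing got, got-scraping, and hpagent entirely. The sketch below is only an illustration under that assumption: `buildConnectOptions` and `probeProxy` are hypothetical helper names, not part of any library mentioned in this thread, and the 10-second timeout is arbitrary. If the proxy answers this bare CONNECT with 502 as well, the problem is more likely on the proxy side than in the client library.

```javascript
const http = require('node:http');

// Hypothetical helper: build http.request() options for a CONNECT tunnel
// through an HTTP proxy to targetHost:targetPort.
function buildConnectOptions(proxyUrl, targetHost, targetPort = 443) {
    const proxy = new URL(proxyUrl);
    const authority = `${targetHost}:${targetPort}`;
    const headers = { Host: authority };

    // Only send credentials when the proxy URL actually carries them.
    if (proxy.username) {
        const credentials = `${decodeURIComponent(proxy.username)}:${decodeURIComponent(proxy.password)}`;
        headers['Proxy-Authorization'] = `Basic ${Buffer.from(credentials).toString('base64')}`;
    }

    return {
        host: proxy.hostname,
        port: Number(proxy.port) || 80, // an http:// proxy URL without a port defaults to 80
        method: 'CONNECT',
        path: authority,
        headers,
    };
}

// Hypothetical helper: resolve with the status code the proxy returns for the
// CONNECT attempt (200 means the tunnel was opened; 502 matches the error above).
function probeProxy(proxyUrl, targetHost, targetPort = 443, timeoutMs = 10000) {
    return new Promise((resolve, reject) => {
        const req = http.request(buildConnectOptions(proxyUrl, targetHost, targetPort));
        req.setTimeout(timeoutMs, () => req.destroy(new Error('CONNECT timed out')));
        // Node emits 'connect' for any response to a CONNECT request, whatever the status.
        req.once('connect', (res, socket) => {
            socket.destroy(); // we only need the status line, not the tunnel
            resolve(res.statusCode);
        });
        req.once('error', reject);
        req.end();
    });
}
```

Calling `probeProxy('http://108.59.14.200:13402', 'api.apify.com')` under both Node 18 and Node 20 and comparing the resolved status codes would show whether the proxy itself behaves differently depending on the Node version.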

B4nan commented 11 months ago

> While this fails, the same request succeeds when using got with hpagent, as below:

What version of got are you using in that test? got-scraping is still a CJS project, so it is on an older got version; this could be something that is fixed in the latest major. We will update to it soonish, once we move to ESM too.

3ldar commented 11 months ago

> > While this fails, the same request succeeds when using got with hpagent, as below:
>
> What version of got are you using in that test? got-scraping is still a CJS project, so it is on an older got version; this could be something that is fixed in the latest major. We will update to it soonish, once we move to ESM too.

@B4nan Well, I tried two setups: one with the stock got-scraping (3.2.13) alongside got (13.0.0), and another where I manually copied got-scraping into my project and forced it to use the ESM build of got 13.0.0. (I made some modifications, but only minor ones: import extensions and type fixes.)

nikosson commented 10 months ago

I have the same problem, but with Node.js 18.17.1. With 20.2.0, for example, it works well.

foxt451 commented 10 months ago

@3ldar I've tried it with your proxy and it seems to have stopped working. Is it still operational? And @nikosson, can you tell us which proxy you were using?

3ldar commented 10 months ago

> @3ldar I've tried it with your proxy and it seems to have stopped working. Is it still operational? And @nikosson, can you tell us which proxy you were using?

@foxt451 Thanks for the reply. If you can give me an IP address, I can grant you access; sadly, the proxy provider only allows authorization by IP address.

foxt451 commented 10 months ago

@3ldar Sure, can you give me your Discord handle (mine is helzi#7291), or any other contact?

3ldar commented 10 months ago

> helzi#7291

I sent you a friend request. Mine is 3ldar.

foxt451 commented 10 months ago

@3ldar I've tested this proxy on Node v18.16.0, 18.17.1, and 20.2.0, and it received the response in all cases. Could you verify whether this issue persists with the latest version of got-scraping? And if so, could you specify the exact Node version you are using?

I used your code:

import {gotScraping} from "got-scraping";

console.log(process.version);
try {
    const response = await gotScraping({
        url: 'https://api.apify.com/v2/browser-info',
        proxyUrl: 'http://108.59.14.200:13402'
    });
    console.log(response.body);
} catch (e: any) {
    console.log(e);
}

with got-scraping ^3.2.15.

3ldar commented 10 months ago

@foxt451 Yes, I have also tested it with got-scraping v3.2.15 and Node v20.5.0, and it seems to be working fine. Thanks for the effort.
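Since the outcome in this thread depended on the exact Node.js and got-scraping versions, it can help to log the runtime details together with the error when reporting a failure like this. The following is a minimal sketch, assuming only that the thrown error exposes `name`, `code`, and `message` (as the RequestError above does); `describeFailure` is a hypothetical helper, not a got-scraping API.

```javascript
// Hypothetical helper: summarize a failed request together with the runtime it ran on.
// `runtime` defaults to the global `process`; it is a parameter only so the
// function can be exercised with fixed values.
function describeFailure(error, runtime = process) {
    return {
        node: runtime.version,       // e.g. the value printed by console.log(process.version)
        platform: runtime.platform,
        name: error?.name ?? 'UnknownError',
        code: error?.code ?? null,   // e.g. ERR_GOT_REQUEST_ERROR
        message: error?.message ?? String(error),
    };
}

// Usage inside the reproduction's catch block:
//   } catch (e) {
//       console.log(JSON.stringify(describeFailure(e), null, 2));
//   }
```

The got-scraping version is not captured here; it would still need to be read from the project's lockfile or package manager and added to the report by hand.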