apify / proxy-chain

Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining.
https://www.npmjs.com/package/proxy-chain
Apache License 2.0

TimeoutError: Navigation timeout of 30000 ms exceeded #527

Open sebastiansieber opened 6 months ago

sebastiansieber commented 6 months ago

Hello all,

The code below works perfectly without proxy-chain, but as soon as I pass the --proxy-server argument to Puppeteer it runs into a timeout: TimeoutError: Navigation timeout of 30000 ms exceeded

In local development I use either of these for Chromium:

PPTR_EXECUTABLE_PATH=/Applications/Google Chrome.app/Contents/MacOS/Google Chrome
PPTR_EXECUTABLE_PATH=/Applications/Chromium.app/Contents/MacOS/Chromium

Does anyone have an idea why? Thanks

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
require('dotenv').config();
puppeteer.use(StealthPlugin());

const proxyChain = require('proxy-chain');
const proxy_host = process.env.PROXY_HOST;
const proxy_username = process.env.PROXY_USERNAME;
const proxy_password = process.env.PROXY_PASSWORD;

exports.getIP = async (req, res, next) => {
    let browser;
    let newProxyUrl;

    try {
        const oldProxyUrl = `http://${proxy_username}:${proxy_password}@${proxy_host}`;
        newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);

        browser = await puppeteer.launch({
            headless: false, // boolean, not the string "false" (which is truthy)
            ignoreHTTPSErrors: true,
            executablePath: process.env.PPTR_EXECUTABLE_PATH || '/usr/bin/chromium',
            args: [
                `--proxy-server=${newProxyUrl}`,
                `--no-sandbox`,
                `--disable-gpu`,
            ]
        });

        const page = await browser.newPage();
        await page.goto('https://nordvpn.com/what-is-my-ip/', { waitUntil: 'networkidle0' });

        const ip = await page.$eval('.Title.h3.mb-6.js-ipdata-ip-address', el => el.textContent);
        const location = await page.$eval('.js-ipdata-location', el => el.textContent);

        res.status(200).json({ success: true, ip: ip, location: location });
    } catch (error) {
        next(error);
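    } finally {
        // Close the browser and the local anonymizing proxy independently,
        // so the proxy is released even if the browser failed to launch.
        if (browser) {
            await browser.close();
        }
        if (newProxyUrl) {
            await proxyChain.closeAnonymizedProxy(newProxyUrl, true);
        }
    }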
}

engineeringstuff commented 5 months ago

@sebastiansieber did you figure this one out? I'm having the same problem.

sebastiansieber commented 5 months ago

@engineeringstuff unfortunately the only way I found to solve this is to set the default navigation timeout to zero (infinite) with page.setDefaultNavigationTimeout(0);

That leads to a new problem, though: if the page genuinely fails to load, the navigation never times out when it should.
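
A middle ground between the default 30 s and no timeout at all is a larger but finite timeout combined with a small number of retries. The following is only a minimal sketch, assuming `page` is a Puppeteer `Page`; the helper name `gotoWithRetry` and its default values are illustrative and not part of Puppeteer or proxy-chain:

```javascript
// Hypothetical helper: navigate with a finite timeout and a few retries,
// instead of disabling the navigation timeout entirely.
async function gotoWithRetry(page, url, { timeout = 60000, retries = 2, waitUntil = 'domcontentloaded' } = {}) {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt++) {
        try {
            // page.goto accepts a per-call `timeout` in milliseconds.
            return await page.goto(url, { timeout, waitUntil });
        } catch (error) {
            // Only retry on navigation timeouts; rethrow everything else.
            if (error.name !== 'TimeoutError') throw error;
            lastError = error;
        }
    }
    // All attempts timed out: surface the last timeout to the caller.
    throw lastError;
}
```

With this, a page that really is unreachable still fails after `(retries + 1) * timeout` milliseconds at most. The sketch defaults to `waitUntil: 'domcontentloaded'`, which is an assumption on my part: `networkidle0` waits for all network activity to stop and may be harder to reach through a slow proxy.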

barjin commented 5 months ago

Hello and thank you for submitting this issue! (and sorry for the wait).

I'm assuming you are using proxy-chain to "glue" your external proxy URL with the credentials (as Puppeteer / Chromium don't work that well with proxy authentication)?

If this is the case, can you try authenticating your Chromium instance with your proxy via the page.authenticate() call (tutorial here)? My hypothesis here is that your script is not timing out because of the proxy-chain package, but simply because of slow response time from your proxy server.

In that case, you can limit the amount of transferred data with something like:

await page.setRequestInterception(true);

// Abort requests for heavy static resources; let everything else through.
const blockedTypes = ['stylesheet', 'font', 'image'];
page.on('request', (req) => {
    if (blockedTypes.includes(req.resourceType())) {
        req.abort();
    } else {
        req.continue();
    }
});

This way, your Chromium instance won't load stylesheets, fonts, and images over the proxy connection - so loading the page should take less time.

sebastiansieber commented 5 months ago

@barjin thanks for your reply. Back then I did try the approach you suggested:

const context = await browser.createIncognitoBrowserContext({ proxy: proxy_host });
const page = await context.newPage();
await page.authenticate({ username: proxy_username, password: proxy_password });

This worked without issue, which is what prompted me to post here and report the problem, because it really does seem to be limited to proxy-chain.

sebastiansieber commented 5 months ago

@engineeringstuff did you find a solution?

engineeringstuff commented 5 months ago

Sort of, but no.

I thought that this proxy was more feature-rich and that I could use a different proxy per request using the prepareRequestFunction method. But this library sets the proxy globally each time.

My use-case has a number of proxies and a fallback solution (with retry mechanisms) if one proxy fails.

In the end I used a proxy provider that can provide that service - e.g. I proxy through one address and each time a socket is established a new proxy is used.

I think my problem had something to do with switching the proxy URI so frequently.

jirimoravcik commented 2 months ago

Hey @engineeringstuff, you can easily change the proxy you use per request. Let me give you a minimal code example:

const ProxyChain = require('proxy-chain');

const server = new ProxyChain.Server({
    port: 8000,
    verbose: true,
    prepareRequestFunction: ({ request, username, password, hostname, port, isHttp, connectionId }) => {
        let upstreamProxyUrl;
        if (username === 'something') {
            upstreamProxyUrl = 'http://username:password@proxy.example.com:3128';
        } else {
            upstreamProxyUrl = 'http://user:pass@proxy.different.com:1234';
        }
        return {
            upstreamProxyUrl,
        };
    },
});

server.listen(() => {
  console.log(`Proxy server is listening on port ${server.port}`);
});

// Emitted when HTTP connection is closed
server.on('connectionClosed', ({ connectionId, stats }) => {
  console.log(`Connection ${connectionId} closed`);
  console.dir(stats);
});

// Emitted when HTTP request fails
server.on('requestFailed', ({ request, error }) => {
  console.log(`Request ${request.url} failed`);
  console.error(error);
});

You can see that I'm setting a different proxy based on username. This can be easily used to implement proxy fallback.
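
As a rough illustration of the fallback idea, the selection logic can live in a small object whose `prepareRequestFunction` returns the first upstream that has not been marked as failed. This is a sketch under assumptions: `createUpstreamPicker`, `markFailed` and `markHealthy` are hypothetical names, the URLs are placeholders, and how a failure is detected is left to the caller:

```javascript
// Hypothetical fallback picker: returns the first upstream proxy that has
// not been marked as failed. The URLs below are placeholders.
function createUpstreamPicker(upstreamProxyUrls) {
    const failed = new Set();
    return {
        // Returns an object with `upstreamProxyUrl`, the same shape as in
        // the example above.
        prepareRequestFunction: () => {
            const upstreamProxyUrl = upstreamProxyUrls.find((url) => !failed.has(url));
            if (!upstreamProxyUrl) {
                throw new Error('No healthy upstream proxy left');
            }
            return { upstreamProxyUrl };
        },
        // The caller decides when an upstream counts as failed or recovered.
        markFailed: (url) => { failed.add(url); },
        markHealthy: (url) => { failed.delete(url); },
    };
}

const picker = createUpstreamPicker([
    'http://username:password@proxy.example.com:3128',
    'http://user:pass@proxy.different.com:1234',
]);
// Usage sketch (not executed here):
// new ProxyChain.Server({ port: 8000, prepareRequestFunction: picker.prepareRequestFunction });
```

Each call computes its result from the current `failed` set, so marking the first proxy as failed makes subsequent connections use the second one.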

engineeringstuff commented 2 months ago

Hi @jirimoravcik, thanks. This is very similar to the solution I used, but the pitfall is that upstreamProxyUrl is global and handled asynchronously within the proxy-chain codebase.

That is, prepareRequestFunction is handled like an event, and the result is the alteration of a single global value (upstreamProxyUrl).

When a new connection is made with a new proxy address in the username parameter, there's no guarantee that it will be used for that connection (this is in a system with a lot of proxied requests happening across a wide range of proxies).

So in my setup I was hoping to have a single proxy-chain instance, but that's not feasible in a high-throughput system with frequent proxy rotation.

jirimoravcik commented 2 months ago

Hey @engineeringstuff, that's simply not the case. You can easily supply a different upstreamProxyUrl for each request that comes to the proxy chain server and everything will work correctly (even with a single instance). If this doesn't work for you, can you provide minimal reproduction sample? Thank you!
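
One way to start narrowing this down is to check that the callback itself holds no shared mutable state. The sketch below is not a reproduction of the reported behaviour and exercises only the callback, not proxy-chain's internals; the `UPSTREAMS` map, the proxy URLs and `checkIsolation` are hypothetical:

```javascript
// Hypothetical callback: the upstream is derived from the per-connection
// `username` and returned, so no shared variable is written.
const UPSTREAMS = {
    a: 'http://user:pass@proxy-a.example.com:3128',
    b: 'http://user:pass@proxy-b.example.com:3128',
};

async function prepareRequestFunction({ username }) {
    // Simulate asynchronous work (e.g. a lookup) of varying duration.
    await new Promise((resolve) => setTimeout(resolve, Math.random() * 20));
    return { upstreamProxyUrl: UPSTREAMS[username] };
}

// Fire many overlapping calls and check that each gets its own upstream.
async function checkIsolation(rounds = 100) {
    const usernames = Array.from({ length: rounds }, (_, i) => (i % 2 === 0 ? 'a' : 'b'));
    const results = await Promise.all(
        usernames.map((username) => prepareRequestFunction({ username }))
    );
    return results.every((result, i) => result.upstreamProxyUrl === UPSTREAMS[usernames[i]]);
}
```

If `checkIsolation()` resolves to `true` for a real callback but connections still end up on the wrong upstream, that would point at something outside the callback and would be worth including in a reproduction sample.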