cyrus-and / chrome-remote-interface

Chrome Debugging Protocol interface for Node.js
MIT License

Unable to connect with proxy server #442

Closed SwitchGM closed 3 years ago

SwitchGM commented 3 years ago
Component | Version
Operating system | Windows 10
Node.js | 12.16.1
Chrome/Chromium/... | Chrome 87 (latest), using Chrome Launcher 0.13.4
chrome-remote-interface | 0.28.2

Is Chrome running in a container? YES / NO

I'm attempting to load web pages through a free proxy server, taken from lists such as:

https://www.proxynova.com/proxy-server-list
http://www.freeproxylists.net/
https://free-proxy-list.net/

For this I've been using Chrome Launcher to set up the headless Chrome instance, and then some basic chrome-remote-interface code to test that the IP address has changed. I check the IP address of the headless Chrome instance using https://whatismyipaddress.com/, following an approach similar to https://www.youtube.com/watch?v=wAyocwixpFA.

Code that I'm using to connect, use the proxy and check the IP:

const chromeLauncher = require("chrome-launcher");
const CDP = require("chrome-remote-interface");
const proxy = "--proxy-server=PROXY_IP:PROXY_PORT";

(async function() {

    // launch chrome running on debug port 9222 (for chrome-remote-interface), with headless and proxy flags

    let chrome = await chromeLauncher.launch({
        port: 9222, 
        chromeFlags: ["--disable-gpu", "--headless", "--enable-logging", proxy]
    });

    let client = await CDP();

    let Page = client.Page;
    await Page.enable();

    let Runtime = client.Runtime;
    await Runtime.enable();

    // go to website and wait until page has loaded

    await Page.navigate({ url: "https://whatismyipaddress.com/"});
    await Page.loadEventFired();

    // scrape IP address from website

    const { result: { value }} = await Runtime.evaluate({
        expression: "document.querySelectorAll('#ipv4 > a')[0].innerHTML;"
    });

    // print the scraped IP address, e.g. 255.255.255.255

    console.log("IP Address:", value);
})();

In this case, with or without the Chrome flag --proxy-server=PROXY_IP:PROXY_PORT, I always get my own IP address back and never the address of any of the proxy servers. I've also tried the three variations of the --proxy-server option described at https://www.chromium.org/developers/design-documents/network-settings#TOC-Command-line-options-for-proxy-settings:

--proxy-server=PROXY_IP:PROXY_PORT, e.g. --proxy-server=255.255.255.255:8080
--proxy-server=SCHEME=PROXY_IP:PROXY_PORT, e.g. --proxy-server=https=255.255.255.255:8080
--proxy-server=LINK_TO_PROXY, e.g. https://255.255.255.255:8080

I've tried all of these, including adding "double quotes" around the values, and all have given me the same result. I've also tried using a random string for the proxy IP (in an attempt to get an error), and again got the same result. I'm not sure whether this is a chrome-launcher issue or a chrome-remote-interface issue. In any case, I never receive an error from chrome-launcher or chrome-remote-interface, and the script just prints my own IP address.

Any help with this would be greatly appreciated; if you need any more information, I can provide it ASAP.

EDIT: For anyone with the same issue, I was able to solve it using --proxy-server=PROXY_IP:PROXY_PORT, e.g. --proxy-server=255.255.255.255:8080, together with the security code provided below by cyrus-and. I would suggest using a free proxy from http://www.freeproxylists.net/ for testing; at least the proxies from there worked for me.
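
As a minimal sketch of the flag form that ended up working here, the value can be built by a small helper. `proxyFlag` is a hypothetical name, not part of chrome-launcher or chrome-remote-interface, and the address used is a placeholder from the documentation range. Since the flags are passed as array elements rather than through a shell, the value should not need the extra double quotes tried above (an assumption about how the launcher spawns Chrome).

```javascript
// Hypothetical helper: builds the --proxy-server flag in the bare
// HOST:PORT form that worked in this thread. The scheme is optional.
function proxyFlag(host, port, scheme) {
    if (!host || !Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error(`invalid proxy address: ${host}:${port}`);
    }
    // When no scheme is given, the flag is just HOST:PORT.
    const prefix = scheme ? `${scheme}://` : '';
    return `--proxy-server=${prefix}${host}:${port}`;
}

// Placeholder address; substitute a real proxy before launching Chrome.
const chromeFlags = ['--headless', '--disable-gpu', proxyFlag('203.0.113.10', 8080)];
console.log(chromeFlags.join(' '));
```

The resulting array can be passed as `chromeFlags` to `chromeLauncher.launch`, as in the script above.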

cyrus-and commented 3 years ago

I think you need to handle certificate errors as shown here or use HTTP.

The following works for me:

const CDP = require('chrome-remote-interface');
const chromeLauncher = require('chrome-launcher');

(async function () {
    const chrome = await chromeLauncher.launch({
        port: 9222,
        chromeFlags: [
            '--headless',
            '--proxy-server=SCHEME://IP:PORT'
        ]
    });

    const client = await CDP();
    const {Page, Runtime, Security} = client;

    Security.certificateError(({eventId}) => {
        Security.handleCertificateError({
            eventId,
            action: 'continue'
        });
    });

    await Security.enable();
    await Security.setOverrideCertificateErrors({override: true});

    await Page.enable();
    await Page.navigate({url: 'https://whatismyip.akamai.com/'});
    await Page.loadEventFired();

    const {result: {value}} = await Runtime.evaluate({
        expression: 'document.body.innerHTML'
    });

    console.log(value);
    await chrome.kill();
})();
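
The certificate handling above can possibly be shortened. As a hedged sketch, assuming the connected Chrome exposes the `Security.setIgnoreCertificateErrors` command of the DevTools protocol, a single call replaces the event handler and the override. `ignoreCertificateErrors` is a hypothetical wrapper name, and `client` is assumed to be the object returned by `await CDP()`. This turns off certificate validation for the session, so it only suits testing.

```javascript
// Sketch: ignore all certificate errors through one CDP command.
// Assumption: the target browser supports Security.setIgnoreCertificateErrors.
async function ignoreCertificateErrors(client) {
    const {Security} = client;
    await Security.setIgnoreCertificateErrors({ignore: true});
}

// Usage, inside the async function from the script above:
//     const client = await CDP();
//     await ignoreCertificateErrors(client);
```
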
SwitchGM commented 3 years ago

I've used the code you shared with the first three proxy servers I could find on http://www.freeproxylists.net/, but I still get my own IP back. Is there something I'm missing here? I'm running this locally on my machine.

To be absolutely certain, I've used chromeFlags: ['--headless', '--proxy-server=HTTP://13.92.119.142:80'], from

image

Could I be misinterpreting the --proxy-server values ?

cyrus-and commented 3 years ago

Does it work if you manually start Chrome in that way then navigate to https://whatismyip.akamai.com/? I suspect that Windows overrides the proxy choice.

SwitchGM commented 3 years ago

I tried starting a Chrome instance from the command line with the --proxy-server=HTTP://13.92.119.142:80 option, which failed to connect.

I then switched to --proxy-server=13.92.119.142:80, removing the SCHEME. In that latter case I was able to connect through the proxy server. I then tried the script you shared, which fortunately worked.

I did have some issues leading to it, such as delays in navigating to the page which still persist (I assume this is just a normal thing when using proxies).

It does seem that overriding certificate errors is mandatory when doing this, as commenting out the relevant security code didn't seem to let me connect to the proxy.

P.S: for anyone stumbling on this with similar issues, check the code that cyrus-and provided, as well as the list of free proxy servers that I provided for testing with.

cyrus-and commented 3 years ago

I did have some issues leading to it, such as delays in navigating to the page which still persist (I assume this is just a normal thing when using proxies).

Those free proxies are not reliable at all, what do you really want to achieve, if I may ask? There could be some alternatives.

SwitchGM commented 3 years ago

Those free proxies are not reliable at all, what do you really want to achieve, if I may ask? There could be some alternatives.

I did take a look at some private proxies to use, https://www.webshare.io/private-proxy. Would something like this be more reliable than the free ones? The project is just scraping sites.

cyrus-and commented 3 years ago

If the goal is simply to hide your IP from the final host, you might consider using a VPN (there are some free ones up to a certain number of GB of traffic) or even TOR. Bear in mind that, especially in the former case (or with any other proxy provider, like the one you linked), real privacy cannot be achieved; you simply choose whom to trust and cross your fingers. This might or might not be enough for you.

SwitchGM commented 3 years ago

I've decided to give TOR a try. Are you aware of any examples or tutorials that I could follow to achieve this while still using CRI?

cyrus-and commented 3 years ago

Just run TOR (a SOCKS5 proxy) then use --proxy-server=socks://localhost:9050.
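
A minimal sketch of that suggestion, assuming a local TOR client is already running and listening on its default SOCKS port 9050 (the Tor Browser bundle typically listens on 9150 instead). `torFlags` and `launchThroughTor` are hypothetical names introduced for illustration.

```javascript
// Builds the Chrome flags for routing traffic through a local TOR SOCKS proxy.
// Assumption: TOR is reachable on localhost at the given port.
function torFlags(socksPort = 9050) {
    return ['--headless', `--proxy-server=socks://localhost:${socksPort}`];
}

async function launchThroughTor(socksPort) {
    // Required lazily so torFlags() works without the package installed.
    const chromeLauncher = require('chrome-launcher');
    return chromeLauncher.launch({port: 9222, chromeFlags: torFlags(socksPort)});
}

console.log(torFlags().join(' '));
```

Navigating to https://check.torproject.org/ after connecting is one way to confirm that the traffic is really going through TOR.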

SwitchGM commented 3 years ago

Before I start using TOR, is there a way to specify which instance of Chrome (in this case, a Chrome browser that uses a specific proxy server) the CRI code should connect to?

A similar thing is available in puppeteer, where you can set the options for Chrome and then create a "browser" from those options.

const puppeteer = require('puppeteer');

(async () => {

    const options = {
        headless: true,
        args: [
            '--disable-gpu',
            '--no-sandbox'
        ],
    };

    const browser = await puppeteer.launch(options);
    // do puppeteer stuff
})();

As stated before, I'm using chrome launcher which (upon launching and passing it the relevant chrome options) returns what I assume is an instance of the browser.

let chrome = await chromeLauncher.launch({
    port: 9222,
    chromeFlags: ["--disable-gpu", "--headless", "--enable-logging"]
});

I'm wondering whether there is a similar way of doing this (as in puppeteer) before I delve into using TOR.

cyrus-and commented 3 years ago

Just use a different port (if you need to) and use that port with CRI:

const client = await CDP({port: 1234});
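
Putting the two pieces together, a hedged sketch of one Chrome instance per proxy could look like the following. `openSession` is a hypothetical helper. It relies on chrome-launcher choosing a free port when none is given and reporting it as `chrome.port`; the launcher and the CDP connector are injectable so that the flow can be tried with stubs.

```javascript
// Sketch: launch one Chrome per proxy and connect CRI to that instance's port.
// `deps.launch` and `deps.connect` default to chrome-launcher and
// chrome-remote-interface, loaded lazily.
async function openSession(proxy, deps = {}) {
    const launch = deps.launch || require('chrome-launcher').launch;
    const connect = deps.connect || require('chrome-remote-interface');

    // No explicit port: the launcher picks one and exposes it on the result.
    const chrome = await launch({
        chromeFlags: ['--headless', `--proxy-server=${proxy}`]
    });
    const client = await connect({port: chrome.port});

    return {
        client,
        // Close the protocol connection first, then stop the browser.
        async close() {
            await client.close();
            await chrome.kill();
        }
    };
}

// Usage (placeholder addresses), inside an async function:
//     const a = await openSession('203.0.113.10:8080');
//     const b = await openSession('203.0.113.11:3128');
```
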
SwitchGM commented 3 years ago

This seems to work perfectly, thank you! I'm wondering whether CRI / CDP has a method that allows me to authenticate with a proxy username and password in headless mode. I'm receiving a 407 response (proxy authentication required).

EDIT: So far I have found this https://groups.google.com/a/chromium.org/g/headless-dev/c/KOR84u-FNU0/m/TGc6HVbwBAAJ, for the dev tools protocol, using Network.requestIntercept ?

EDIT 2: https://chromedevtools.github.io/devtools-protocol/tot/Network/#type-AuthChallenge Here's the exact section in the CDP for authenticating proxies; unfortunately I'm not sure how to write this in code.

EDIT 3: https://github.com/cyrus-and/chrome-remote-interface/blob/master/lib/protocol.json I found AuthChallenge and AuthChallengeResponse in the CRI repo. I'm fairly sure that this is possible now, just not sure how to go about doing it when receiving a 401 or 407 error.

EDIT 4: Possibly getting closer here, using puppeteer as a reference, as it has an #authenticate method for private proxy servers. After a bit of digging (I initially looked around inside the #authenticate method itself and wasn't able to make much sense of it) I found an auth method that seems to be related to proxies https://github.com/puppeteer/puppeteer/blob/49f25e2412fbe3ac43ebc6913a582718066486cc/utils/testserver/index.js#L188. The function seems to be used later here https://github.com/puppeteer/puppeteer/blob/49f25e2412fbe3ac43ebc6913a582718066486cc/src/common/NetworkManager.ts#L194; unfortunately I don't fully understand the parameters passed.

cyrus-and commented 3 years ago

Sorry for the late reply, here you go, this should work:

const CDP = require('chrome-remote-interface');

const PROXY_AUTH = {
    username: 'user',
    password: 'password'
};

CDP(async (client) => {
    const {Fetch, Network, Page} = client;

    // provide credentials when needed
    Fetch.authRequired(({requestId}) => {
        Fetch.continueWithAuth({
            requestId,
            authChallengeResponse: {
                response: 'ProvideCredentials',
                ...PROXY_AUTH
            }
        });
    });

    // just continue any other requests
    Fetch.requestPaused(({requestId}) => {
        Fetch.continueRequest({requestId});
    });

    // enable requests interception
    await Fetch.enable({handleAuthRequests: true});

    // usual demo code below...

    Network.requestWillBeSent((params) => {
        console.log(params.request.url);
    });

    try {
        await Network.enable();
        await Page.enable();
        await Page.navigate({url: 'https://github.com'});
        await Page.loadEventFired();
    } catch (err) {
        console.error(err);
    } finally {
        client.close();
    }
}).on('error', (err) => {
    console.error(err);
});
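
The handler above answers every authentication challenge with the proxy credentials. As a possible refinement, the `authChallenge` object delivered with `Fetch.authRequired` carries a `source` field (`'Proxy'` or `'Server'`), so the credentials can be limited to proxy challenges. `answerChallenge` is a hypothetical helper, sketched under that assumption.

```javascript
// Hypothetical helper: choose the authChallengeResponse for a challenge.
// 'Proxy' corresponds to a 407 from the proxy, 'Server' to a 401 from the site.
function answerChallenge(authChallenge, proxyAuth) {
    if (authChallenge.source === 'Proxy') {
        return {response: 'ProvideCredentials', ...proxyAuth};
    }
    // Leave any other challenge to the browser's default handling.
    return {response: 'Default'};
}

// Wiring it into the handler from the snippet above:
//     Fetch.authRequired(({requestId, authChallenge}) => {
//         Fetch.continueWithAuth({
//             requestId,
//             authChallengeResponse: answerChallenge(authChallenge, PROXY_AUTH)
//         });
//     });
```
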
SwitchGM commented 3 years ago

Exactly what I was looking for, thank you again for the assistance