NikolaiT / se-scraper

Javascript scraping module based on puppeteer for many different search engines...
https://scrapeulous.com/
Apache License 2.0
542 stars 123 forks

Error: Proxy output ip <proxy-ip> does not match with provided one #51

Open ceylanb opened 5 years ago

ceylanb commented 5 years ago

The error I have encountered is: "Error: Proxy output ip socks5://192.169.156.211:50479 does not match with provided one"

I have been trying to run for_the_lulz.js with a proxy file, but it fails. Why does this error occur?

Update: the proxies feature is not working. When I run test_proxyflag.js with the same proxy, it works, but with se-scraper it does not.

gregghawes commented 4 years ago

I'm having the same issue, did you find a solution?

Lusitaniae commented 4 years ago

If log_ip_address is enabled, se-scraper compares the IP reported by ipinfo with the provided proxy string and throws if they do not match. That is what happens in my case, using proxy-chain.


        // check that our proxy is working by confirming
        // that ipinfo.io sees the proxy IP address
        if (this.proxy && this.config.log_ip_address === true) {
            debug(`${this.metadata.ipinfo.ip} vs ${this.proxy}`);

            // if the ip returned by ipinfo is not a substring of our proxystring, get the heck outta here
            if (!this.proxy.includes(this.metadata.ipinfo.ip)) {
                throw new Error(`Proxy output ip ${this.proxy} does not match with provided one`);
            } else {
                this.logger.info(`Using valid Proxy: ${this.proxy}`);
            }

        }
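
To illustrate why this fails with proxy-chain: the check is a plain substring test, and in my case the proxy string is a local forwarding URL (http://127.0.0.1:38020) while ipinfo reports the upstream exit IP (14.12.37.46). Below is a minimal standalone sketch of that comparison. `proxyMatchesExitIp` is a hypothetical helper name, not part of se-scraper, and the `socks5://...:1080` proxy string is made up for contrast.

```javascript
// Hypothetical helper mirroring the substring test quoted above.
// It is not part of se-scraper; it only reproduces the comparison.
function proxyMatchesExitIp(proxy, exitIp) {
    return proxy.includes(exitIp);
}

// Assumed example: the proxy string contains the exit IP, so the test passes.
const direct = proxyMatchesExitIp('socks5://14.12.37.46:1080', '14.12.37.46');

// proxy-chain case from my log: the proxy string is a local URL,
// so the exit IP can never be a substring of it.
const chained = proxyMatchesExitIp('http://127.0.0.1:38020', '14.12.37.46');

console.log({ direct, chained });
```

Any setup where the address you connect to differs from the address the target site sees would presumably trip the same check.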

Setting log_ip_address: false skips the proxy check above.

const se_scraper = require('se-scraper');
const proxyChain = require('proxy-chain');
const proxy = 'http://user:pass@provider:11111';

(async () => {
    const newProxyUrl = await proxyChain.anonymizeProxy(proxy);
    let browser_config = {
        debug_level: 2,
        // output_file: '/tmp/se-results',
        log_ip_address: false,
        block_assets: false,
        proxies: [newProxyUrl],
        use_proxies_only: true,
    };

   ...

})();

I guess this is an edge case.

I think this should be refactored: a "log" flag should not interrupt the execution flow.

[i] 14.12.37.46 vs http://127.0.0.1:38020
(node:20395) UnhandledPromiseRejectionWarning: Error: Proxy output ip http://127.0.0.1:38020 does not match with provided one
    at GoogleScraper.load_search_engine (/home/q/node_modules/se-scraper/src/modules/se_scraper.js:132:23)
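
A rough sketch of what such a refactor could look like, with logging and verification controlled separately. This is only an illustration under assumptions: `verify_proxy` is a hypothetical option that does not exist in se-scraper, and `checkProxy` is a made-up standalone function rather than the actual method in se_scraper.js.

```javascript
// Sketch only: `verify_proxy` is a hypothetical config option and
// `checkProxy` a hypothetical helper; neither exists in se-scraper.
function checkProxy(config, proxy, exitIp, logger = console) {
    if (!proxy) {
        return;
    }

    // Logging is purely informational and never throws.
    if (config.log_ip_address === true) {
        logger.info(`Exit ip ${exitIp} via proxy ${proxy}`);
    }

    // Verification is opt-in and controlled by its own flag.
    if (config.verify_proxy === true && !proxy.includes(exitIp)) {
        throw new Error(`Proxy ${proxy} does not match exit ip ${exitIp}`);
    }
}

// With only logging enabled, a proxy-chain URL no longer aborts the run.
checkProxy({ log_ip_address: true }, 'http://127.0.0.1:38020', '14.12.37.46');
```

With this split, users behind proxy-chain could keep IP logging on and simply leave the verification flag off.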