aivarsk / scrapy-proxies

Random proxy middleware for Scrapy
MIT License
1.65k stars, 409 forks

How to check that a proxy is really being used? #48

Open ravillarreal opened 6 years ago

ravillarreal commented 6 years ago

In the process_request function, the proxy is assigned to the request only if it has a proxy_user_pass; otherwise the middleware only logs that the proxy is being used and how many proxies are left. Does that mean a proxy without credentials, like https://176.37.14.252:8080, does not work?

This is the function:

    def process_request(self, request, spider):
        # Don't overwrite with a random one (server-side state for IP)
        if 'proxy' in request.meta:
            if request.meta["exception"] is False:
                return
        request.meta["exception"] = False
        if len(self.proxies) == 0:
            raise ValueError('All proxies are unusable, cannot proceed')

        if self.mode == Mode.RANDOMIZE_PROXY_EVERY_REQUESTS:
            proxy_address = random.choice(list(self.proxies.keys()))
        else:
            proxy_address = self.chosen_proxy

        proxy_user_pass = self.proxies[proxy_address]

        if proxy_user_pass:
            request.meta['proxy'] = proxy_address
            basic_auth = 'Basic ' + base64.b64encode(proxy_user_pass.encode()).decode()
            request.headers['Proxy-Authorization'] = basic_auth
        else:
            log.debug('Proxy user pass not found')
        log.debug('Using proxy <%s>, %d proxies left' % (
                proxy_address, len(self.proxies)))
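
The behavior in question comes down to where `request.meta['proxy']` is assigned. A minimal sketch of one possible restructuring follows, assuming the intent is that a proxy without credentials should still be applied. This is not the library's code and not necessarily what any pull request does; `apply_proxy` is a hypothetical helper, and plain dicts stand in for `request.meta` and `request.headers`.

```python
import base64


def apply_proxy(meta, headers, proxy_address, proxy_user_pass=None):
    """Hypothetical helper: set the proxy first, add credentials only if present.

    `meta` and `headers` are plain dicts standing in for request.meta and
    request.headers (an assumption made to keep the sketch self-contained).
    """
    # Always route the request through the proxy, with or without credentials.
    meta['proxy'] = proxy_address

    if proxy_user_pass:
        # Credentials are optional; add the Basic auth header only when given.
        token = base64.b64encode(proxy_user_pass.encode()).decode()
        headers['Proxy-Authorization'] = 'Basic ' + token

    return meta, headers
```

With this ordering, the "Using proxy" log line would match what actually happens to the request, whether or not a user/password pair is configured.
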
schiz0phr3ne commented 6 years ago

I tested this middleware without a proxy_user_pass (I don't have one to test with), and the proxy is not used:

import scrapy


class MyipSpider(scrapy.Spider):
    name = 'myip'
    start_urls = ['http://www.mon-ip.com']

    def parse(self, response):
        # The page reports the IP address the request came from.
        for ip in response.xpath('//*[@id="PageG"]'):
            yield {
                'ip': ip.xpath('p[3]/span[2]//text()').extract_first(),
            }

This gives:

2018-08-28 15:17:10 [scrapy.proxies] DEBUG: Using proxy <https://pro.xy.add.ress:port>, x proxies left
[...]
2018-08-28 15:17:10 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.mon-ip.com>
{'ip': 'my.ip.add.ress'}

schiz0phr3ne commented 6 years ago

This change works: https://github.com/aivarsk/scrapy-proxies/pull/43/files

BriungRi commented 4 years ago

Bump on schiz0phr3ne's PR. I was able to use that change and verify that my requests were indeed using a proxy's IP and not my own local IP.
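
The check described above (comparing the IP a remote service sees with and without the proxy) can be sketched outside Scrapy with the standard library. This is only an illustration under stated assumptions: `http://httpbin.org/ip` is used as an IP echo service that returns JSON with an `origin` field, and `fetch_origin`, `parse_origin`, and `proxy_is_used` are hypothetical helper names, not part of scrapy-proxies.

```python
import json
import urllib.request

# Assumption: an echo service that returns {"origin": "<caller ip>"} as JSON.
ECHO_URL = "http://httpbin.org/ip"


def parse_origin(body):
    """Extract the caller's IP from an httpbin-style JSON body."""
    return json.loads(body)["origin"]


def fetch_origin(proxy=None, timeout=10):
    """Ask the echo service which IP it sees, optionally through a proxy.

    `proxy` is a URL such as 'http://host:port' (hypothetical value).
    """
    if proxy:
        handler = urllib.request.ProxyHandler({"http": proxy})
    else:
        # An empty mapping disables proxies, including environment settings.
        handler = urllib.request.ProxyHandler({})
    opener = urllib.request.build_opener(handler)
    with opener.open(ECHO_URL, timeout=timeout) as resp:
        return parse_origin(resp.read().decode())


def proxy_is_used(direct_ip, proxied_ip):
    """The proxy is in effect if the remote side sees a different IP."""
    return direct_ip != proxied_ip
```

Calling `fetch_origin()` and `fetch_origin(proxy='http://host:port')` and passing both results to `proxy_is_used` gives a quick yes/no answer. If the two IPs match, the request did not go through the proxy (or the proxy is transparent and forwards the original address).
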