Closed by mntolia 9 months ago
Ok perfect. I will try to reproduce. Thanks
Hello @mntolia ,
I use scrapy-impersonate with Scrapoxy:
Here is the spider:
from typing import Iterable

import scrapy
from scrapy import Request


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["www.ah.nl"]

    def start_requests(self) -> Iterable[Request]:
        yield Request(
            url="https://www.ah.nl/sitemaps/entities/products/detail.xml",
            dont_filter=True,
            meta={
                "impersonate": "chrome110",
                "impersonate_args": {
                    "verify": False,
                },
            },
            callback=self.parse
        )

    def parse(self, response):
        pass
And settings.py:
BOT_NAME = "testscrapy"
SPIDER_MODULES = ["testscrapy.spiders"]
NEWSPIDER_MODULE = "testscrapy.spiders"
ROBOTSTXT_OBEY = False
DOWNLOADER_MIDDLEWARES = {
    'scrapoxy.ProxyDownloaderMiddleware': 100,
}
REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
FEED_EXPORT_ENCODING = "utf-8"
SCRAPOXY_MASTER = "http://localhost:8888"
SCRAPOXY_API = "http://localhost:8890/api"
SCRAPOXY_USERNAME = "<USERNAME>"
SCRAPOXY_PASSWORD = "<PASSWORD>"
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}
Scrapoxy has 1 droplet on Digital Ocean.
Requests correctly go through Scrapoxy (I get a 403, but that comes from the site's antibot).
Did you set "verify": False on the request?
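For reference, here is a minimal sketch of a helper that builds the same meta dictionary as the spider above, so the verify flag is not forgotten on individual requests. The helper name impersonate_meta is hypothetical and not part of scrapy-impersonate or Scrapoxy; the "impersonate" and "impersonate_args" keys are the ones used in the spider.

```python
from typing import Any, Dict


def impersonate_meta(browser: str = "chrome110", verify: bool = False) -> Dict[str, Any]:
    """Build a Request.meta dict for scrapy-impersonate.

    Hypothetical helper for illustration only. It mirrors the meta
    used in the spider above, with certificate verification disabled
    by default.
    """
    return {
        "impersonate": browser,
        "impersonate_args": {
            "verify": verify,
        },
    }


# Example: the same meta as in the spider above.
meta = impersonate_meta()
```

In the spider, this would be passed as Request(..., meta=impersonate_meta()).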
It worked. Thanks @fabienvauchelles!
The issue was indeed that I was not setting verify. I appreciate you taking the time to test!
You're welcome. Thanks for using Scrapoxy!
Current Behavior
When I use Digital Ocean (DO) instances with my project, I get a 407 error. I do not get the same error when using IPRoyal proxies with Scrapoxy.
I use curl_cffi to emulate a browser's TLS fingerprint. It works fine with IPRoyal.
Expected Behavior
I should get a 200 response code.
Steps to Reproduce
Failure Logs
Scrapoxy Version
latest
Custom Version
Deployment
Operating System
Storage
Additional Information
EDIT: I also tried without the curl_cffi library. I still get the same response.
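For anyone trying to reproduce this outside Scrapy, here is a minimal sketch of a direct curl_cffi call through Scrapoxy. It rests on several assumptions: the proxy endpoint is the SCRAPOXY_MASTER address (localhost:8888) from the settings in this thread, the credentials are placeholders, both function names are hypothetical, and verify=False simply mirrors the setting that resolved this thread.

```python
from urllib.parse import quote


def scrapoxy_proxy_url(username: str, password: str,
                       host: str = "localhost", port: int = 8888) -> str:
    """Build a proxy URL with percent-encoded credentials.

    Hypothetical helper. The host and port defaults are assumptions
    taken from SCRAPOXY_MASTER in the settings shown in this thread.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"


def fetch_via_scrapoxy(url: str, username: str, password: str):
    """Send one GET request through Scrapoxy using curl_cffi."""
    # Imported lazily so the URL helper works without curl_cffi installed.
    from curl_cffi import requests

    proxy = scrapoxy_proxy_url(username, password)
    return requests.get(
        url,
        impersonate="chrome110",                 # same target as in the spider
        proxies={"http": proxy, "https": proxy},
        verify=False,                            # mirrors the fix from this thread
    )


# Usage (needs a running Scrapoxy instance and real credentials):
# response = fetch_via_scrapoxy(
#     "https://www.ah.nl/sitemaps/entities/products/detail.xml",
#     "<USERNAME>", "<PASSWORD>")
# print(response.status_code)
```

Percent-encoding the credentials keeps characters such as @ or : in a password from breaking the proxy URL.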