iamumairayub opened this issue 4 years ago
I plus-oned this and then solved it for myself a little later.
For me this is not in the context of testing, so I have no need for contracts (at least as far as I understand it).
My solve was the following: keep start_requests() (as you have done), but let parse() handle the first response and use parse_result() only as the callback for the extracted links. I notice that you use parse_result() instead of parse(). Once I did this it started working. My solution snippet:
# Imports the snippet needs (assumed at module level):
from scrapy.linkextractors import LinkExtractor
from scrapy_selenium import SeleniumRequest

def start_requests(self):
    if not self.start_urls and hasattr(self, 'start_url'):
        raise AttributeError(
            "Crawling could not start: 'start_urls' not found "
            "or empty (but found 'start_url' attribute instead, "
            "did you miss an 's'?)")
    # Drive the start URLs through Selenium instead of the default downloader
    for url in self.start_urls:
        yield SeleniumRequest(url=url, dont_filter=True)

def parse(self, response):
    # Follow every link on the page, again through Selenium
    le = LinkExtractor()
    for link in le.extract_links(response):
        yield SeleniumRequest(
            url=link.url,
            callback=self.parse_result
        )

def parse_result(self, response):
    page = PageItem()
    page['url'] = response.url
    yield page
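PageItem isn't shown above; a minimal sketch of such an item (assuming only the url field is needed) could be:

import scrapy

class PageItem(scrapy.Item):
    # single field holding the crawled page's URL
    url = scrapy.Field()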
Hey @undernewmanagement
I tried your snippet, but the links from LinkExtractor are not processed correctly (the response body is not text).
rules = (
    Rule(LinkExtractor(restrict_xpaths=['//*[@id="breadcrumbs"]']), follow=True),
)

def start_requests(self):
    for url in self.start_urls:
        yield SeleniumRequest(url=url, dont_filter=True)

def parse_start_url(self, response):
    return self.parse_result(response)

def parse(self, response):
    le = LinkExtractor()
    for link in le.extract_links(response):
        yield SeleniumRequest(url=link.url, callback=self.parse_result)

def parse_result(self, response):
    page = PageItem()
    page['url'] = response.url
    yield page
I had to use parse_start_url to route the start URLs to the parse_result callback.
Do you know what the problem could be? I'm new to Scrapy and Python.
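A quick check like this (a sketch; HtmlResponse is Scrapy's class for text responses) shows whether the callback is actually receiving something LinkExtractor can parse:

from scrapy.http import HtmlResponse

def parse(self, response):
    # LinkExtractor only works on text responses; log what arrived
    self.logger.info('response class: %s', type(response).__name__)
    if not isinstance(response, HtmlResponse):
        return  # non-text body; extract_links() would fail here
    for link in LinkExtractor().extract_links(response):
        yield SeleniumRequest(url=link.url, callback=self.parse_result)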
Thanks!
Hey @educatron, thanks for the question, but let's not hijack the thread here. I think you should take that question directly to the Scrapy community: https://scrapy.org/community/
Ok. Thanks!
@clemfromspace I just decided to use your package in my Scrapy project, but it is yielding normal scrapy.Request instead of SeleniumRequest.
I have seen this issue, but it is not helpful at all.
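For what it's worth, one common cause of SeleniumRequest falling through to a plain scrapy.Request is that the SeleniumMiddleware is not enabled. The scrapy-selenium README shows settings along these lines in settings.py (the driver name and executable here are placeholders for your own setup):

from shutil import which

SELENIUM_DRIVER_NAME = 'firefox'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')
SELENIUM_DRIVER_ARGUMENTS = ['-headless']  # run the browser headless

DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}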