I think there is a bug in this loop in `full_domain_spider`:
```python
for link in LxmlLinkExtractor(unique=True).extract_links(response):
    if not response.url in self.already_crawled:
        self.already_crawled.add(link.url)
        yield WebdriverRequest(link.url, callback=self.parse_item)
    else:
        print "avoiding request for: ", response.url
```
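When yielding new requests, the spider checks whether `response.url` (the page currently being parsed) has already been crawled, instead of checking `link.url`. Because each `link.url` is added to `already_crawled` before its request is yielded, the response to that request will usually find its own URL already in the set, so every link on that page is skipped. The log message also prints `response.url` rather than the link that was skipped.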
I think the code should be:
```python
for link in LxmlLinkExtractor(unique=True).extract_links(response):
    # Deduplicate on the URL we are about to request, not on the current page.
    if link.url not in self.already_crawled:
        self.already_crawled.add(link.url)
        yield WebdriverRequest(link.url, callback=self.parse_item)
    else:
        print "avoiding request for: ", link.url
```
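To illustrate the difference, here is a minimal simulation that does not depend on Scrapy or the webdriver. The toy link graph `SITE` and the `crawl()` helper are hypothetical stand-ins: a FIFO queue plays the role of the scheduler, and `check_link_url` switches between the original test (`response.url`) and the proposed one (`link.url`). It is only a sketch of the dedup logic, not the spider itself.

```python
from collections import deque

# Hypothetical toy site: page URL -> URLs of the links found on that page.
SITE = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/c", "http://example.com/b"],
    "http://example.com/b": ["http://example.com/a"],
    "http://example.com/c": [],
}


def crawl(start_url, check_link_url):
    """Simulate the parse loop over SITE (hypothetical helper).

    check_link_url=False reproduces the original test on response.url;
    check_link_url=True uses the proposed test on link.url.
    Returns the URLs for which a request would have been yielded.
    """
    already_crawled = set()
    requested = []
    queue = deque([start_url])  # stands in for the request scheduler
    while queue:
        response_url = queue.popleft()  # the page currently being parsed
        for link_url in SITE[response_url]:
            key = link_url if check_link_url else response_url
            if key not in already_crawled:
                already_crawled.add(link_url)
                requested.append(link_url)
                queue.append(link_url)
    return requested
```

With the original test, `crawl("http://example.com/", False)` requests only `/a` and `/b`: when `/a` is parsed, its own URL is already in the set, so its link to `/c` is never followed. With the proposed test, `crawl("http://example.com/", True)` requests `/a`, `/b` and `/c`, and still skips the duplicate links back to `/a` and `/b`.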