clemfromspace / scrapy-selenium

Scrapy middleware to handle javascript pages using selenium
Do What The F*ck You Want To Public License
913 stars 345 forks

Does the `wait_time` argument need a `wait_until` to work correctly? #109

Open tungland opened 2 years ago

tungland commented 2 years ago

In the following parser I want the spider to issue a SeleniumRequest for every link on the page that matches the rules in my Scrapy LinkExtractor 'le'. It seems to me that no matter what wait_time I pass, it does the same thing. Am I doing something wrong? Does the wait_time argument need a wait_until to work correctly? Or something else?

    def parse(self, response):
        for link in self.le.extract_links(response):
            yield SeleniumRequest(
                url=link.url,
                callback=self.parse,
                wait_time=40,
            )
MarcoCaglia commented 2 years ago

Not sure if this is still relevant, but I encountered the same issue.

From looking through the code, it looks like wait_time defines a timeout for the wait_until condition.

Based on the code below:

    def process_request(self, request, spider):
        """Process a request using the selenium driver if applicable"""
        [...]

        if request.wait_until:
            WebDriverWait(self.driver, request.wait_time).until(request.wait_until) 

I don't see any other reference to wait_time in the code.

So wait_time is indeed ignored when no wait_until is specified. When wait_until is given, the WebDriver waits for the specified condition, but only up to a maximum of wait_time seconds.
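To make the semantics concrete, here is a hypothetical plain-Python sketch of that logic (not the actual scrapy-selenium code; the function name and polling behavior are assumptions, loosely mirroring `WebDriverWait(driver, wait_time).until(...)`):

```python
import time

def process_wait(wait_time, wait_until=None, poll=0.1):
    """Sketch: wait_time is only consulted when wait_until is given,
    matching the process_request snippet quoted above."""
    if wait_until is None:
        # No condition supplied: wait_time is silently ignored.
        return None
    deadline = time.monotonic() + wait_time
    while True:
        result = wait_until()  # the real code passes the driver here
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {wait_time}s")
        time.sleep(poll)
```

In practice this means that for wait_time to have any effect you need to pass a wait_until condition on the SeleniumRequest, e.g. one of selenium's expected conditions such as `wait_until=EC.presence_of_element_located((By.CSS_SELECTOR, "a"))` (with `from selenium.webdriver.support import expected_conditions as EC`).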