clemfromspace / scrapy-selenium

Scrapy middleware to handle javascript pages using selenium

response.meta and response.request.meta['driver'] have different content. #103

Open fabiobatalha opened 2 years ago

fabiobatalha commented 2 years ago

I'm not sure what is going on, but response.request.meta['driver'].get_screenshot_as_png() returns a different image than response.meta['screenshot'].

I noticed this while crawling multiple URLs: response.request.meta['driver'] does not hold the expected page content. Its behavior is unpredictable, and the page it exposes often does not match the content at response.url.

In the script below, the two images saved under the same uuid4 code should have identical content. Instead, response.meta['screenshot'] works as expected, while the screenshot produced by response.request.meta['driver'].get_screenshot_as_png() can show anything else.

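    import uuid

    def parse(self, response):
        # WebDriver instance that the middleware attached to the request
        driver = response.request.meta['driver']
        code = uuid.uuid4()
        # Screenshot taken from the driver at callback time
        with open(f'image_driver-{code}.png', 'wb') as image_file:
            image_file.write(driver.get_screenshot_as_png())
        # Screenshot stored in the response meta
        with open(f'image_response-{code}.png', 'wb') as image_file:
            image_file.write(response.meta['screenshot'])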
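
One possible explanation, not confirmed here against the middleware source, is that a single driver instance is reused for every request while the stored screenshot is captured when the page is loaded. The sketch below simulates that assumption in plain Python. `FakeDriver` and `process_request` are hypothetical stand-ins written for illustration, not the selenium or scrapy-selenium API.

```python
class FakeDriver:
    """Hypothetical stand-in for a WebDriver; it only remembers the last URL."""

    def __init__(self):
        self.current_url = None

    def get(self, url):
        # Navigating replaces whatever page was loaded before.
        self.current_url = url

    def get_screenshot_as_png(self):
        # Fake "image": bytes describing the page that is loaded right now.
        return f"screenshot of {self.current_url}".encode()


def process_request(driver, url):
    """Assumed middleware behavior: load the page, snapshot it immediately,
    and hand the same shared driver object to the callback via meta."""
    driver.get(url)
    meta = {"driver": driver, "screenshot": driver.get_screenshot_as_png()}
    return {"url": url, "meta": meta}


shared_driver = FakeDriver()
urls = ("https://example.com/a", "https://example.com/b")

# Both requests are processed before any callback runs, as can happen
# when several requests are in flight at once.
responses = [process_request(shared_driver, url) for url in urls]

for response in responses:
    live = response["meta"]["driver"].get_screenshot_as_png()
    stored = response["meta"]["screenshot"]
    status = "match" if live == stored else "MISMATCH"
    print(response["url"], status)
```

Under this assumption, the stored screenshot always belongs to its own URL, while the live driver shows whichever page was loaded last. If that is what is happening, comparing driver.current_url with response.url inside parse would be one way to check it.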