alan-turing-institute / misinformation-crawler

Web crawler to collect snapshots of articles to web archive
MIT License

only update cookies when present #355

Closed edwardchalstrey1 closed 5 years ago

edwardchalstrey1 commented 5 years ago

Closes #346

Frontpagemag.com can now be crawled.

Tested with huffingtonpost.com, which saves cookies; the crawler works as expected.

edwardchalstrey1 commented 5 years ago

You can simplify this even further with:

try:
    cookies = self.driver.get_cookies()
    if cookies:
        spider.update_cookies(cookies)
except WebDriverException:
    pass

or, equivalently:

from contextlib import suppress

with suppress(WebDriverException):
    cookies = self.driver.get_cookies()
    if cookies:
        spider.update_cookies(cookies)

or even, if we're really sure that self.driver.get_cookies() returns an empty list/generator when there are no cookies (and that spider.update_cookies() is fine with receiving one):

from contextlib import suppress

with suppress(WebDriverException):
    spider.update_cookies(self.driver.get_cookies())

I have gone with option 1 👍
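
For anyone comparing the three options later, here is a minimal, self-contained sketch of how they behave. It does not use the crawler's real classes: `FakeDriver`, `FakeSpider`, `run`, and the local `WebDriverException` are hypothetical stand-ins (the real exception presumably comes from `selenium.common.exceptions`), so treat it as an illustration under those assumptions rather than the project's code.

```python
from contextlib import suppress


class WebDriverException(Exception):
    """Stand-in for selenium's WebDriverException (assumption)."""


class FakeDriver:
    """Hypothetical driver: returns a fixed cookie list, or raises."""

    def __init__(self, cookies=None, fail=False):
        self._cookies = cookies or []
        self._fail = fail

    def get_cookies(self):
        if self._fail:
            raise WebDriverException("cannot read cookies")
        return list(self._cookies)


class FakeSpider:
    """Hypothetical spider that records every update_cookies call."""

    def __init__(self):
        self.calls = []

    def update_cookies(self, cookies):
        self.calls.append(cookies)


def option_1(driver, spider):
    # Explicit try/except, update only when cookies are present.
    try:
        cookies = driver.get_cookies()
        if cookies:
            spider.update_cookies(cookies)
    except WebDriverException:
        pass


def option_2(driver, spider):
    # Same logic, with suppress() replacing the try/except/pass.
    with suppress(WebDriverException):
        cookies = driver.get_cookies()
        if cookies:
            spider.update_cookies(cookies)


def option_3(driver, spider):
    # No emptiness check: update_cookies is always called on success.
    with suppress(WebDriverException):
        spider.update_cookies(driver.get_cookies())


def run(option, driver):
    """Apply one option to a fresh spider and return the recorded calls."""
    spider = FakeSpider()
    option(driver, spider)
    return spider.calls
```

In this sketch all three options swallow the exception and pass cookies through when there are some. The only difference is the no-cookie case: options 1 and 2 never call `update_cookies`, while option 3 calls it with an empty list, so option 3 is only a drop-in replacement if `update_cookies` treats an empty list as a no-op.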