GateNLP / ultimate-sitemap-parser

Ultimate Website Sitemap Parser
https://mediacloud.org/

add optional argument to requests web client, to ignore SSL checking #37

Open japherwocky opened 1 year ago

japherwocky commented 1 year ago

Hello -

This PR adds a simple verify argument to the requests-based web client. It defaults to True; passing verify=False skips SSL certificate verification, which is a workaround for issues like #33. For example:

        from usp.tree import sitemap_tree_for_homepage
        from usp.web_client.requests_client import RequestsWebClient

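        # This PR adds the verify kwarg, which defaults to True.
        client = RequestsWebClient(verify=False)
        # Assumes the class keeps its user agent in a private __USER_AGENT attribute;
        # from outside the class it is only reachable under its name-mangled form.
        client._RequestsWebClient__USER_AGENT = 'Friendly Spider /{}'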

        tree = sitemap_tree_for_homepage('https://nytimes.com', client)
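
For anyone curious how a kwarg like this can be threaded through to requests, here is a minimal sketch. SketchWebClient and request_kwargs are hypothetical names used only for illustration, not the library's actual classes or the exact change in this PR; the only real API assumed is that requests.get accepts timeout and verify keyword arguments.

```python
from typing import Any, Dict


class SketchWebClient:
    """Hypothetical client for illustration; not the library's RequestsWebClient."""

    def __init__(self, verify: bool = True, timeout: int = 60) -> None:
        # verify=True keeps certificate checking on unless the caller opts out.
        self.verify = verify
        self.timeout = timeout

    def request_kwargs(self) -> Dict[str, Any]:
        # Keyword arguments that get() forwards to requests.get().
        return {"timeout": self.timeout, "verify": self.verify}

    def get(self, url: str):
        # Imported here so the sketch can be constructed without requests installed.
        import requests

        return requests.get(url, **self.request_kwargs())
```

Keeping True as the default means existing callers keep certificate checking, and only code that explicitly passes verify=False gives it up.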