Open ma26yank opened 2 years ago
You can subclass `RequestsWebClient` with your own client and, in its `get` method, call `requests.get( ... , verify=False)`.
Then do something like `sitemap_tree_for_homepage('https://www.crummy.com', web_client=MyClient())`.
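For context, `verify=False` just tells the HTTP layer to skip TLS certificate validation, which is what the `CERTIFICATE_VERIFY_FAILED` error is about. Here is a minimal sketch of the same idea using only the standard library (no `usp` or `requests` assumed installed; the opener usage is illustrative, not executed):

```python
import ssl
import urllib.request

# Build an SSL context that skips certificate validation --
# the stdlib equivalent of requests.get(..., verify=False).
# WARNING: this removes protection against man-in-the-middle
# attacks; use only for testing or hosts you control.
insecure_ctx = ssl.create_default_context()
insecure_ctx.check_hostname = False       # don't match the hostname
insecure_ctx.verify_mode = ssl.CERT_NONE  # don't validate the chain

# An opener that could fetch robots.txt despite a self-signed
# certificate (hypothetical usage; no request is made here):
opener = urllib.request.build_opener(
    urllib.request.HTTPSHandler(context=insecure_ctx)
)
```

A custom web client's `get` would apply the same setting per request via `verify=False`, rather than globally.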
I was testing this package for a web crawler I was building, but at times it gives the error below. Is there an argument I have to pass, or is this a bug?
_IndexWebsiteSitemap(url=https://www.crummy.com/, sub_sitemaps=[InvalidSitemap(url=https://www.crummy.com/robots.txt, reason=Unable to fetch sitemap from https://www.crummy.com/robots.txt: HTTPSConnectionPool(host='www.crummy.com', port=443): Max retries exceeded with url: /robots.txt (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (ssl.c:1131)'))))])
What I am trying is: