Patrick-Hogan / wandering_inn

Download and convert The Wandering Inn to epub and mobi (kindle) format

SSL Verification Issue #26

Open KwantumFizzix opened 9 months ago

KwantumFizzix commented 9 months ago

Running this command: C:\...\wandering_inn>python wanderinginn2epub.py --volume 9 --output-by-volume

I get the following output:

Traceback (most recent call last):
  File "C:\Users\steph\wandering_inn\wanderinginn2epub.py", line 382, in <module>
    main()
  File "C:\Users\steph\wandering_inn\wanderinginn2epub.py", line 316, in main
    full_index = get_index()
  File "C:\Users\steph\wandering_inn\wanderinginn2epub.py", line 216, in get_index
    page = urlopen(toc_url)
  File "C:\Users\steph\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\steph\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 517, in open
    response = self._open(req, data)
  File "C:\Users\steph\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "C:\Users\steph\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Users\steph\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1385, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "C:\Users\steph\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1345, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1122)>

I'm pretty new to using GitHub and have been using ChatGPT to get me started, so I could definitely (most likely) be missing something obvious.

Patrick-Hogan commented 9 months ago

I'm not able to reproduce, but I don't run on Windows. Best guess is that the root cert signing wanderinginn.com is not included in the Windows cert store (or urllib.request.urlopen isn't loading the Windows cert store).

You can download it (using openssl or simply by going to the website and clicking the padlock icon next to the url, then viewing the cert information and saving the PEM or chain to a local folder) and manually specify the cert path by adding cafile=<full-path-to-cert-file> to each urlopen call. E.g., if you save the cert as "C:\Users\steph\wanderinginn.cert", you would use:

urlopen(..., cafile=r"C:\Users\steph\wanderinginn.cert")

Keeping all of the other arguments as-is. I can add this as a pass-through option sometime so you can specify it when calling the script rather than needing to modify the code, but it may be a little while before I get to it.
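For readers on newer Python versions: the cafile argument to urlopen has been deprecated since Python 3.6 and was removed in Python 3.13, so the same idea may need to go through an SSL context instead. The following is a minimal sketch of that approach, not the script's actual code; make_context and fetch are hypothetical helper names introduced here for illustration.

```python
import ssl
from urllib.request import urlopen


def make_context(cafile=None):
    """Return a verifying SSL context.

    cafile is a path to a PEM file with the CA certificate(s) to trust;
    None keeps the platform's default trust store.
    """
    return ssl.create_default_context(cafile=cafile)


def fetch(url, cafile=None):
    """Fetch url, verifying the server against cafile when one is given.

    Hypothetical helper: the real script calls urlopen directly.
    """
    with urlopen(url, context=make_context(cafile)) as page:
        return page.read()
```

With a cert saved as in the example above, a call would look like fetch(toc_url, cafile=r"C:\Users\steph\wanderinginn.cert"). Certificate verification and hostname checking stay enabled either way; only the set of trusted CAs changes.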

Patrick-Hogan commented 9 months ago

Hm. Or I suppose it may be related to this issue. Going to try to get caught up on open PRs and test w/ the redone table of contents page; you may just want to re-test when that's finished.

edit: scratch that--that SSL issue was only related to images, AFAIK. This is probably just how urllib and the Windows cert store interact. I went ahead and created a branch that exposes the cafile option to urllib as a new command line flag ('--cafile'), so you can check out that branch and try it if you want. I'm unable to verify the changes at the moment, though, since I'm again banned by the WordPress firewall on the wanderinginn website.
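For illustration, a pass-through flag like the '--cafile' option described above could be wired up roughly as follows. This is a sketch under assumptions, not the code on the branch: build_parser is a hypothetical name, and the script's other options (such as --volume and --output-by-volume) are omitted.

```python
import argparse


def build_parser():
    """Build a parser with only the --cafile option (other flags omitted)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a --cafile pass-through option."
    )
    parser.add_argument(
        "--cafile",
        default=None,  # None means: use the platform's default trust store
        metavar="PATH",
        help="path to a PEM file with CA certificates used to verify HTTPS",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # args.cafile would then be handed to whatever code opens the URLs.
    print(args.cafile)
```

Defaulting to None means the script behaves as before unless the flag is given explicitly.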

StoneLabs commented 9 months ago

Can confirm this. I had the same issue and fixed it with certifi:

import certifi

[...]
-     page = urlopen(toc_url)
+     page = urlopen(toc_url, cafile=certifi.where())
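A variation on the same fix, in case editing every urlopen call is inconvenient or the Python version no longer accepts cafile (the parameter was removed in Python 3.13): install a global opener whose HTTPS handler uses certifi's CA bundle. This is a hedged sketch rather than anything from the repository; install_verified_opener is a hypothetical name, and the fallback to the system trust store when certifi is not installed is an assumption added so the snippet runs anywhere.

```python
import ssl
import urllib.request

try:
    import certifi  # third-party package: pip install certifi
except ImportError:
    certifi = None  # assumption: fall back to the system trust store


def install_verified_opener():
    """Make later plain urlopen(url) calls verify against certifi's bundle."""
    cafile = certifi.where() if certifi is not None else None
    context = ssl.create_default_context(cafile=cafile)
    opener = urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=context)
    )
    # After this call, urlopen(url) without extra arguments uses the opener.
    urllib.request.install_opener(opener)
    return opener
```

Calling install_verified_opener() once near the start of the script should leave existing calls such as urlopen(toc_url) unchanged while still applying the certifi bundle.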