IgnoredAmbience / yahoo-group-archiver

Scrapes and archives a Yahoo group's email archives, photo galleries and file contents using the non-public API
MIT License
93 stars 46 forks

SSL: CERTIFICATE_VERIFY_FAILED on Windows with xa.yimg.com #24

Closed n4mwd closed 4 years ago

n4mwd commented 4 years ago

This issue is both help for others who haven't gotten this far and an unresolved problem that I need help with. Here is what I did to get what I have so far. Maybe this will help other people.

First, I downloaded the latest python version 2.7 here: https://www.python.org/ftp/python/2.7.16/python-2.7.16.msi

Once downloaded, I double clicked the file to start the installer. I took all the defaults EXCEPT when it asks what modules to install. All are selected by default except the path variables. I switched those to full on. Then I let the installer do its thing for the remainder.

If you don't include the paths in the install, it won't work right.

Download the archiver files from GitHub as a zip file. Open the zip file and copy yahoo.py and yahoogroupsapi.py to c:\python27\scripts .

Open the command prompt: Start->Run->CMD->Enter. Then enter "path" to make sure that c:\python27 is in the path correctly.

On the command line, enter "pip install requests"

You should have something like this so far:

=================
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\Dennis>path
PATH=C:\Python27\;C:\Python27\Scripts;C:\WINDOWS\system32;...

C:\Documents and Settings\Dennis>pip install requests
Collecting requests
  Downloading https://.../requests-2.22.0-py2.py3-none-any.whl (57kB)
    100% |UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU| 61kB 435kB/s
Collecting certifi>=2017.4.17 (from requests)
  Downloading https:///.../certifi-2019.9.11-py2.py3-none-any.whl (154kB)
    100% |UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU| 163kB 547kB/s
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests)
  Downloading https://.../urllib3-1.25.6-py2.py3-none-any.whl (125kB)
    100% |UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU| 133kB 547kB/s
Collecting idna<2.9,>=2.5 (from requests)
  Downloading https://.../idna-2.8-py2.py3-none-any.whl (58kB)
    100% |UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU| 61kB 1.3MB/s
Collecting chardet<3.1.0,>=3.0.2 (from requests)
  Downloading https://.../chardet-3.0.4-py2.py3-none-any.whl (133kB)
    100% |UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU| 143kB 409kB/s
Installing collected packages: certifi, urllib3, idna, chardet, requests
Successfully installed certifi-2019.9.11 chardet-3.0.4 idna-2.8 requests-2.22.0 urllib3-1.25.6
You are using pip version 18.1, however version 19.3.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

Now the part that doesn't seem to work right. On the command line, I entered: "yahoo.py -f -us UserName -pa Pass -ct "MY T COOKIE" -cy "MY Y COOKIE" mygroupname"

The "-f" is so that it will only copy everything from the group files section.

I get:

=========== logging in...

====================

So it croaks on SSL somehow. I do not have an E cookie. Not sure if this is relevant, but where is it getting the name "ExpcbHelper" from? The name of the group is 'expresspcb' not 'ExPcb'.

Anybody know how to make it work the rest of the way?

n4mwd commented 4 years ago

I don't know why the above post went all giant letters like that. When I tried to edit the post, it wouldn't let me.

n4mwd commented 4 years ago

Running a second time made it cough in a different spot. One of the files was "docs *.txt", which is an illegal file name in Windows. I logged onto the group and renamed that file, and it downloaded all the files successfully (I think). When I switched to photos with "-i", it coughed up the following error:
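For anyone hitting the same "docs *.txt" problem, renaming the file on the group is one workaround; another would be to clean the name locally before writing it to disk. The following is only a minimal sketch of that idea, not what the archiver itself does. The function name `sanitize_filename` and the choice of "_" as the replacement character are assumptions made for illustration.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control codes.
_ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name, replacement="_"):
    """Return a copy of `name` with Windows-illegal characters replaced.

    Hypothetical helper for illustration only. It does not handle reserved
    device names such as CON or NUL.
    """
    cleaned = _ILLEGAL_CHARS.sub(replacement, name)
    # Windows also rejects names that end in a space or a dot.
    cleaned = cleaned.rstrip(" .")
    # Never return an empty name.
    return cleaned or replacement


if __name__ == "__main__":
    print(sanitize_filename("docs *.txt"))
```

Replacing rather than dropping the character keeps the name recognisable, although two different remote names could still map to the same local name.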

[code] logging in...

IgnoredAmbience commented 4 years ago

I'm unsure what is causing the certificate verification failures for you. Initial suspicion was that Root CA Certificates weren't available for your python installation, but your first post shows the certifi package being installed, which contains them and will be used. Unsure how to progress from here, holding this open until I can think further about it.
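To rule out a missing root CA bundle on a given machine, something like the following could be run. It is a rough diagnostic sketch, not part of the archiver: the function name `describe_ca_sources` is made up for illustration, and it only reports where the local Python looks for CA certificates and whether the certifi package can be imported.

```python
import ssl


def describe_ca_sources():
    """Report where this Python installation looks for CA certificates.

    Hypothetical diagnostic helper. It does not contact any server.
    """
    paths = ssl.get_default_verify_paths()
    info = {
        "openssl_cafile": paths.cafile,   # may be None if no default file is found
        "openssl_capath": paths.capath,   # may be None if no default directory is found
        "certifi_bundle": None,
    }
    try:
        # certifi is installed as a dependency of requests.
        import certifi
        info["certifi_bundle"] = certifi.where()
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_ca_sources().items()):
        print("%s: %s" % (key, value))
```

If `certifi_bundle` prints a path to a cacert.pem file, the root certificates are available locally, which would point towards the server side rather than the installation.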

d235j commented 4 years ago

I've also had this on Linux and macOS.

n4mwd commented 4 years ago

I think Yahoo is playing games with us. I ran the exact same command again and it worked flawlessly. Maybe it needs to have two attempts before it gets all its registers set right.

n4mwd commented 4 years ago

I'm getting this fairly consistently on files and photos. Run once and you get the error. Run twice and it works fine. However, on messages, it happens only when there is an attachment to download, and re-running doesn't fix it. It gets past the "can't download attachment" ones without a problem, but then fails when it apparently finds an attachment to download.

** Yahoo says this message has attachments, but I can't find any!

dossy commented 4 years ago

My hunch is that some of the load balancers that serve xa.yimg.com are not sending the intermediate certificates in their chain, making certificate verification fail for clients.

I found that simply retrying the request would result in a connection that would get served a proper SSL response.
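The retry approach could look roughly like the sketch below. This is an illustration under assumptions, not the change that was made in the archiver: `retry_on_ssl_error`, `fetch`, `attempts`, `delay` and `retry_on` are all hypothetical names. With the requests library the exception to catch would be `requests.exceptions.SSLError`, which is not a subclass of `ssl.SSLError`, so the exception types are left as a parameter.

```python
import ssl
import time


def retry_on_ssl_error(fetch, attempts=3, delay=1.0, retry_on=(ssl.SSLError,)):
    """Call `fetch()` until it succeeds or `attempts` tries have failed.

    Hypothetical helper: `fetch` is any zero-argument callable that performs
    the download. Only exceptions listed in `retry_on` trigger a retry.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on as err:
            last_error = err
            # Pause before the next try, but not after the final one.
            if attempt + 1 < attempts:
                time.sleep(delay)
    # Every attempt failed with a retryable error, so re-raise the last one.
    raise last_error
```

Usage would be along the lines of `retry_on_ssl_error(lambda: session.get(url), retry_on=(requests.exceptions.SSLError,))`, where `session` and `url` stand in for whatever the caller already has. Retrying only helps if a later request happens to reach a different server.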

IgnoredAmbience commented 4 years ago

Fixed by #44

d235j commented 4 years ago

@dossy I found that retrying would often lead to the same error happening over and over. Therefore the fix.

dossy commented 4 years ago

@d235j - yeah, it's a coin-toss - if your request gets sent back to the same worker that is missing the intermediate certificate chain, the request will keep failing. I guess I just got lucky every time, in that on retry, my request went to a different worker that presented a cert I could verify.
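One way to get a feel for how often the coin lands badly would be to repeat the TLS handshake a number of times and count the verification failures. The sketch below uses only the standard library and assumes Python 3. The names `handshake_ok` and `count_failures` are made up for illustration, and nothing here is taken from the archiver or from #44.

```python
import socket
import ssl


def handshake_ok(host, port=443, timeout=10):
    """Return True if a verified TLS handshake with `host` succeeds.

    Hypothetical probe. DNS and connection errors are not caught and will
    propagate to the caller; only TLS errors count as a failed handshake.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLError:
        # Certificate verification errors are a subclass of ssl.SSLError.
        return False


def count_failures(host, attempts=10, check=handshake_ok):
    """Run `check(host)` `attempts` times and return how many runs failed."""
    return sum(1 for _ in range(attempts) if not check(host))


if __name__ == "__main__":
    print(count_failures("xa.yimg.com"))
```

A result somewhere between zero and the number of attempts would be consistent with only some of the workers presenting an incomplete chain, though each new connection is not guaranteed to reach a different worker.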