Open IgnoredAmbience opened 4 years ago
I found this piece of code in another Yahoo archiver:
import email
... some code ...
for message in msg_json['messages']:
    id = message['messageId']
    print "* Fetching raw message #%d of %d" % (id,count)
    raw_json = None
    for i in range(5):
        try:
            raw_json = yga.messages(id, 'raw')
            break
        except requests.exceptions.ReadTimeout:
            print "ERROR: Read timeout, retrying"
            time.sleep(HOLDOFF)
        except requests.exceptions.HTTPError as err:
            if err.response.status_code == 500:
                print "ERROR: HTTP error %d reading the message... given up :(" % err.response.status_code
                continue
    if raw_json is None:
        print "ERROR: given up on this message, moving on"
        continue
    mime = unescape_html(raw_json['rawEmail']).encode('latin_1', 'ignore')
    eml = email.message_from_string(mime)
I got it from here: https://github.com/philpem/yahoo-group-archiver/blob/master/yahoo.py
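For anyone extracting this into a separate tool, the retry-with-holdoff pattern in the snippet above could look roughly like this in Python 3. This is only a sketch: `fetch_with_retries` and its parameters are hypothetical names, and the built-in `TimeoutError` stands in for `requests.exceptions.ReadTimeout` so the example runs without third-party packages.

```python
import time


def fetch_with_retries(fetch, attempts=5, holdoff=1.0, retry_on=(TimeoutError,)):
    """Call fetch() up to `attempts` times.

    Returns the result of the first successful call, or None if every
    attempt raised one of the exception types listed in `retry_on`.
    (Hypothetical helper, not part of any archiver's API.)
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as err:
            print("ERROR: %s on attempt %d of %d" % (type(err).__name__, attempt, attempts))
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts:
                time.sleep(holdoff)
    return None
```

A caller would then skip the message when the helper returns `None`, which matches the "given up on this message, moving on" branch in the original loop.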
I have a working version (I don't know if it is the same repository) that will actually get the eml messages.
Yes, this code was originally in this repository, however it was removed as it was causing crashes during the main archive loop. It is this code that I was referring to that should be extracted out to a new tool.
The commit that removed this functionality was cefc51bda0bdea2bf64216c8223eb0714e42f018, but it was already seriously broken by then. It should still be working, with its main issue (modifying broken encoding), at commit 22d9317c5d706147269bfd1a0ecbaead5a536824.
I'm sure whatever you guys write will be infinitely superior to what I can put together, but in the meantime, I use a little AutoIt script to convert the JSON files to EML files which Thunderbird accepts as "real".
@ugcheleuce Would you like to share that AutoIt script to convert the JSON files to EML files, or better still, put together a multiplatform (Java or Python?) one?
@Sadi58 I can use Python and Java, but I can't program in them. I wasn't sure if linking to an AutoIt script was good manners, but since you ask, it's here: http://www.leuce.com/autoit/IA_JSON_2_EML.zip (newer versions always at the same download location). It's a bit slow, unfortunately (it takes about 10 minutes to create a list of 150 000 files to process, and then takes about 1 minute per 2500 files). Only ever tested on my own computer, too (Windows 10 Home 64).
@ugcheleuce Thank you. Your script does not compete with the one here, but just complements it. :-) There's also a reference to a competing script above: https://github.com/philpem/yahoo-group-archiver It does more or less the same thing, and downloads message files in eml format instead of json. It's good to have such different options and alternatives. ;-)
I wrote a JSON to EML bulk conversion tool as a Windows app that accepts the JSON files generated here and writes them into an "/EML" directory. It currently only writes raw EML files straight from the JSON file, which means it does not reattach photos etc. I have not had time to finish it due to too many hospital visits and other stuff. However, if all you want is a raw EML file with no attachments, it will work for that as is. The goal is to eventually convert all the files into a single XML file using BBCode, but that part isn't working right now. I can post it to GitHub if anyone is interested.
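For reference, a raw-only conversion like the one described above might be sketched in Python 3 as follows. The assumptions are mine: each archived message is a JSON object with an HTML-escaped `rawEmail` field (as in the snippet quoted earlier), the files end in `.json`, and `json_to_eml` / `convert_directory` are hypothetical names. It does not reattach photos or attachments.

```python
import html
import json
from pathlib import Path


def json_to_eml(json_path, out_dir):
    """Write the rawEmail field of one JSON file to out_dir as an .eml file.

    Returns the path of the new file, or None if the JSON has no rawEmail.
    """
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    raw = data.get("rawEmail") if isinstance(data, dict) else None
    if raw is None:
        return None
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    eml_path = out_dir / (Path(json_path).stem + ".eml")
    # Assumption: same decoding as the quoted snippet, i.e. unescape HTML
    # entities, then encode as latin_1 and drop unencodable characters.
    eml_path.write_bytes(html.unescape(raw).encode("latin_1", "ignore"))
    return eml_path


def convert_directory(src_dir, out_dir):
    """Convert every *.json file in src_dir; return the list of .eml paths written."""
    converted = []
    for json_path in sorted(Path(src_dir).glob("*.json")):
        eml_path = json_to_eml(json_path, out_dir)
        if eml_path is not None:
            converted.append(eml_path)
    return converted
```

Calling `convert_directory("email", "EML")` would write one `.eml` per message file that has a `rawEmail` field; the directory names here are placeholders.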
I believe @PaulWebster is working on this functionality at present.
I don't know anything about his work. Mine only writes raw eml files at the moment. Other functionality is pending. https://github.com/n4mwd/YahooEml
Reintroduce conversion from json messages/attachments/photos to eml files. The bulk of the code is in the commit history, just needs extracting out to a separate tool.