IgnoredAmbience / yahoo-group-archiver

Scrapes and archives a Yahoo group's email archives, photo galleries and file contents using the non-public API
MIT License

Crashes on non-alphanumeric folder name for files #71

Closed. andrewferguson closed this issue 4 years ago

andrewferguson commented 4 years ago

Some groups store files in non-alphanumeric folder names, and this currently causes the script to crash.

2019-10-26 11:24:33.172 BST INFO archive_files Fetching directory '∮∮♀♂表格收集$$∮♂♁★★' (1/15)
Traceback (most recent call last):
  File "./yahoo.py", line 696, in <module>
    archive_files(yga)
  File "./yahoo.py", line 219, in archive_files
    with Mkchdir(name):
  File "./yahoo.py", line 568, in __enter__
    os.chdir(self.d)
OSError: [Errno 2] No such file or directory: ''

This is on the wizaschool group.

andrewferguson commented 4 years ago

Here's the fileinfo.json for the group (with .txt appended to the name so GitHub accepts the upload): fileinfo.json.txt

IgnoredAmbience commented 4 years ago

The files API doesn't give unique IDs to uploaded folders or files. Consider prefixing the name with the created time, or using a hash of the non-printable name.

IgnoredAmbience commented 4 years ago

Further note: directory names need to be reproducible from the name string alone, since we don't have the folder's metadata when traversing further into the directory. Appending a hash of the name seems like the best route.
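
For illustration, the hash-suffix idea could look roughly like the sketch below. It assumes Python 3, and `safe_dirname` and its `hash_len` parameter are hypothetical names, not functions in yahoo.py. It also assumes that only ASCII letters, digits, dots, underscores and hyphens are kept. The result depends only on the name string, so it can be recomputed at any level of traversal.

```python
import hashlib
import re


def safe_dirname(name: str, hash_len: int = 8) -> str:
    """Build a directory name from `name` alone (hypothetical helper)."""
    # Replace every run of characters outside a conservative ASCII set,
    # then trim leading/trailing dots and underscores (avoids '.' and '..').
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("._")
    # Hash the ORIGINAL name, so two names that clean to the same string
    # (or to nothing at all) still map to different directories.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:hash_len]
    # Fall back to the bare digest when nothing printable survives.
    return f"{cleaned}-{digest}" if cleaned else digest


if __name__ == "__main__":
    # The folder name from the log above cleans to an empty string,
    # so only the hash remains and the result is never ''.
    print(safe_dirname("∮∮♀♂表格收集$$∮♂♁★★"))
    print(safe_dirname("Photos 2019"))
```

A truncated SHA-1 is used here only as a stable fingerprint, not for security. A longer `hash_len` lowers the already small chance of two folder names colliding.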