csaftoiu / yahoo-groups-backup

A python script to backup the contents of private Yahoo! groups.
The Unlicense

site_dump does not handle root level files correctly #56

Open · jimdinunzio opened this issue 4 years ago

jimdinunzio commented 4 years ago

Hi, I noticed on two different site dumps that the files at the root level are all copies of a single file. The files in folders below root appear to be fine.

Name                         Size     Date Modified
Intro to AVR programming/             11/14/19, 6:38:58 PM
PID presentation/                     11/14/19, 6:38:58 PM
robot builder issues/                 11/14/19, 6:38:58 PM
sample line follow videos/            11/14/19, 6:38:58 PM
file1                        37.1 kB  11/14/19, 6:38:58 PM
file2                        37.1 kB  11/14/19, 6:38:58 PM
file3                        37.1 kB  11/14/19, 6:38:58 PM
file4                        37.1 kB  11/14/19, 6:38:58 PM
file5                        37.1 kB  11/14/19, 6:38:58 PM
file6                        37.1 kB  11/14/19, 6:38:58 PM
file7                        37.1 kB  11/14/19, 6:38:58 PM

Jim

dogfeathers commented 4 years ago

The same thing is happening with the files in every folder, not just at the root level. The issue is in yield_walk_files() in scraper.py. In the "for data in data_files" loop, the "el" variable is never assigned inside the loop; it is inherited from the loop above, so it still holds whatever element was iterated last. Get rid of that loop and move its lines up into the "if data['fileType'] == 'f':" block, which makes the data_files list unnecessary. You can similarly get rid of the data_dirs list, though that one is harmless (it just isn't doing anything useful either). A sketch of the pattern and the fix follows.
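For anyone reading along without the source handy, here is a minimal, self-contained sketch of the scoping bug being described and the suggested fix. The entry dicts, the URL field, and the function structure are illustrative assumptions, not the actual scraper.py code:

```python
def yield_walk_files_buggy(entries):
    # Shape of the original code: collect file entries first, yield later.
    data_files = []
    for el in entries:                  # `el` is rebound on each iteration
        data = {'fileType': el['fileType'], 'name': el['name']}
        if data['fileType'] == 'f':
            data_files.append(data)
    for data in data_files:
        # BUG: this loop never assigns `el`, so it still holds the *last*
        # element from the loop above. Every file yields the same URL,
        # which is why the dumped files all come out as copies of one file.
        yield data['name'], el['url']

def yield_walk_files_fixed(entries):
    # The fix: yield while `el` and `data` still refer to the same entry.
    # The intermediate data_files list is no longer needed.
    for el in entries:
        data = {'fileType': el['fileType'], 'name': el['name']}
        if data['fileType'] == 'f':
            yield data['name'], el['url']

# Demonstration with three fake file entries (hypothetical URLs):
entries = [
    {'fileType': 'f', 'name': 'file%d' % i, 'url': 'https://example.invalid/%d' % i}
    for i in range(1, 4)
]
print(list(yield_walk_files_buggy(entries)))  # every file points at .../3
print(list(yield_walk_files_fixed(entries)))  # each file gets its own URL
```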