Closed rominatrix closed 5 years ago
Perhaps create a fresh backup with the patch from this comment applied.
I've just checked and I had applied that patch already before running it.
Hm, then maybe the patch is at fault. @ecm-pushbx do you see anything obvious that could be causing this?
@rominatrix Please create a small test blog with entries with tags that are causing your problems.
@ecm-pushbx I probably have thousands of different tags, mostly just keysmashing, so I have no idea which one is causing this error. I can generate a small test blog with different kinds of non-ASCII tags, but so far I could not replicate it. Not sure if I can debug it; maybe it would be easier if I could skip the parts of the script that download images etc. and leave only the tags part? But I would need some help with that.
It might be due to tag length and the length of your path of the directory into which you're storing. There are some maximums there, around 255 bytes I think.
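To make the length idea checkable, here is a minimal Python 3 sketch that percent-encodes a tag and measures the resulting directory name and full path. It is not part of tumblr_backup.py: the function name `tag_path_report`, the two limits (255 characters per path component, 260 for a whole path on classic Windows setups), and the use of `urllib.parse.quote` to mimic the tag directory naming are all assumptions for illustration.

```python
import ntpath
from urllib.parse import quote

# Assumed limits, not taken from the script: 255 characters per path
# component and 260 characters (including the terminator) for a whole path.
MAX_COMPONENT = 255
MAX_PATH = 260


def tag_path_report(tags_root, tag, leaf=ntpath.join("archive", "2014-08-p1.html")):
    """Return length figures for the file that would be written for `tag`."""
    # safe="" makes every reserved character expand to a %XX triplet,
    # so "!" becomes "%21" and the name grows to three times its length.
    encoded = quote(tag, safe="")
    # ntpath joins with backslashes on any OS, like the path in the traceback.
    full = ntpath.join(tags_root, encoded, leaf)
    return {
        "encoded_len": len(encoded),
        "path_len": len(full),
        "component_ok": len(encoded) <= MAX_COMPONENT,
        "path_ok": len(full) < MAX_PATH,
    }


if __name__ == "__main__":
    # Hypothetical tag of 100 exclamation marks under the reported tags directory.
    print(tag_path_report(r"D:\backup\posts\everything\tags", "!" * 100))
```

Running it against a few of the longest tags would show whether either limit is actually being hit, or whether the cause lies elsewhere.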
Not sure if I can debug it; maybe it would be easier if I could skip the parts of the script that download images etc. and leave only the tags part? But I would need some help with that.
You'll have to ask @bbolli or someone else more familiar with the operation of the script. I don't know either, I'm just hacking around some.
You can always regenerate the complete indices by backing up just one post with -n1 --tag-index. To build the index (both of them), the posts are read from the disk and the tags parsed from the file contents. Otherwise, incremental backup wouldn't create the whole index.
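For anyone unsure about the command line, the one-post run described above could be assembled roughly like this. It is only a sketch: the helper name `index_rebuild_command` and the blog name are hypothetical, and it assumes the blog name is passed as a positional argument after the two flags.

```python
import shlex
import sys


def index_rebuild_command(blog, script="tumblr_backup.py"):
    """Build the argv for a one-post backup run that also requests the tag index."""
    # -n1 and --tag-index are the two options mentioned in the comment above.
    return [sys.executable, script, "-n1", "--tag-index", blog]


if __name__ == "__main__":
    cmd = index_rebuild_command("example-blog")  # hypothetical blog name
    # Print the command so it can be checked before anything is executed.
    print(shlex.join(cmd))
    # To actually run it, use the same working directory and output options
    # as the original backup (an assumption about how the backup was made):
    # import subprocess; subprocess.run(cmd, check=True)
```

Printing first and running second keeps the existing backup untouched until the command looks right.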
@bbolli thank you, i will try that!
Did it work? I'm dealing with a similar problem and unfortunately not familiar enough to apply the patch on my own. I started running a backup overnight with a version of tumblr-utils I downloaded last week (a mistake on my part).
If I've already successfully downloaded the blog locally, would I need to download the newest version of the code and start over entirely (download from tumblr fresh), or do I need to move the downloaded contents into the new version's folder and then re-run it to regenerate the indices as described above? (Or maybe move the new code into the existing folder instead?) I'm hoping I can solve this locally rather than re-downloading 50GB of content. I'm just hesitant to restart the process without feeling confident about what I'm doing with the command line.
Thank you everyone for all your hard work on this.
Probably fixed by #140
After almost 2 days, tumblr_backup.py finished generating the backup of my 87k-post blog (resulting in a 141GB directory). When I tried to open my "tag list index", the index.html file (which is created inside the "tags" directory) was not there. I also noticed that a remarkably small number of tags (only 104) were saved in the "tags" directory.
Before making this huge backup, I had first tested it with the "--no-reblog" flag (resulting in a 16GB directory), and the tag list index was generated properly.
Now, I'm assuming (maybe that's not the case) that the problem has to do with the fact that a lot of my tags have non-ASCII characters; e.g. the last tag directory that was saved is this:
%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21
I didn't save the STDERR to a file but instead it ended up on my screen and the last error I have there is like this:
IOError: [Errno 2] No such file or directory: 'D:\\backup\\posts\\everything\\tags\\%21%21%21%21%21%21%21%21%21%21%21%21%21%21 %21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%2 1%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21%21\\archive\\2014-08-p1.html'
(By the way, those tags are correct; it's just a bunch of !!!!) I'm not sure what the problem could be. I have no way of removing those tags, because across 87k posts there are a lot of tags like that, and I don't know what is causing this. I have saved all the JSON files, if that helps. I'd be willing to run the script again just to regenerate the tag list, if that's possible (maybe by commenting out the part of the code that saves posts?).
Thanks in advance.
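Since the JSON files were kept, one way to narrow down candidate tags without re-downloading anything is to scan them offline. The Python 3 sketch below is an illustration only: it assumes each saved file holds a single post object with an optional "tags" list of strings, and the function name `longest_encoded_tags` and the directory layout are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from urllib.parse import quote


def longest_encoded_tags(json_dir, top=10):
    """Return (tag, encoded_length, post_count) for the longest encoded tags."""
    counts = Counter()
    for path in Path(json_dir).glob("*.json"):
        try:
            post = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or malformed files
        # Assumption: one post object per file, with an optional "tags" list.
        tags = post.get("tags") if isinstance(post, dict) else None
        if isinstance(tags, list):
            counts.update(t for t in tags if isinstance(t, str))

    def encoded_len(tag):
        # Percent-encoded length, e.g. "!" counts as the three characters "%21".
        return len(quote(tag, safe=""))

    ranked = sorted(counts, key=encoded_len, reverse=True)
    return [(tag, encoded_len(tag), counts[tag]) for tag in ranked[:top]]


if __name__ == "__main__":
    # Hypothetical location of the saved JSON files.
    for tag, length, n_posts in longest_encoded_tags("json"):
        print(length, n_posts, repr(tag))
```

The tags at the top of the list are the ones to try first on a small test blog.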