I made some relevant changes yesterday. Was that backup run with a version of this fork downloaded, cloned, or pulled sometime today?
https://github.com/mikf/gallery-dl backs up likes in about the same way, except it uses the last `liked_timestamp` instead of links for the next `before`.
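For reference, here's a minimal sketch of that pagination strategy, assuming the Tumblr v2 API's likes endpoint; the API key and blog name are placeholders, and error handling is omitted:

```python
# Sketch of the timestamp-based approach: page through likes using the
# last post's liked_timestamp as the next "before" value, instead of the
# paging links the API returns. Assumes the Tumblr v2 API.
import requests

API_KEY = "..."  # placeholder
URL = "https://api.tumblr.com/v2/blog/<blogname>.tumblr.com/likes"

before = None
while True:
    params = {"api_key": API_KEY, "limit": 20}
    if before is not None:
        params["before"] = before
    posts = requests.get(URL, params=params).json()["response"]["liked_posts"]
    if not posts:
        break
    for post in posts:
        ...  # back up the post here
    # Advance using the last post's liked_timestamp.
    before = posts[-1]["liked_timestamp"]
```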
If you've already tried a version from today and your likes are mostly media (images/videos/etc.), you could install gallery-dl (run `pip install gallery-dl`) and try this:

```
$ gallery-dl "https://<blogname>.tumblr.com/likes"
$ python -c "import glob, os, re; print(len(set(re.match('[^_]+_[^_]+_([0-9]+)_.*$', os.path.basename(p)).group(1) for p in glob.glob('gallery-dl/tumblr/*/likes/*.*'))))"
```
The second command counts the number of unique liked posts that gallery-dl was able to find. If it's more than 1086 then I will consider this a bug.
I think there's also a chance that 324 of your liked posts have been deleted or are on dashboard-only blogs, and the "expected" count is inaccurate...
My original problem with your script and others was that they didn't download the text posts. So I appreciate the suggestion, but it doesn't suit my needs.
I cloned your repo again. As I said before, I have 1410 liked posts and it claims to have backed up 1086. So I scrolled to the end of my liked posts on Tumblr and compared them to the backed-up ones. At the beginning of the backup there were posts that didn't appear on Tumblr (maybe I liked them at some point?). After some posts, my first liked posts started to show up, but not consistently: some only appeared after a while (so, not in order), and some didn't appear at all (maybe I had to scroll more).
So, assuming that all posts are backed up, and keeping in mind that more posts appeared than should have, how can the number of backed-up posts be smaller?
@sldx12 You've discovered a feature of aggroskater's fork that mine is currently lacking: sorting liked posts by the time they were liked instead of the time they were originally posted. I'll add that soon; if you want to try it now, find `key=lambda x: x[0]['id']` and replace `id` with `liked_timestamp`, and also find `self.date = post['timestamp']` and replace `timestamp` with `liked_timestamp`.
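After the edits, the two spots would look roughly like this (indentation is illustrative, not exact):

```diff
-        key=lambda x: x[0]['id']
+        key=lambda x: x[0]['liked_timestamp']

-        self.date = post['timestamp']
+        self.date = post['liked_timestamp']
```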
If you don't want to wait for the API calls, you can rerun the script using `--prev-archives`, which would look like this:

```
python /path/to/tumblr_backup.py --likes --prev-archives <blogname> -O <blogname>_new
```
@Cebtenzzre Ok, it now seems that the script is downloading all my likes (even though it says it only backed up 1086 of 1410; maybe it's only counting the downloaded images instead of all the posts?). I'll check more carefully later and close the issue. On the topic: is it possible to have them all on one HTML page instead of having to choose the month (I can create a new issue if you want)? Not asking you to do it, just asking if the files are downloaded in a way that would let me do this.
Yeah, taking a closer look, it seems that all posts are backed up. Of course, I won't know for sure unless I check manually (not doing that), so I'll trust the computer. Maybe if you find an explanation for it claiming to back up fewer posts I'll sleep better. Feel free to close this after that.
It's definitely counting posts, not images. That number would be lower than expected only if those posts are simply no longer available (not much you can do about that, manually or with a script), or you used any option that skips posts (`--no-reblog`, `--request`/`-Q`, `--tags`/`-t`, `--type`/`-T`, `--filter`/`-F`).
With `--posts-per-page 0`, you can at least limit the pages per month to one. The index is always monthly, probably to avoid slow page load times. Continuous-scroll page loading like Tumblr's would require writing JavaScript, which seems like more complexity than it's worth.
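For example, a likes backup with one page per month could look like this (a sketch, using the same invocation style as above; `<blogname>` is a placeholder):

```
python /path/to/tumblr_backup.py --likes --posts-per-page 0 <blogname>
```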
On my fork, you can always regenerate the index with `--count 0`, so you wouldn't even have to rerun the backup if you wanted to change the index options (such as `--posts-per-page`, `--reverse-index`, or `--reverse-month`).
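As a sketch, assuming the same invocation style as above, regenerating just the index with different options would look something like:

```
python /path/to/tumblr_backup.py --likes --count 0 --posts-per-page 0 <blogname>
```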
I'm not using any post-skipping option, so this will remain a mystery to me. Running another script that does the same thing also gives me around the same number of posts, so yeah, I'll assume that there are around 400 no-longer-available posts.
Thanks for your help and patience; we've been at this for quite a few days!
Using the latest version, I was unable to download all my blog's likes. I have something like 1410 liked posts and the script returned the following:

```
blogname: 1086 liked posts backed up
```