bbolli / tumblr-utils

Utilities for dealing with Tumblr blogs, Tumblr backup
GNU General Public License v3.0

Efficient periodical backup. #205

Open indrakaw opened 5 years ago

indrakaw commented 5 years ago

Please take a look:

https://drive.google.com/drive/folders/16fMyKcBfo5mtLr-jbZZEj4AovM1SaAB9

My use case is backing up a Tumblr blog one period at a time, not incrementally. For example:

tumblr_backup.py -j -I i --save-video-tumblr --save-audio -p ${T_YEAR} -O ${T_NAME}-${T_YEAR} ${T_NAME}

This will produce something like staff-2004/, staff-2005/, etc.

The problem is that it always scans from the beginning to the latest post. For example, if I only want to download the posts from 2006, it scans everything from 2004 to 2016 and only then downloads that period. Now imagine a blog that has 100,000+ posts and is 8 years old.

bbolli commented 5 years ago

There's nothing I can do about this. The API only allows sequential access, so scanning is needed.

cebtenzzre commented 5 years ago

Sequential access only? Isn't there a before parameter, and a potentially arbitrary offset parameter (for binary searching)?
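The binary-search idea can be sketched as follows. This is a minimal sketch, not tumblr_backup's code. It assumes the API returns posts newest-first and honours arbitrary `offset` values, which is exactly what is being asked above. `timestamp_at` is a hypothetical callback that would fetch the single post at a given offset and return its Unix timestamp. Year boundaries are taken in UTC, which is also an assumption.

```python
from datetime import datetime, timezone
from typing import Callable, Tuple


def year_bounds(year: int) -> Tuple[int, int]:
    """Unix timestamps of Jan 1 of `year` and Jan 1 of `year + 1` (UTC)."""
    start = int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp())
    end = int(datetime(year + 1, 1, 1, tzinfo=timezone.utc).timestamp())
    return start, end


def first_offset_before(total_posts: int,
                        timestamp_at: Callable[[int], int],
                        cutoff: int) -> int:
    """Smallest offset whose post is strictly older than `cutoff`.

    Assumes offset 0 is the newest post and timestamps never increase as
    the offset grows. Returns `total_posts` if no post is older than `cutoff`.
    Each call to `timestamp_at` stands for one (hypothetical) API request.
    """
    lo, hi = 0, total_posts
    while lo < hi:
        mid = (lo + hi) // 2
        if timestamp_at(mid) < cutoff:
            hi = mid       # mid is already older: the boundary is at or before it
        else:
            lo = mid + 1   # mid is still too new: the boundary is after it
    return lo


def offsets_for_year(total_posts: int,
                     timestamp_at: Callable[[int], int],
                     year: int) -> Tuple[int, int]:
    """Half-open offset range [first, stop) covering the posts of `year`."""
    start_ts, end_ts = year_bounds(year)
    first = first_offset_before(total_posts, timestamp_at, end_ts)
    stop = first_offset_before(total_posts, timestamp_at, start_ts)
    return first, stop
```

With 100,000 posts, each boundary takes at most 17 probes (about log2 of the post count) instead of a full sequential scan. The backup could then start paging at `first` and stop at `stop`.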

indrakaw commented 5 years ago

@Cebtenzzreep That's what I meant. It would be faster that way.

Imagine having to wait for tumblr_backup to index every post on a blog that is over 8 years old. Downloading the posts from one specific year becomes a pain, especially if you are backing up the posts separated by year: you have to start over, and over, and over again.

indrakaw commented 5 years ago

I requested the same feature for gallery-dl, and they implemented it: https://github.com/mikf/gallery-dl/issues/337

I haven't achieved my goal with it, though, since gallery-dl has more limited API access than tumblr_backup.py, and I have no idea why.