4pr0n / ripme

Downloads albums in bulk

Implementing "after" logic when ripping reddit-stuff #431

Open ScuttleSE opened 7 years ago

ScuttleSE commented 7 years ago

Right now, if I run `java -jar ripme.jar -d -t 25 -u "https://www.reddit.com/user/userid"`, ripme will run through the user's submissions and download them all. Submissions that were already downloaded are skipped, but ripme still iterates over every one of them and tries to download them again.

If ripme could remember the last submission it downloaded, then the next time I rip the same user, ripme could resume where it last stopped.

The reddit API has an `after` parameter whose value could easily be saved to a text file; the next time the user is ripped, ripme could read the file and resume from there.

This would also solve the "problem" of deleting unwanted files: since ripme would resume from the last submission it downloaded on the previous rip, it would not re-download old files you have intentionally removed.

Also, if you want to re-download a user from scratch, just delete the text file with the saved `after` value.
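
A minimal sketch of what persisting that cursor could look like (the file name `.reddit_after`, the class name, and its location are assumptions, not existing ripme code):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: persists reddit's "after" cursor (a fullname such
// as "t3_abc123") in a text file inside the rip directory.
public class RedditAfterFile {
    private final Path file;

    public RedditAfterFile(Path ripDirectory) {
        this.file = ripDirectory.resolve(".reddit_after");
    }

    /** Returns the saved cursor, or null if no previous rip recorded one. */
    public String load() throws IOException {
        if (!Files.exists(file)) {
            return null;
        }
        String value = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return value.isEmpty() ? null : value;
    }

    /** Records the cursor so the next rip can resume from it. */
    public void save(String after) throws IOException {
        Files.write(file, after.getBytes(StandardCharsets.UTF_8));
    }

    /** Deleting the file forces a full re-rip, as suggested above. */
    public void reset() throws IOException {
        Files.deleteIfExists(file);
    }
}
```

The saved value would then be passed along on the listing request, e.g. `https://www.reddit.com/user/userid/submitted.json?after=t3_abc123`.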

metaprime commented 7 years ago

I have been considering this as well. One concern with the "if you want to re-download, just do XYZ" approach is that it gets pretty terrible when every kind of album has a different mechanism; I'd like to attach this to a single global setting for everything re-rip-related. There are a number of open issues about improving re-rip scenarios. The general solution is to save a list of links to the actual image files that have already been downloaded and not re-download those. To optimize for certain sites like reddit, we could add extra logic around the most recently ripped post.
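
As a sketch of that general solution (the class name and the file format, one image URL per line, are assumptions): load the previously downloaded links into a set and consult it before queuing a download.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a download history shared across rippers.
public class DownloadHistory {
    private final Path file;
    private final Set<String> urls = new HashSet<>();

    public DownloadHistory(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            // One previously downloaded image URL per line.
            urls.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
    }

    /** True if this image URL was downloaded on a previous rip. */
    public boolean alreadyDownloaded(String url) {
        return urls.contains(url);
    }

    /** Appends a newly downloaded URL to the history file. */
    public void record(String url) throws IOException {
        if (urls.add(url)) {
            Files.write(file,
                    (url + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
```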

Note: the reddit ripper actually starts at the most recent post and works backward. Filtering with the `after` parameter wouldn't strictly be required if the link of the most recently ripped post were saved, but it would certainly help.

Also, I want to make sure we continue to support the scenario where a rip was cancelled partway through while it was still working backward: when you resume, you'd still want to rip the older material you hadn't reached yet. So the logic of saving and resuming based on the most recent link should only be enabled when the previous rip ran to completion successfully (modulo 404s).
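
A sketch of that resume rule; `Post`, `lastRippedLink`, and `previousRipComplete` are assumptions for illustration, not ripme code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the resume rule described above.
public class ResumeLogic {

    interface Post {
        String getLink();
    }

    /**
     * Walks posts from newest to oldest and returns those still to rip.
     * The saved marker only short-circuits the walk when the previous rip
     * ran to completion; after a cancelled rip we keep walking backward
     * so older, never-ripped posts are still picked up.
     */
    static List<Post> postsToRip(List<Post> postsNewestFirst,
                                 String lastRippedLink,
                                 boolean previousRipComplete) {
        List<Post> toRip = new ArrayList<>();
        for (Post post : postsNewestFirst) {
            if (previousRipComplete && post.getLink().equals(lastRippedLink)) {
                break; // everything older than this was already ripped
            }
            toRip.add(post);
        }
        return toRip;
    }
}
```

After a walk finishes successfully (modulo 404s), the newest post's link would be saved as the new marker and the complete flag set; a cancelled rip would leave the flag unset.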