Serene-Arc / bulk-downloader-for-reddit

Downloads and archives content from reddit
https://pypi.org/project/bdfr
GNU General Public License v3.0
2.3k stars, 211 forks

No-dupes doesn't work? #732

Open instaloadererror opened 1 year ago

instaloadererror commented 1 year ago

Since Reddit limits listings to 1000 submissions, I'm downloading based on top from all, year, and month. I use --no-dupes, but it seems useless: I get three copies of the same file for a lot of submissions.

Serene-Arc commented 1 year ago

This is impossible to diagnose without more information. Please read and fill out the bug report template BEFORE filing a report.

OMEGARAZER commented 1 year ago

Going off the info you posted here, you're right: with the upvote count in the file name, the name changes whenever the score changes, so the downloader won't recognize the earlier copy as an existing file.

As for --no-dupes, it only considers the current run and then discards the hashes. For your use case, you're better off including --search-existing (if you're not time sensitive, as it will hash all files in your destination folder), or, as your best bet, --exclude-id-file with a list of IDs that have already been downloaded. You can generate that list with a script in the scripts folder of the repository.
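
To illustrate the difference described above, here is a minimal Python sketch of hash-based duplicate detection. It is not BDFR's actual code: the function names (`file_hash`, `hash_existing`, `run`, `demo`), the use of MD5, and the sample file names are all assumptions made for the example. It shows that a hash set kept only in memory is lost between runs, while seeding the set from files already on disk catches a duplicate even when the file name differs.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(data):
    # Assumption for this sketch: MD5 of the file content identifies a duplicate.
    return hashlib.md5(data).hexdigest()


def hash_existing(folder):
    # Hash every file already in the destination folder (slow for large folders).
    return {file_hash(p.read_bytes()) for p in folder.rglob("*") if p.is_file()}


def run(folder, downloads, seed_from_disk):
    # One "run": the set of seen hashes lives only for the duration of this call.
    seen = hash_existing(folder) if seed_from_disk else set()
    written = 0
    for name, data in downloads.items():
        digest = file_hash(data)
        if digest in seen:
            continue  # same content already seen, skip it
        seen.add(digest)
        (folder / name).write_bytes(data)
        written += 1
    return written


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        # Same content each time; only the upvote prefix in the name changes.
        first = run(folder, {"100_cat.jpg": b"cat"}, seed_from_disk=False)
        second = run(folder, {"250_cat.jpg": b"cat"}, seed_from_disk=False)
        third = run(folder, {"300_cat.jpg": b"cat"}, seed_from_disk=True)
        return first, second, third


if __name__ == "__main__":
    print(demo())
```

In this sketch the second run writes a second copy because its hash set starts empty, while the third run writes nothing because it hashed the destination folder first. An ID list avoids the hashing cost entirely, which is why it is the better fit when the folder is large.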