Open mindjek07 opened 2 years ago
I can't believe this is not what the author meant by "avoid duplicates". I ended up with tons of duplicated images simply because they have different titles. This makes the program kind of useless for me. I hope you can add this in the future.
Avoid duplicates actually works by storing downloaded URLs and not re-downloading content at a URL that has previously been downloaded. It has nothing to do with the title.
This issue is not as simple as it appears. Most image/video host sites do not make an MD5 hash, or any hash for that matter, available before content is downloaded. So the content must be downloaded, then hashed, then compared to previously downloaded and hashed content, then deleted if it is found to be a duplicate. This is a feature that I plan to implement in future versions, but it is far from the ideal duplicate avoidance that most users would expect to be possible.
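The download-then-hash-then-compare approach described above could be sketched roughly like this (a minimal illustration, not the project's actual code; the `deduplicate` helper and the chunked-read size are assumptions for the example):

```python
import hashlib
from pathlib import Path

def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images/videos aren't loaded into memory at once."""
    md5 = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

def deduplicate(download_dir: Path) -> list[Path]:
    """Delete files whose content hash matches an earlier file; return the removed paths."""
    seen: dict[str, Path] = {}
    removed: list[Path] = []
    for path in sorted(download_dir.rglob("*")):
        if not path.is_file():
            continue
        digest = file_md5(path)
        if digest in seen:
            path.unlink()  # same content as an earlier file, even if the title/filename differs
            removed.append(path)
        else:
            seen[digest] = path
    return removed
```

Note that this only detects byte-identical files: the same image re-encoded or resized by a different host would still slip through, which is part of why this falls short of the ideal duplicate avoidance users expect.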
I used to use https://github.com/shadowmoose/RedditDownloader and I'm not sure it downloads the images to know whether they are in fact duplicates. Maybe it does...
Edit: Actually it does, you're correct: https://github.com/shadowmoose/RedditDownloader/blob/62a98c658b5759a2acdbbfa7a58cd6e842aaf71f/redditdownloader/processing/post_processing.py#L17
**Is your feature request related to a problem? Please describe.**
Duplicate images

**Describe the solution you'd like**
Store MD5 hash data of every image