Particle1904 / DatasetHelpers

Dataset Helper program to automatically select, rescale, and tag datasets (composed of images and text) for Machine Learning training.
MIT License
156 stars, 9 forks

[Feature Request] Image similarity/duplicate #31

Closed: DEX-1101 closed this issue 1 month ago

DEX-1101 commented 2 months ago

Sometimes while scraping (mostly from Danbooru), there are two posts of the same picture at different resolutions. It would be good if the tool could detect those and automatically remove one of them, and also let me remove them manually by choosing one among the similar images. It would be very helpful if that's possible, thanks!

O-J1 commented 1 month ago

Look into either a simple pHash (perceptual hash) script or, for the easy route, dupeGuru.
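
For anyone wondering what such a script might look like, here is a minimal sketch. It uses an average hash (aHash), a simpler relative of pHash (real pHash uses a DCT), and it is not how DatasetHelpers or dupeGuru do it. The function names and the `max_distance` threshold are made up for illustration. It also assumes each image is already loaded as a 2D grid of grayscale values (0-255); in practice you would load and convert the files with an imaging library first.

```python
from itertools import combinations


def shrink(pixels, size=8):
    """Downsample a 2D grayscale grid to size x size by averaging blocks."""
    h, w = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)
        row = []
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: one bit per cell, set if cell > mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | int(v > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def find_duplicates(images, max_distance=5):
    """images: dict of name -> 2D grayscale grid.

    Returns (name_a, name_b, distance) for every pair whose hashes differ
    by at most max_distance bits. The threshold is an assumed starting
    point, not a tuned value.
    """
    hashes = {name: average_hash(px) for name, px in images.items()}
    pairs = []
    for a, b in combinations(sorted(hashes), 2):
        d = hamming(hashes[a], hashes[b])
        if d <= max_distance:
            pairs.append((a, b, d))
    return pairs
```

Because both copies are shrunk to the same 8x8 grid before hashing, the same picture at two resolutions should land on the same or nearly the same hash. From each reported pair you could then keep the higher-resolution file, or show both and pick manually.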

Particle1904 commented 1 month ago

It's a planned feature, but it's a bit more complicated than it sounds. At the moment, I'm using AllDup for it.