elisemercury / Duplicate-Image-Finder

difPy - Python package for finding duplicate or similar images within folders
https://difpy.readthedocs.io
MIT License
448 stars 67 forks source link

Add feature to skip non-image file extensions #69

Closed UplandsDynamic closed 1 year ago

UplandsDynamic commented 1 year ago

Add a feature to skip any file with an extension that indicates it is not within a set of image filetypes that are currently supported by Pillow.

UplandsDynamic commented 1 year ago

Hi @elisemercury,

This is the pull request for the feature we were speaking about in https://github.com/elisemercury/Duplicate-Image-Finder/issues/60.

The hard-coded tuple of filetypes currently supported by Pillow is correct to the best of my knowledge, but as you say, it's not dynamically updated.

On the other hand, it'd be trivial to update the tuple if and when new filetypes are supported (or indeed, removed from support). If the user has logging turned on, they'd be alerted to the skipped files, thus would be in a position to raise an issue if something they thought should have been accepted was not.

Cheers, Dan

elisemercury commented 1 year ago

Hi @UplandsDynamic,

Thanks a lot for this! I think this is a great idea and can definitely be useful as feature improvement. I like the idea of having the option to turn limit_extensions on and off so that users have the option to choose whether they want to use it.

Again, thanks a million for your contributions!

All the best, Elise

UplandsDynamic commented 1 year ago

HI @elisemercury, I'm glad it was useful! Since I made the PR last night, I realised there's a slightly more efficient way of grabbing the extensions than the RegEx I used - using the pathlib module. So at some point soon I'll sync over your updated main, make the change, test and send a PR for the tweaked code. Cheers, Dan

elisemercury commented 1 year ago

Hi @UplandsDynamic,

Yes indeed, I thought the same. I'm currently working on an alternative to regex to make it more efficient. The update should go live soon. Thanks again!

Cheers, Elise

UplandsDynamic commented 1 year ago

@elisemercury Ah ok, I was just working on that now, but if you're already on the case I'll abandon sticks. I was thinking along the lines of ext = Path(f).suffix.lower().strip("."), which I've tested and does the job. Out of interest, I was going to do an experiment to test the execution times between the two approaches, but just realised it's gone 10pm and I've not eaten dinner yet, so maybe another time 😆

elisemercury commented 1 year ago

@UplandsDynamic, I implemented it with os.path.splitext(f)[1] as I assumed it would be more accurate than 'manually' stripping of the end of the path. This meant including a . in front of the valid_extensions list, then it worked perfectly fine.

Again, thanks again for your contribution! The list seems very complete, so it seems like a very good solution to increase the performance a bit (happy to see the test results if you decide to do some!). It might make sense to set this parameter to default True in the future - let's see what the community says. :slightly_smiling_face:

Good night and thanks again, Elise