Open needsupport opened 1 year ago
The main thing that I (not as a developer of this project) can see needing to be addressed is: how would you choose which file to keep and which to delete? Date modified, name, folder, etc.? I had been thinking about suggesting this, but I wanted to think through these options more before I said anything.
We are working on it, we have one interesting idea =)
Sounds great! I'm looking forward to this! 😄
Also, just a note for future reference, this is probably a duplicate of #75
Before we think about a complex way to determine which of the duplicates are to be deleted automatically, it would already be helpful if:
Also, an option to select "safe folder(s)": i.e. if you are comparing folders A, B, C, marking A as safe won't let you delete any files from that folder or its subfolders.
This is a necessity for me. I like the idea of a safe directory or directories. I suggested an alternate layout in Issue #75 to display and select duplicates for deletion.
A possible auto-delete option might be to delete all the smaller sized files from each group (and preserve only the one with the largest size). It helps especially if you have imported images from google and you also have the originals which are bigger in size.
An interactive threshold slider would also be nice for fine-tuning before auto-deletion: if I could preview the groups with the largest differences, I could adjust the threshold to my taste before deleting (again, the goal would be to preserve just one image per group).
Thank you for the great work guys, you are great, keep on going!
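A minimal sketch of the "keep only the largest file per group" rule suggested above, assuming the duplicate groups are available as lists of (path, size) pairs (a hypothetical structure, not MediaDC's actual export format):

```python
# Hypothetical sketch: keep only the largest file in each duplicate group.
# `groups` is assumed to be a list of groups, each a list of (path, size_in_bytes)
# tuples; this layout is illustrative, not MediaDC's real export format.

def files_to_delete(groups):
    """Return every path except the largest file of each group."""
    to_delete = []
    for group in groups:
        # Sort descending by size; the first entry is the one to keep.
        ranked = sorted(group, key=lambda item: item[1], reverse=True)
        to_delete.extend(path for path, _size in ranked[1:])
    return to_delete

groups = [
    [("/Photos/google-import/img1.jpg", 150_000),
     ("/Photos/originals/img1.jpg", 3_200_000)],
]
print(files_to_delete(groups))  # ['/Photos/google-import/img1.jpg']
```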
I was about to put in a similar request, as I indeed have thousands of match groups to work through, and at a 99% threshold there don't seem to be that many false matches. At least in my use case, keeping the largest of the files seems like a fine way to determine which files to delete. I am really looking forward to this functionality.
Just registered to let you know: Please make this happen. I'm currently looking over 5000 matching groups. I don't want to :)
+1
@bigcat88 any help's needed?
We are currently working full-time (even more than full-time) on a project called "Application Ecosystem V2" for Nextcloud (its repos are currently in this cloud-py-api org), and unfortunately we don't have the time to implement a specific feature request at the moment. However, we would be more than happy to accept a pull request if someone from the community would like to contribute and implement the feature themselves. Alternatively, if you can wait until we finish working on "Application Ecosystem V2" (at least one month, but probably two), we will be able to return to MediaDC and consider implementing your requested feature. If all goes well, we will start rewriting the MediaDC Python part in parallel once we finish the design stage and publish the docs for AEv2. Thank you for understanding.
@bigcat88 In that case there isn't much sense in us putting effort into workarounds. Please just keep us posted on the progress; it's a really useful feature, thanks.
@bigcat88 how are things going with Ecosystem V2? The feature of removing all duplicates at once is really interesting :) please come back to it
@mniami Fast, but slower than I expected. Regarding MediaDC specifically: I hope it will be easy for me to add App Store support for applications written with AppEcosystem, and in 2-3 weeks it should be possible to start moving MediaDC over to AppEcosystem.
But that is still not a firm estimate; everything is going well so far, but there can always be some kind of obstacle.
@bigcat88 thanks for letting us know
Checking in!
In the meantime, I made a Python script that uses the JSON export:
https://github.com/tbarbette/mediadc-massdelete/tree/main
It first uses file size to keep the biggest one, then a few filename heuristics you can provide for when the sizes match (for instance, delete everything with "whatsapp" in the path, and prefer not to delete anything with "DCIM" in the path). I also found the --different-path-only option useful to avoid deleting pictures from a burst, since it will not delete files in the same folder. In general you remove duplicates because a mess was created by different folders holding similar pictures: smaller versions created by WhatsApp that were imported, thumbnails, and so on. Hope it helps.
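For reference, a simplified sketch of that kind of ranking logic; the JSON layout, field names, and heuristics below are assumptions for illustration and do not reproduce the actual mediadc-massdelete script:

```python
# Illustrative only: rank files in each duplicate group and plan deletions.
# The JSON structure ({"path": ..., "size": ...} per file) is assumed, not the
# real format of MediaDC's export or of the mediadc-massdelete script.
import json
from pathlib import Path

PREFER_DELETE = ("whatsapp",)   # path substrings that make a file a deletion candidate
PREFER_KEEP = ("DCIM",)         # path substrings that protect a file

def keep_score(entry):
    """Higher score means more worth keeping."""
    path, size = entry["path"], entry["size"]
    keep_bonus = any(k in path for k in PREFER_KEEP)
    delete_penalty = any(d in path.lower() for d in PREFER_DELETE)
    return (keep_bonus, not delete_penalty, size)

def plan_deletions(groups, different_path_only=True):
    to_delete = []
    for group in groups:
        ranked = sorted(group, key=keep_score, reverse=True)
        keeper = ranked[0]
        for entry in ranked[1:]:
            # Optionally skip files in the same folder as the keeper (e.g. bursts).
            if different_path_only and Path(entry["path"]).parent == Path(keeper["path"]).parent:
                continue
            to_delete.append(entry["path"])
    return to_delete

groups = json.loads(Path("export.json").read_text())  # assumed: a list of groups
print("\n".join(plan_deletions(groups)))
```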
Another heuristic/option that would be nice: the ability to delete all duplicates only if the file sizes are the same. Sometimes even the 100% matching setting doesn't actually identify a 100% visual match, but an exact file-size match on top of that should be an obvious indicator of a duplicate.
I have 72,000 duplicates so I will be eagerly anticipating this feature
Hey, guys, size and checksum would be appropriate for the comparison.
You can always check the creation date as well.
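A sketch of that stricter check, assuming plain local file paths (only illustrative, since MediaDC works on files in Nextcloud storage): a file is only treated as an exact duplicate when both its size and checksum match a file already seen.

```python
# Sketch of the stricter rule suggested above: only treat files as safe to
# auto-delete when both size and checksum match exactly. Paths are examples.
import hashlib
import os

def sha256sum(path, chunk_size=1 << 20):
    """Stream the file so large media does not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def exact_duplicates(paths):
    """Return paths whose (size, checksum) pair was already seen earlier."""
    seen = {}
    dupes = []
    for path in paths:
        key = (os.path.getsize(path), sha256sum(path))
        if key in seen:
            dupes.append(path)
        else:
            seen[key] = path
    return dupes
```

Creation or modification dates could then be used as a tie-breaker for choosing which copy to keep, as suggested above.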
Description
I have 3000 duplicates. Deleting them one by one is going to take forever. Can you add a "delete all duplicates for this task" button?