PaulLereverend / NextcloudDuplicateFinder

Save some space by finding your duplicate files
GNU Affero General Public License v3.0

Idea: Automate deleting found dupes #16

Open binarypickle opened 3 years ago

binarypickle commented 3 years ago

Dangerous, I know, but with enough warnings it could be helpful. Selfishly, I'd love this. The tool is great, but due to some bad planning on my part with image management, I've ended up with hundreds (if not thousands) of dupes scattered throughout the file system. So after dedicating the time to run a full scan, it would be awesome to just pull a trigger and delete all but the first result for each match (or do it during the scan, but that seems like it'd require more modification than just adding a button to the results UI).
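In case it's useful, here's the rough shape of what I'm imagining, as a shell sketch. Big assumption (unverified): that find-all prints one file path per line, with duplicate groups separated by blank lines, and that paths are relative to the user's files directory.

php occ duplicates:find-all -u binarypickle > dupes.txt
# keep the first path of each blank-line-separated group, collect the rest
awk 'NF == 0 { seen = 0; next } seen == 0 { seen = 1; next } { print }' dupes.txt > delete-list.txt
# review delete-list.txt by hand before this step; the data directory prefix here is a placeholder
while IFS= read -r f; do rm -- "/path/to/nc-data/binarypickle/files$f"; done < delete-list.txt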

lbdroid commented 3 years ago

It's probably smarter for you to use the command line and script around it.

Something like...

sudo -u apache ./occ duplicates:find-all -u binarypickle | grep "\\\/" -B1 | grep -v "^--\|\\\/" | while read line; do rm "$line"; done

That will delete the LAST ONE of each duplicate group. Run it a few times until there are no more duplicates. Note: it will miss the last file, but that's just one, which you can get manually.
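If you want to automate the "run it a few times" part, something like this could work (untested sketch; it reuses the exact filter above, and the scratch file path is arbitrary):

while : ; do
  # one pass: collect the paths the filter above would delete
  sudo -u apache ./occ duplicates:find-all -u binarypickle | grep "\\\/" -B1 | grep -v "^--\|\\\/" > /tmp/dupes.txt
  # stop once a pass finds nothing left
  [ -s /tmp/dupes.txt ] || break
  while IFS= read -r line; do rm -- "$line"; done < /tmp/dupes.txt
done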

hrenki commented 3 years ago

Just imported some pictures with lots of duplicates and solved this problem by doing the following:

  1. Get all duplicates and write the output to a file named "duplicates":

     php occ duplicates:find-all -u karlo -p /Fotke > duplicates

  2. Extract only the files that you want to delete (e.g. all duplicates that are in one particular folder). I wanted to delete duplicates from the folder "Fotke/Photos ddmmyyy" and leave a copy in "Fotke/albumName":

     grep -r "/Fotke/Photos " duplicates > dupdelete

  3. After that, check that the files in the "dupdelete" list are correct, then run this to delete them. Note that you need to specify the absolute path to nc-data/user/files, because the paths in the list are relative to the user's files directory:

     while read -r file; do rm -- "/nc-data/user/files$file"; done < dupdelete
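To be extra safe, you can do a dry run first that only prints the rm commands instead of executing them (same list and path prefix as in step 3):

while read -r file; do echo rm -- "/nc-data/user/files$file"; done < dupdelete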

jmporchet commented 3 years ago

A dry-run version of lbdroid's command that only echoes the rm commands instead of running them:

sudo -u apache ./occ duplicates:find-all -u admin | grep "\\\/" -B1 | grep -v "^--\|\\\/" | while read line; do echo "rm $line"; done

> It's probably smarter for you to use the command line and script around it.
>
> Something like...
>
> sudo -u apache ./occ duplicates:find-all -u binarypickle | grep "\\\/" -B1 | grep -v "^--\|\\\/" | while read line; do rm "$line"; done
>
> That will delete the LAST ONE of each duplicate group. Run it a few times until there are no more duplicates. Note: it will miss the last file, but that's just one, which you can get manually.

As a reference for people who are using docker like me, here's the command I had to run:

docker exec -u 82 nextcloud php occ duplicates:find-all -u admin | grep "\\\/" -B1 | grep -v "^--\|\\\/" | while read line; do docker exec -u82 nextcloud rm "/var/nc-data/admin/files$line"; done

With about 2000 duplicates it took my machine about 5 minutes, and the command doesn't output anything while it's executing. It took me a little while to figure out, but only the occ command happens in the Docker container; the rest happens outside. So in the while; do; done loop I run docker exec at every iteration. There's probably a wayyy smarter way to do it, but at least that served my needs.
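A possibly smarter variant (an untested sketch): run a single shell inside the container and stream the list into it over stdin, so there's only one docker exec for all the deletions instead of one per file:

docker exec -u 82 nextcloud php occ duplicates:find-all -u admin | grep "\\\/" -B1 | grep -v "^--\|\\\/" | docker exec -i -u 82 nextcloud sh -c 'while IFS= read -r f; do rm -- "/var/nc-data/admin/files$f"; done'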