adrianlopezroche / fdupes

FDUPES is a program for identifying or deleting duplicate files residing within specified directories.

fdupes: option to sort by size #44

Open sandrotosi opened 8 years ago

sandrotosi commented 8 years ago

From @sandrotosi on December 20, 2015 14:4

From matrixhasu on October 08, 2009 21:58:52

Debian bug #383962 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=383962 I would like an option to sort the list of duplicates by file size (both ascending and descending). This would be especially useful for the interactive mode, but it might also be useful for the listing mode.

Original issue: http://code.google.com/p/fdupes/issues/detail?id=3

Copied from original issue: sandrotosi/fdupes-issues#3

sandrotosi commented 8 years ago

From gwe...@gmail.com on February 04, 2014 17:35:41

This doesn't even have to be a hard feature. It could be implemented easily with a small change to the output format, nothing more.

For example, here's some fdupes output with --size:

648 bytes each:
./2014-01-15/javascript/jshomepage.js
./2013-12-26/javascript/jshomepage.js

28951 bytes each:
./2014-01-15/javascript/jsencryption.js
./2013-12-26/javascript/jsencryption.js

3014 bytes each:
./2014-01-15/javascript/jsrentblackbox.js
./2013-12-26/javascript/jsrentblackbox.js

This could be parsed and sorted by another script, but with difficulty. At least, I don't see any obvious one-liner shell pipeline which could do it.

However! If the newlines are deleted, so that the output instead looks like this:

648 bytes each: ./2014-01-15/javascript/jshomepage.js ./2013-12-26/javascript/jshomepage.js

28951 bytes each: ./2014-01-15/javascript/jsencryption.js ./2013-12-26/javascript/jsencryption.js

3014 bytes each: ./2014-01-15/javascript/jsrentblackbox.js ./2013-12-26/javascript/jsrentblackbox.js

then there is immediately an easy shell pipeline: fdupes dir/ --size | sort --general-numeric-sort:

648 bytes each: ./2014-01-15/javascript/jshomepage.js ./2013-12-26/javascript/jshomepage.js
3014 bytes each: ./2014-01-15/javascript/jsrentblackbox.js ./2013-12-26/javascript/jsrentblackbox.js
28951 bytes each: ./2014-01-15/javascript/jsencryption.js ./2013-12-26/javascript/jsencryption.js

Don't like the blank lines that sort to the top? Toss in a `| uniq` to squeeze them into one.

(Of course, once you've made it this far, it might occur to you that you could delete the newlines from the current output yourself, but it's not obvious how to keep each set of files grouped without some sort of context... At least, I can't figure out a reasonable sed invocation, so while it's doable somehow, most users certainly can't figure it out.)
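For anyone who wants the sorted listing without sed gymnastics, here is a minimal Python sketch of the post-processing described above. It assumes the `fdupes -S` layout shown in this comment (a `N bytes each:` header, one path per line, sets separated by a blank line) and reads that output from a file. The script name `sort_fdupes.py` and its `--desc` flag are hypothetical, not fdupes options.

```python
import re
import sys

# Header printed by `fdupes -S` before each set, e.g. "648 bytes each:".
# "bytes?" also accepts a singular "byte", in case a 1-byte set is printed that way.
HEADER = re.compile(r"^(\d+) bytes? each:$")


def parse_sets(text):
    """Split `fdupes -S` output into (size, header, [paths]) tuples.

    Assumes sets are separated by blank lines and each one starts with a
    size header; blocks without a recognizable header are skipped.
    """
    sets = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if not lines:
            continue  # empty input yields a single empty block
        match = HEADER.match(lines[0])
        if match is None:
            continue
        sets.append((int(match.group(1)), lines[0], lines[1:]))
    return sets


def format_sets(sets, descending=False):
    """Sort sets by size (stable for ties) and render them in fdupes' layout."""
    ordered = sorted(sets, key=lambda s: s[0], reverse=descending)
    return "\n\n".join("\n".join([header] + paths) for _, header, paths in ordered)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 sort_fdupes.py FILE [--desc]
    with open(sys.argv[1]) as fh:
        sets = parse_sets(fh.read())
    print(format_sets(sets, descending="--desc" in sys.argv[2:]))
```

For example, saving the listing with `fdupes -rS dir/ > fdupes.txt` and then running `python3 sort_fdupes.py fdupes.txt --desc` would print the largest sets first. Like the shell approaches in this thread, it assumes no filename contains a newline.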

malkuh commented 8 years ago

I also wanted this feature and implemented it in the fdupes C code, using a merge sort algorithm for linked lists adapted from GeeksforGeeks, since fdupes.c stores the files in a linked list. If it's wanted, I can clean it up, add a command-line option for it, and create a pull request. Anyone interested?
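Merge sort is a natural fit for a linked list: the list cannot be indexed cheaply, but it can be split with slow/fast pointers and merged by relinking nodes, giving O(n log n) time with no extra arrays. The sketch below illustrates that technique in Python; the `Node` class and its `size`/`path` fields are hypothetical stand-ins, not fdupes' actual C structures, and this is not the code from the pull request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical list node standing in for one duplicate set."""
    size: int
    path: str
    next: "Optional[Node]" = None


def split(head):
    """Cut the list in two at its midpoint using slow/fast pointers."""
    slow, fast = head, head.next
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    second = slow.next
    slow.next = None
    return head, second


def merge(a, b, descending):
    """Merge two sorted lists by relinking nodes; ties keep `a` first (stable)."""
    dummy = tail = Node(0, "")
    while a is not None and b is not None:
        take_a = a.size >= b.size if descending else a.size <= b.size
        if take_a:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a if a is not None else b
    return dummy.next


def merge_sort(head, descending=False):
    """Sort a singly linked list by node size, ascending unless `descending`."""
    if head is None or head.next is None:
        return head
    left, right = split(head)
    return merge(merge_sort(left, descending), merge_sort(right, descending), descending)
```

Because the merge step prefers the left node on ties, sets of equal size keep their original relative order in both directions, which also makes an ascending/descending command-line switch cheap to support.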

malkuh commented 8 years ago

I just went ahead and created a pull request.

Harvie commented 7 years ago

Wow! This can be super useful if you need to quickly free some space by deleting the biggest duplicates.

Harvie commented 7 years ago

BTW, if you need a workaround, this should work unless your filenames contain the string "B@R@E@A@K":

fdupes -rnS . | sed -e 's/^$/B@R@E@A@K/g' | tr '\n' '\0' | sed -e 's/B@R@E@A@K\x00/\n/g' | sort -rn | tr '\0' '\n' | tee fdupes.txt

IvanTurgenev commented 7 years ago

What happened? Nothing? :(

nodecentral commented 4 years ago (January 9, 2020)

Hi, is there a way to do this today, or is it still being considered?

adrianlopezroche commented 4 years ago

There's currently no way to do this. For those like Tomas who are looking to delete only files above a certain size, the new --minsize option may prove useful.
