arsenetar / dupeguru

Find duplicate files
https://dupeguru.voltaicideas.net
GNU General Public License v3.0
5k stars 402 forks

Command Line Interface - Request #1098

Open TechUpDesigns opened 1 year ago

TechUpDesigns commented 1 year ago

Hi,

I really like your software. I have a request that I think your software could fulfill if it had a command line interface.

What I would like to do is scan a directory for duplicates. The CLI would accept the same parameters as the GUI, then output a CSV file of the duplicates. I don't need the CLI to delete files; I'm only looking for the CSV it creates. Right now this can all be done with the GUI: scan files, export to CSV.

However, I would like to automate this process and feed the CSV file into another program.

Thank you.
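As a rough sketch of what the core of such a CLI could look like (this is an illustrative assumption, not dupeGuru's actual code), exact duplicates can be grouped by file size and then by content hash, with the groups written to CSV:

```python
import csv
import hashlib
import os

def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root):
    """Group files under `root`: first by size (cheap), then by content hash."""
    by_size = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                by_size.setdefault(os.path.getsize(path), []).append(path)
            except OSError:
                continue  # unreadable file: skip it
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size, cannot be a duplicate
        by_hash = {}
        for path in paths:
            by_hash.setdefault(file_digest(path), []).append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups

def write_csv(groups, out_path):
    """Write one row per file, tagged with its duplicate-group number."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["group", "path"])
        for i, group in enumerate(groups):
            for path in group:
                writer.writerow([i, path])
```

The size pre-pass matters: hashing every file would be far slower, since only files sharing a size can possibly be duplicates.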

jelabarre59 commented 1 year ago

At minimum I would want the ability to simply start the program with a couple of command switches giving it the directories to compare, perhaps something like "dupeguru --reference /path/to/reference/dir --normal /path/to/other/dir". Then within the GUI I could start the scan and manipulate the results. I've been trying to clean up an old disk, but even with just TWO parent directories selected it spent an hour scanning, and I quit the application when it was at 2.5 million files.

It would be simpler if I could feed it subsets of directories when starting the application, even if I have to exit and restart each time. As it is, I have to navigate through the file directories for each and every try, when instead I could be sending it sets of directories to scan by pasting in the directory paths.

The alternative would be some sort of "quick scan" dialog where I'd just paste the reference and normal directories into a dialog box rather than having to navigate through a file-manager tree just to get to where I want.
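A minimal sketch of how such startup switches could be parsed, assuming hypothetical --reference and --normal flags (these are the commenter's suggestion, not existing dupeGuru options):

```python
import argparse

def parse_startup_dirs(argv=None):
    """Parse hypothetical startup switches that pre-seed the directory list
    before the GUI opens. Neither flag exists in dupeGuru today; this only
    sketches the idea from the comment above."""
    parser = argparse.ArgumentParser(prog="dupeguru")
    parser.add_argument("--reference", action="append", default=[],
                        help="directory whose files are kept as reference copies")
    parser.add_argument("--normal", action="append", default=[],
                        help="directory to scan for duplicates of the reference")
    args = parser.parse_args(argv)
    return args.reference, args.normal
```

Using `action="append"` lets each flag be repeated, so several reference or normal directories can be passed in one invocation.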

PJDude commented 1 year ago

If you are interested in basic functionality, i.e. searching for exactly identical files (without comparing images, etc.), you can try this program of mine: https://github.com/PJDude/dude. CRC calculations can be stopped, and working on partial results is possible (2.5M files would probably kill it too). I plan to add export to CSV in the future as well.
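For illustration, an interruptible chunked CRC-32 pass might look like the following (a sketch under my own assumptions, not dude's actual code; `should_stop` is a hypothetical cancellation hook):

```python
import zlib

def crc32_file(path, should_stop=lambda: False, chunk_size=1 << 20):
    """Compute a file's CRC-32 in chunks, checking a cancellation hook between
    chunks so a long scan can be stopped. Returns None if stopped early."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            if should_stop():
                return None  # caller cancelled; discard the partial result
            crc = zlib.crc32(chunk, crc)  # running CRC carried across chunks
    return crc & 0xFFFFFFFF
```

Because `zlib.crc32` accepts a running value, the checksum can be built incrementally, which is what makes a clean stop between chunks possible.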

darkzbaron commented 1 week ago

+1

jelabarre59 commented 1 week ago

@PJDude Actually I found a probable cause for the excessive number of files the program was trying to index. It seems it was following any symlinks and trying to index everything within the symlinked directories as well. Considering a symlinked dir could have symlinks of its own (even when they're not circular), the index grows well beyond what it should be. There aren't actually 2.5 million files.
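For reference, Python's os.walk already skips symlinked directories unless told otherwise; a scanner built on it would only descend into them if followlinks=True were passed. A small sketch (illustrative only, not dupeGuru's or dude's code) showing the difference:

```python
import os

def count_files(root, follow_symlinks=False):
    """Count regular files under `root`. os.walk does not descend into
    symlinked directories unless followlinks=True, which is why following
    links can inflate the file count the way described above."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
        total += len(filenames)
    return total
```

With a symlink back into an already-scanned tree, the follow-links count includes every file a second time, which matches the inflated totals described in the comment.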