adrianlopezroche / fdupes

FDUPES is a program for identifying or deleting duplicate files residing within specified directories.

Feature: save an md5sum/tree file for future comparison #19

Open Georges760 opened 9 years ago

Georges760 commented 9 years ago

It would be good to save the hash/parse/analysis information from a specific fdupes run, in order to later compare this "virtual" file tree with a real file tree.

For example, I have a huge photo database. I could run a long fdupes analysis once and store the information in a file. Then, if I later found some photos on my old HDD, I could just run fdupes on those photos, importing the "virtual" database of the huge collection to compare against.
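A minimal sketch of the requested workflow, written in Python rather than fdupes's own C and not tied to any existing fdupes option: one pass records each file's size and MD5 into a saved index (the "virtual" tree), and a later pass checks a new directory against that index, hashing only files whose size already appears in it. The JSON index format and the function names (`build_index`, `save_index`, `load_index`, `find_known_duplicates`) are hypothetical.

```python
import hashlib
import json
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def iter_files(root):
    """Yield paths of regular files under `root`, skipping symlinks."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def build_index(root):
    """Map every file under `root` to [size, md5]: the hypothetical 'virtual' tree."""
    return {p: [os.path.getsize(p), file_md5(p)] for p in iter_files(root)}


def save_index(index, index_file):
    with open(index_file, "w", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(index_file):
    with open(index_file, encoding="utf-8") as f:
        return json.load(f)


def find_known_duplicates(index, new_root):
    """Return {new_path: [indexed paths with the same size and md5]}."""
    by_key = {}
    for path, (size, digest) in index.items():
        by_key.setdefault((size, digest), []).append(path)
    known_sizes = {size for size, _ in by_key}
    matches = {}
    for path in iter_files(new_root):
        size = os.path.getsize(path)
        if size not in known_sizes:
            continue  # no indexed file has this size, so skip hashing it
        key = (size, file_md5(path))
        if key in by_key:
            matches[path] = by_key[key]
    return matches
```

Storing the size next to the hash lets the second pass skip hashing any file whose size has no counterpart in the index. A match here is only a size-and-hash match, so a byte-by-byte check before deleting anything is still advisable.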

AnrDaemon commented 8 years ago

I don't see a problem. Take the task I'm currently facing as an example. I have 12-13 years' worth of a company's rather chaotic development to sort through: killing dupes and otherwise optimizing storage. For any given file, there's usually a whole directory of dupes that I go to, check, compare, and kill at once. After that, the whole list of files in fdupes' log becomes irrelevant. From this point, I have two choices.

  1. Run fdupes again to sift through the same 120-something gigabytes of files, or
  2. Try to filter out the files I just deleted from the log.

The first choice is, of course, preferred, but it takes a hell of a lot of time to complete. It would be most convenient if fdupes could (optionally) save the hashes for future reference, or consume md5sum output for the same purpose.
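Consuming md5sum output could look roughly like this Python sketch, which groups paths by digest so candidate duplicates can be listed without re-hashing anything. It assumes the GNU coreutils `md5sum` line format: 32 hex digits, a space, then a space (text mode) or `*` (binary mode), then the file name, with a leading backslash on lines whose names were escaped. The function names are hypothetical, and this is not an existing fdupes option.

```python
import re
from collections import defaultdict

# Assumed GNU md5sum line: optional leading backslash (escaped name),
# 32 hex digits, a space, ' ' or '*' (text/binary mode), then the file name.
MD5SUM_LINE = re.compile(r"^(\\?)([0-9a-fA-F]{32}) [ *](.*)$")
ESCAPES = {"\\": "\\", "n": "\n", "r": "\r"}


def _unescape(name):
    """Undo the backslash escapes md5sum may apply to file names."""
    out = []
    i = 0
    while i < len(name):
        if name[i] == "\\" and i + 1 < len(name):
            nxt = name[i + 1]
            out.append(ESCAPES.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(name[i])
            i += 1
    return "".join(out)


def parse_md5sum(text):
    """Map each (lowercased) digest to the paths md5sum reported for it."""
    by_digest = defaultdict(list)
    for line in text.split("\n"):
        m = MD5SUM_LINE.match(line)
        if not m:
            continue  # skip blank or malformed lines
        escaped, digest, name = m.groups()
        if escaped:
            name = _unescape(name)
        by_digest[digest.lower()].append(name)
    return dict(by_digest)


def duplicate_groups(by_digest):
    """Keep only digests shared by more than one path."""
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

For example, a list produced once with `find photos -type f -exec md5sum {} + > photos.md5` could be read back and passed through `parse_md5sum` and `duplicate_groups`. Entries for files deleted in the meantime would still need filtering, and equal digests are only candidates until the files are compared byte by byte.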

AnrDaemon commented 8 years ago

I think you're overcomplicating it. Yes, a full-featured, foolproof solution would need to preserve at least the file size as well as the hash, but think it through. You will scan the suspected file anyway to ensure it is a bitwise duplicate. You could just as well recalculate the hash at the same time. If the hash doesn't match the one stored in the provided list, you have every right to bark at the user to refresh the list first. It is up to the user's discretion how to interpret the message and what action, if any, to take. For me? It would never happen. 99% of the files are photos, which are so unlikely to change that I only re-compare before deletion out of my progressive paranoia.
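The validation idea in this comment can be sketched as follows: the suspected file has to be read anyway for the bitwise comparison, so its MD5 is recomputed in the same pass and checked against the stored size and hash. This is a Python sketch under that assumption. `confirm_duplicate` and `StaleCacheError` are hypothetical names, and MD5 is used only because the thread mentions md5sum.

```python
import hashlib
import os


class StaleCacheError(Exception):
    """The file no longer matches the size/hash stored in the saved list."""


def confirm_duplicate(path, other, cached_size, cached_md5, chunk_size=1 << 16):
    """Byte-compare `path` with `other`, re-hashing `path` in the same pass.

    Returns True if the files are identical byte for byte. Raises
    StaleCacheError if `path` no longer matches its cached size or MD5,
    meaning the saved list should be refreshed before it is trusted.
    """
    if os.path.getsize(path) != cached_size:
        raise StaleCacheError(f"{path}: size changed since the list was saved")
    h = hashlib.md5()
    # `path` already has cached_size bytes, so equal sizes are a precondition.
    identical = os.path.getsize(other) == cached_size
    with open(path, "rb") as f, open(other, "rb") as g:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            h.update(block)
            if identical and g.read(len(block)) != block:
                identical = False  # keep reading `path` so the hash is complete
    if h.hexdigest() != cached_md5:
        raise StaleCacheError(f"{path}: contents changed since the list was saved")
    return identical
```

Raising an error instead of silently continuing leaves the decision to the user, as the comment suggests; the extra cost over a plain byte comparison is only the MD5 computation on data that is already being read.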