arsenetar / dupeguru

Find duplicate files
https://dupeguru.voltaicideas.net
GNU General Public License v3.0
5.48k stars, 418 forks

Random file is used as a hardlink source #1148

Open AlttiRi opened 1 year ago

AlttiRi commented 1 year ago

Describe the bug

When dupeGuru creates hardlinks for duplicate files, it picks a seemingly random file as the source and replaces the other duplicates with hardlinks to it.

To Reproduce

Expected behavior

dupeGuru should choose the file with the oldest (lowest) btime/mtime and use it as the hardlink source for the other duplicates.
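To make the request concrete, here is a minimal sketch of the selection rule I have in mind. It is not dupeGuru's code: the function names and the temporary-file suffix are hypothetical, and it assumes that "oldest" means the earlier of btime and mtime, falling back to mtime alone on platforms where `os.stat` does not report `st_birthtime`.

```python
import os


def file_age_key(path):
    """Sort key: the earlier of btime and mtime, then the path as a tie-breaker."""
    st = os.stat(path)
    # st_birthtime is not available on every platform; fall back to mtime.
    btime = getattr(st, "st_birthtime", st.st_mtime)
    return (min(btime, st.st_mtime), str(path))


def hardlink_duplicates(paths):
    """Replace every duplicate in `paths` with a hardlink to the oldest file.

    Returns the path that was used as the hardlink source.
    """
    source = min(paths, key=file_age_key)
    for dupe in paths:
        if dupe == source:
            continue
        tmp = f"{dupe}.hardlink-tmp"  # hypothetical temporary name
        os.link(source, tmp)          # create the new link next to the duplicate
        os.replace(tmp, dupe)         # then swap it in place of the duplicate
    return source
```

With this rule the result no longer depends on the order in which the duplicates are listed, so every folder in a scan would end up with the same source file.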

Desktop (please complete the following information):

Additional context

Here is the online file explorer snapshot: https://alttiri.github.io/keep-lister/?filepath=https://alttiri.github.io/trash-files/dupe-guru-scan-1.json.gz

The original folder's files (look at the file times): image

The deduplication result for the first folder: image (3.0.txt was used as the source for hardlinks)

The deduplication result for the second folder: image (1.txt was used as the source for hardlinks)

The deduplication result for the third folder: image (2.txt was used as the source for hardlinks)

It should always use 1.txt as the hardlink source, since it has the oldest btime/mtime.

AlttiRi commented 1 year ago

Another example.

Each file's mtime matches the time in its filename, but dupeGuru suggests the 3 files with the most recent mtime as the origin files: Screenshot

I'm sure it should choose the file with the oldest mtime/btime as the origin file.


If there are:

then it should use c.mp4 as the origin file because it has the oldest mtime.