agross / immich-duplicates

Find image and video duplicates in Immich.

What to do after deleting duplicates? #12

Closed jmerifjKriwe closed 8 months ago

jmerifjKriwe commented 8 months ago

Hi,

maybe my question is a little bit strange but I really have no clue.

So I created the dupes.db, converted it to dupes.json and deleted all duplicates via the duplicates-browser. So far, so good. But if I run immich_duplicates and immich_duplicates_grouper again, the .db and .json are the same size as before. I would expect that at least the json should be smaller (more or less empty). If I now copy the content of the json file into the duplicates-browser, it shows me 1234 groups (1234 is an example) and starts to count down the number. I guess the browser recognizes that the duplicate image has already been deleted. But why have the old duplicates been detected again? Am I missing a very last step? Maybe inside Immich itself?

Best regards

agross commented 8 months ago

Hi,

The best way to verify if your duplicates got removed is to check the Immich trash.

I think findimagedupes fingerprints the files that are present on the file system, but by default it does not remove database entries for files that have since been deleted. The files you removed since the first scan are simply retained in the database. So it makes sense that your "old" duplicates are still stored in the database and hence are also converted to JSON.

At any point during your duplicate removal process, you can remove the dupes.db file and rerun findimagedupes.

I guess the browser recognizes that the duplicate image has already been deleted.

Exactly.

There's also the case that an asset appears in 2 or more groups. If you removed that asset from the first group, it does not necessarily make sense to display it again in the later group if that group then is only 1 asset in size. What you perhaps consider to be a "countdown" is just this process of checking if the asset exists (answer: no), reevaluating the new group size (now <= 1), and removing that group (count - 1).
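
To illustrate that countdown, here is a minimal Python sketch of the pruning logic described above. It is not the browser's actual code: the function name `prune_groups`, the list-of-lists group layout, and the `existing_ids` set (standing in for a per-asset existence check against Immich) are all assumptions made for the example.

```python
def prune_groups(groups, existing_ids):
    """Drop deleted assets from every group and keep only groups
    that still contain at least two assets.

    groups       -- hypothetical layout: a list of lists of asset IDs
    existing_ids -- hypothetical stand-in for "does this asset still exist?"
    """
    remaining = []
    for group in groups:
        # Step 1: check whether each asset still exists.
        alive = [asset_id for asset_id in group if asset_id in existing_ids]
        # Step 2: re-evaluate the group size; a group of <= 1 is no duplicate group.
        if len(alive) > 1:
            remaining.append(alive)
        # Step 3: otherwise the group is dropped, so the displayed count goes down by one.
    return remaining


# Asset "b" appears in two groups and has already been deleted.
groups = [["a", "b"], ["b", "c"], ["d", "e", "f"]]
existing = {"a", "c", "d", "e", "f"}
result = prune_groups(groups, existing)
```

In this example both groups that contained "b" shrink to a single asset and disappear, so only the third group is left to review.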

Do I miss a very last step?

As I said, you are not missing a step. Restart from scratch without a dupes.db file and with an empty trash, and you should see no duplicates (except for those that you explicitly ignored).

HTH,

Alex

agross commented 8 months ago

In the readme, the --prune parameter is passed to findimagedupes:

Remove fingerprint data for images that do not exist anymore

If you enable the trash feature in Immich, deleted assets will be considered deleted by Immich itself. But their thumbnails (which is what findimagedupes processes) will only be deleted after Immich removes the asset (after 30 days).
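
Conceptually, pruning as described above amounts to something like the following Python sketch. This is only an illustration of the idea, not how findimagedupes implements it: the function name `prune_fingerprints` and the path-to-fingerprint dictionary are assumptions for the example.

```python
import os


def prune_fingerprints(fingerprints):
    """Return a copy of the mapping without entries whose file is gone.

    fingerprints -- hypothetical mapping of file path -> fingerprint
    """
    return {
        path: fingerprint
        for path, fingerprint in fingerprints.items()
        if os.path.exists(path)  # keep only entries whose file is still on disk
    }
```

This also shows why the trash matters: as long as the thumbnail of a trashed asset is still on disk, its entry survives pruning and can be reported as a duplicate again.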

jmerifjKriwe commented 8 months ago

Thanks a lot :-) I wasn't aware that Immich has a trash feature. After emptying the trash (setting it to 0), everything is fine. Now that the easy duplicates are deleted, I can start another scan with a higher setting for the Hamming distance. Perfect.

Great work!