agross / immich-duplicates

Find image and video duplicates in Immich.

Step 9: Counts down all the groups without displaying any photos #20

Closed starbuck93 closed 2 months ago

starbuck93 commented 4 months ago

I pressed Enter while not in the text box and it submitted... hold on, let me type something up.

Hey agross, I've been trying to get this to work for a couple of weeks now (since the API changed) and I've had a small success today but also I'm getting some errors in step 9. I have over 3000 duplicates (thanks Google Takeout). My issue is that when I paste dupes.json into the text box and click OK, the Groups number very quickly runs down to zero. This is an example of an error message in the network tab in the browser console.

(I've replaced part of the UUID with zeros.)

GET http://<LAN IP address>:8081/api/asset/00000000-0000-0000-0000-562a40c9bd70

{
    "message": "Not found or no asset.read access",
    "error": "Bad Request",
    "statusCode": 400
}

The next line then says something like:

{
    "id": "00000000-0000-0000-0000-562a40c9bd70",
    "name": "",
    "birthDate": null,
    "thumbnailPath": "upload/thumbs/00000000-0000-0000-0000-e150d52cb601/00000000-0000-0000-0000-562a40c9bd70.jpeg",
    "isHidden": false
}

I've also seen some "no person.read access" errors.

Somehow, I was able to see 1 duplicate and make a decision on which one to keep, but only 1 out of the over 3000.

Thanks!

agross commented 4 months ago

Please have a look at your browser's developer tools (and the Console section there).

The behavior you see should only happen if, for example, you pasted a group of duplicate assets during setup but some of those assets have been removed in the meantime, leaving a group with just a single asset.

starbuck93 commented 4 months ago

That sounds like I should probably re-run the dupes.json process? (I'm still finishing writing up my issue, but I suspect you're correct.)

agross commented 4 months ago

You could try rerunning dupes. Only you know whether assets were deleted since the last run! If none were, then this could be a bug, a breaking change in Immich's API, or an issue with your setup.

starbuck93 commented 4 months ago

I don't think I manually deleted any, but I suspect the cause is the change from the old file structure to the new one (splitting generated content into a separate folder). I'm re-running the dupes.db/dupes.json process right now, and I'll update later.

agross commented 4 months ago

I don't know whether you are a developer, but what you see is a side-effect of this code (also the no person.read access messages in the log):

https://github.com/agross/immich-duplicates/blob/900cda2aa5a191292da3fe1a58930dd5ef4c508d/src/components/DuplicateGroup.vue#L260-L288

The idea, per duplicate group (a set of duplicate assets), is this:

  1. For each asset, attempt to load its metadata and album info. This will fail if the asset no longer exists or has received a new ID (perhaps through the split; I don't know).
  2. If the metadata indicates that the asset is trashed, kick it out of the group.
  3. Otherwise, add the asset ID to the list of successfully loaded assets.
  4. If loading the metadata or album info errors out, check whether the asset was a person's headshot image (for Immich's "People" feature) and, if so, ignore the whole group (i.e. "count down").
  5. After all asset IDs of the group have been processed, only consider the group further if the number of successfully loaded assets is 2 or more; otherwise ignore the whole group (i.e. "count down").
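
For readers who don't want to open the Vue component, the steps above can be sketched roughly like this. This is a minimal Python illustration, not the actual code from DuplicateGroup.vue: `Asset`, `AssetLoadError`, `load_asset`, `is_person_thumbnail` and `filter_group` are hypothetical names, and skipping an asset that fails to load but is not a headshot is an assumption about what happens in step 4.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


class AssetLoadError(Exception):
    """Raised by the (hypothetical) loader when metadata or album info cannot be loaded."""


@dataclass
class Asset:
    id: str
    is_trashed: bool = False


def filter_group(
    asset_ids: Iterable[str],
    load_asset: Callable[[str], Asset],
    is_person_thumbnail: Callable[[str], bool],
) -> Optional[List[str]]:
    """Return the usable asset IDs of one group, or None if the group is ignored."""
    loaded: List[str] = []
    for asset_id in asset_ids:
        try:
            asset = load_asset(asset_id)  # step 1: may fail for missing assets
        except AssetLoadError:
            if is_person_thumbnail(asset_id):
                return None  # step 4: a headshot image, ignore the whole group
            continue  # assumption: other load failures only drop this asset
        if asset.is_trashed:
            continue  # step 2: trashed assets are kicked out of the group
        loaded.append(asset.id)  # step 3
    # step 5: a group is only worth showing with 2 or more loaded assets
    return loaded if len(loaded) >= 2 else None
```

With this shape, a group of five IDs where four fail to load comes back as None, which is what shows up as the Groups counter running down.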

starbuck93 commented 4 months ago

Thank you for that, that's a good explanation.

I created a new dupes.json with around 3,500 groups. I was able to do 1 comparison between 5 photos and choose the highest-quality one, but then it immediately started counting down, at probably 100 groups per second, until I stopped it. I'm not sure what happened; I'm getting a lot of asset.read and person.read access errors. I'll have some time tomorrow to dig into this again.

agross commented 4 months ago

Just a guess: Do you have multiple user accounts in Immich, and is your API key for a different user than the one with the 3.5k dupes?

agross commented 4 months ago

Please docker pull ghcr.io/agross/immich-duplicates-browser:latest. It will no longer ignore a group when fewer than 2 assets are left because of load errors. At least it'll stop counting down for you, and you will be able to inspect the error messages, copy the image ID (GUID), and perhaps have a look at the database.

starbuck93 commented 4 months ago

You're correct, I do have more than one user and while I was pretty sure I had the user IDs correct, I went ahead and created an API key for my wife's account and pasted it in. It did actually pull up a duplicate group for me and when I clicked Keep Best, it had a 400 error:

{
    "message": "Not found or no asset.delete access",
    "error": "Bad Request",
    "statusCode": 400
}

So I'm pretty sure I had the right API key.

Let me pull the latest image and run it real quick.

starbuck93 commented 4 months ago

The latest image ran with only ~12,000 errors in the dev console this time! /s Most of them were the asset.read and person.read errors, e.g. "Not found or no asset.read access". But I was able to make decisions on about 20 groups of dupes, which I suppose is progress. I did see several new error messages in the window, like this: [screenshot: SCR-20240413-gwtc]

agross commented 4 months ago

Ah, wonderful. The "id must be a UUID" error is very likely due to a change in recent Immich versions where the thumbnail file name is no longer just <asset ID>.jpeg but <asset ID>-preview.jpeg. On my machine, several thumbnails have the suffix and most do not.
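
To illustrate the naming issue, here is a rough Python sketch of recovering the asset ID from either file name form. It is not the grouper's actual code: `asset_id_from_thumbnail` is a hypothetical helper, and it assumes that `-preview` is the only suffix that can follow the UUID, since that is the only one mentioned here.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# A canonical UUID, optionally followed by the "-preview" suffix.
_THUMBNAIL_STEM = re.compile(
    r"^(?P<id>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"(?:-preview)?$",
    re.IGNORECASE,
)


def asset_id_from_thumbnail(path: str) -> Optional[str]:
    """Return the asset ID encoded in a thumbnail file name, or None if there is none."""
    stem = PurePosixPath(path).stem  # file name without directory and extension
    match = _THUMBNAIL_STEM.match(stem)
    return match.group("id").lower() if match else None
```

Both `<asset ID>.jpeg` and `<asset ID>-preview.jpeg` map to the same ID, while a file name that is not a UUID yields None instead of being passed on to the API.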

Please docker pull ghcr.io/agross/immich-duplicates-grouper:latest and rerun the grouping, which generates the JSON file. It would be great if you could check whether the JSON file contains the term "preview" (grep preview dupes.json) before pasting it into the duplicate browser.

starbuck93 commented 4 months ago

The JSON file does not contain the term preview, is that OK? (while the old JSON did contain 46 previews)

agross commented 4 months ago

Yes, "preview" should not be included in the file. The file should contain UUIDs only.
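
A slightly stricter check than grep could look like the Python sketch below. The exact layout of dupes.json is not spelled out in this thread, so the sketch makes no assumption about it: it walks whatever nesting the JSON has and reports every string value that is not a canonical UUID. `iter_strings` and `non_uuid_strings` are hypothetical helper names, and object keys are deliberately not checked.

```python
import json
import re
from typing import Any, Iterator, List

_UUID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)


def non_uuid_strings(json_text: str) -> List[str]:
    """Return all string values in the JSON text that are not canonical UUIDs."""
    parsed = json.loads(json_text)
    return [s for s in iter_strings(parsed) if not _UUID.fullmatch(s)]
```

Reading dupes.json into a string and passing it to `non_uuid_strings` should give an empty list for a file produced by the updated grouper; any leftover `-preview` entries would be listed.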

You should also run the duplicate detection not for all thumbnails but only the ones belonging to the user account you created the API key for (by specifying the thumbnail subdirectory). This should get rid of any asset load errors that are caused by the API key not matching the user account that owns the asset.

starbuck93 commented 4 months ago

I've confirmed my user ID below matches a photo I recently uploaded from my account, this is the dupes.db process I ran:

docker container run \
  --rm \
  --volume /mnt/user/immich/pictures/thumbs/:/thumbs/ \
  --volume "$PWD:/output/" \
  ghcr.io/agross/immich-duplicates-findimagedupes \
  --prune \
  --fingerprints /output/dupes.db \
  --recurse \
  --no-compare \
  --exclude '\.webp$' \
  /thumbs/bef3720d-9670-4516-b9e8-e150d52cb601/

starbuck93 commented 4 months ago

I did a dump of my db so I can just search for some of these UUIDs that are failing in the console. The UUIDs that are failing on both the /asset and the /person API don't seem to exist in my database.... So I'm confused about that.

A lot of UUIDs will return a "400 Bad Request" on /asset and return 304 on the /person API. I guess they are just thumbnails of people?

agross commented 4 months ago

> I did a dump of my db so I can just search for some of these UUIDs that are failing in the console. The UUIDs that are failing on both the /asset and the /person API don't seem to exist in my database.... So I'm confused about that.

Hm, I'm not sure why you would have thumbnails for assets that do not exist. You could have a look at the respective thumbnail JPEGs (potentially appending -preview to the file name). You can also try to clean your thumbnail directory and regenerate the thumbnails using Immich.

starbuck93 commented 4 months ago

That's a good idea. I may just figure out the best way to regenerate thumbs. I do have two thumbs directories, from before the migration I mentioned a few posts above. I ran the migration in Immich through the Admin Jobs tab, but there are still a ton of files scattered around. Immich seems to work just fine, though.

root@Tower:/mnt/user/immich/pictures# tree -L 2
.
├── user-2-9a99f2ff960f
│   ├── 2011
│   ├── ... (a lot more dirs)
│   ├── 2023
│   ├── encoded-video
│   ├── original
│   ├── profile
│   └── thumb
├── my-user-e150d52cb601
│   ├── 1970
│   ├── ... (way more dirs)
│   ├── 2023
│   ├── encoded-video
│   ├── original
│   ├── profile
│   └── thumb (contains 91695 files and dirs)
├── user-3-414a25736e08
│   ├── 2023
│   ├── original
│   └── thumb
├── encoded-video
│   ├── user-2-9a99f2ff960f
│   └── my-user-e150d52cb601
├── library
│   ├── user-2-9a99f2ff960f
│   ├── my-user-e150d52cb601
│   └── user-3-414a25736e08
├── profile
│   └── my-user-e150d52cb601
├── thumbs
│   ├── user-2-9a99f2ff960f
│   └── my-user-e150d52cb601 (contains 13272 files and dirs)
└── upload
    ├── user-2-9a99f2ff960f
    └── my-user-e150d52cb601

66 directories

agross commented 4 months ago

This is how it looks on my machine. Quite different!

$ tree -d -L 3 --prune
.
├── encoded-video
│   ├── 0fc60725-0009-440b-a9c1-1587d8d6cbcc
│   │   ├── 00
...
│   │   └── ff
│   └── ad2055bf-3ce4-4bc0-9884-87707fa0ee04
│       ├── 00
...
│       └── ff
├── library
│   ├── agross
│   │   ├── 2007
│   │   ├── 2007-06-10 Geocaching Suprise
...
│       └── 2023
├── profile
│   └── ad2055bf-3ce4-4bc0-9884-87707fa0ee04
├── thumbs
│   ├── 0fc60725-0009-440b-a9c1-1587d8d6cbcc
│   │   ├── 00
...
│       └── ff
└── upload
    ├── 0fc60725-0009-440b-a9c1-1587d8d6cbcc
    │   ├── 00
        └── ff

1691 directories

starbuck93 commented 4 months ago

Yup... I'm going to attempt to fix this, then revisit my dupe detection!

agross commented 2 months ago

Closing this as no feedback was received.