bmachek / lrc-immich-plugin

Upload images from Lightroom Classic to Immich.

Dealing with duplicates? #13

Closed anaxmedia closed 3 weeks ago

anaxmedia commented 1 month ago

How are duplicates handled? It seems that if I export the same photos, they are uploaded to Immich again, creating duplicates each time. Looking at the Immich API, the uploadAsset function should return "created", "replaced", or "duplicate". However, the same photo only ever returns "created" in the response.

bmachek commented 1 month ago

Yes, you're right. I suppose the duplicate check in the Immich server relies on the metadata rather than the content of the uploaded file, because any hash algorithm would be too slow at this point. For now the plugin sets 'fileCreatedAt' and 'fileModifiedAt' to the current date and time; maybe this causes the issue. 'fileCreatedAt' should probably be set to the date and time of the shot, and 'fileModifiedAt' to the date and time of the last edit.

I will give this a try, and do another release in the near future. :-)

bmachek commented 1 month ago

Ok, I just read the Immich API documentation again. It seems the duplicate check is indeed based on SHA1. Lua doesn't provide a native SHA1 implementation. I found one from Jeffrey Friedl, but he states it's kinda slow due to limitations of Lua. I will give this a try, and report here, if computing a SHA1 during the export is fast enough.

anaxmedia commented 1 month ago

I came across the same. I've been working on my own publisher for Immich that adds photos and uses the parent folder as the album name (if it exists, otherwise creates the album). I'm currently using the local catalog identifier to determine if a photo already exists in Immich or not (using the searchMetadata function). If it exists I'm updating it using the replaceAssets function, otherwise it's added via uploadAsset. This works great so far, but unfortunately LrHTTP doesn't support PUT requests for multipart content, so I'm resorting to a curl call through LrTasks. Performance could be better on bulk uploads, I'm sure, but it is sufficient for my use case.
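As an illustration of the "parent folder as album name" idea, here is a minimal Lua sketch. The helper name `parentFolderName` is hypothetical and not taken from either plugin; it only shows one way to pull the containing folder out of a file path on both macOS and Windows.

```lua
-- Hypothetical helper: derive an album name from the folder that contains a photo.
-- Accepts both "/" and "\" separators so it handles macOS and Windows paths.
local function parentFolderName(path)
    -- Capture the last directory component that precedes the file name.
    -- Returns nil when the path has no parent folder component.
    return path:match("([^/\\]+)[/\\][^/\\]+$")
end

print(parentFolderName("/Users/me/Pictures/Trip/IMG_0001.jpg"))
print(parentFolderName("C:\\Photos\\Trip\\IMG_0001.jpg"))
```

Whether the album then has to be created or already exists would still need a lookup against the Immich server, which this sketch does not cover.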

bmachek commented 1 month ago

Hm ok. I was just thinking of computing the hash via system call, since Mac OS and Windows both provide preinstalled utilities for that. Are you on Windows or Mac?
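A rough Lua sketch of the system-call idea, assuming `shasum` on macOS and `certutil` on Windows as the preinstalled utilities. The function names and the `isWindows` flag are hypothetical; inside Lightroom the command would presumably be run through LrTasks, which is not shown here.

```lua
-- Sketch: build the shell command that prints a file's SHA-1 using OS tools.
-- isWindows is a hypothetical flag supplied by the caller.
local function sha1Command(path, isWindows)
    if isWindows then
        -- certutil is a Windows built-in.
        return 'certutil -hashfile "' .. path .. '" SHA1'
    end
    -- shasum is available on macOS.
    return 'shasum -a 1 "' .. path .. '"'
end

-- Sketch: pull the digest out of the tool output.
-- Assumes the digest appears as 40 consecutive hex characters.
local function extractSha1(output)
    local digest = output:match(string.rep("%x", 40))
    return digest and digest:lower() or nil
end

print(sha1Command("/tmp/photo.jpg", false))
print(sha1Command("C:\\tmp\\photo.jpg", true))
```

The output format of either tool can vary between OS versions, so the parsing step should be checked on the target machines.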

anaxmedia commented 1 month ago

I use Lightroom on both Windows and Mac. Wouldn't the hash change with edits to the original thus creating a duplicate? That's why I'm using the replaceAssets function since I don't want the Immich id to change in case the photo was favorited or bookmarked.

bmachek commented 1 month ago

Yes, of course. I need to think about it a bit more. The question is whether a new edit should be a new asset or not... Kind of a philosophical question. I was asking for your OS because I could use someone to run a few tests on Windows before the new release (if I end up relying on a system call). Did you publish your plugin's source code as well?

anaxmedia commented 1 month ago

Yeah, it is definitely a philosophical question; different people may have different perspectives on the answer. For me, if I am producing multiple edits of the same photo, I create those as virtual copies, which have their own unique local identifier, so they can still be published to Immich without overwriting the original/master. To me this makes the most sense for a large portion of users. I can help you test on Windows if you'd like. I haven't published my code yet; I'm still cleaning it up.

bmachek commented 1 month ago

Ok, for documentary reasons: Using the SHA1 mechanism isn't possible, as the hash differs on each export. Probably due to metadata differences (export date or so) in the exported JPEG. Will now give your approach with local catalog identifier a try.

bmachek commented 1 month ago

Which Immich field are you using for the local catalog identifier? deviceAssetId?

anaxmedia commented 1 month ago

Yeah, I use the deviceAssetId:

local alreadyExisting = immich:assetExists(identifiers)

for key in pairs(alreadyExisting) do
    local response = immich:searchMetadata({ deviceAssetId = key, deviceId = immich.deviceId, isTrashed = false })

    if response.assets.count > 0 then
        log:info('Duplicate photo identifier found in Immich, will be updated')
        alreadyExisting[key] = response.assets.items[1].id
    end
end

The only thing I haven't accounted for is the case where more than one dupe match is returned by searchMetadata(). I only update the first instance. I haven't come across this scenario yet, but it is technically possible since deviceAssetId uniqueness is not enforced on the Immich side (that I know of).
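One possible way to handle the multi-match case, sketched in plain Lua. It assumes the response shape used in the snippet above (`assets.items[i].id`); the function name `matchingAssetIds` and the mocked response are hypothetical and only serve to make the sketch runnable outside Lightroom.

```lua
-- Sketch: collect the ids of every asset returned for one deviceAssetId,
-- instead of only looking at the first item.
local function matchingAssetIds(response)
    local ids = {}
    if response and response.assets and response.assets.items then
        for _, item in ipairs(response.assets.items) do
            ids[#ids + 1] = item.id
        end
    end
    return ids
end

-- Mocked response with two assets sharing one deviceAssetId (hypothetical data).
local mock = { assets = { count = 2, items = { { id = "a1" }, { id = "b2" } } } }
local ids = matchingAssetIds(mock)
if #ids > 1 then
    -- A real plugin might log this and let the user decide which asset to replace.
    print("warning: " .. #ids .. " assets share this deviceAssetId")
end
```

What to do with the extra matches (replace all, replace the first, or skip and warn) is a design decision the thread leaves open.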

bmachek commented 1 month ago

Ok, just committed the first steps for duplicate handling, very similar to your approach. So the problem with replaceAsset in LrHttp is that one gets a 404 because of multipart/form-data?

Btw, do you want to join in, instead of developing your own plugin? Album handling could easily be modified to automatically use the parent folder name as well... In addition to the other album handling options that already exist...

stumpigit commented 1 month ago

I also tried to extend your Immich plugin with replaceAsset (I only compared by filename, which is not optimal). From my point of view, replaceAsset gets a 404 not because of multipart, but because it is a PUT request. And Lightroom, as far as I have seen, does not support PUT with multipart, only POST. That's why I wrote a C# console tool that uploads the asset via PUT. It seems to work very well. I wrote down the function as a Gist: https://gist.github.com/stumpigit/9ee62f6afa4821c88ed75e09274c65da

The upload tool is written in C# .NET Core, so it would also run on macOS. Are you interested in the tool, or is there a more elegant variant? The code is very simple: https://gist.github.com/stumpigit/0bdffaab392c0bdf98a710a774786ddf

anaxmedia commented 1 month ago

Yes the core issue is lack of support for multipart PUT request handling in LrHTTP. I got around this by executing a CURL request which is native on all OS and can be called directly with LrTasks.execute, no need for anything additional. I’m traveling until Sunday but can provide my implementation when back.

For my needs having a publish service makes the most sense as I can utilize built in Lightroom handling of marking edited photos as modified for republish instead of manually tracking and exporting after changes. That’s my biggest use case and why I went the publish service route. I would be open to working on a single solution for both export and publish though.

stumpigit commented 1 month ago

Yes, Curl is of course a good idea!

Yes, I would very much welcome a single solution between Export and Publish. My Immich workflow also goes through Lightroom. Whereby my direct cell phone upload first goes into Immich, then with a file system sync into Lightroom, where I can edit everything. The edited images are then exported back to Immich.

One more question: Is it important that the DeviceID remains the same in Immich? Currently, my Immich Android app uploads the original image again after one day and I have two versions in Immich.

Have a good trip!

bmachek commented 1 month ago

The only alternative to curl or .NET that I see is probably lua-http (https://daurnimator.github.io/lua-http/0.4/). But I have no experience with that library, so I can't tell yet whether it's capable of doing multipart PUT requests. This way we would keep everything in Lua and avoid shipping binaries. Concerning performance on big exports, we should compare the different solutions.

As for the publishing part, there already is a project: https://github.com/midzelis/mi.Immich.Publisher I don't know what your plans are: do you want the plugin to do both publishing and export, or do you want to keep them separate? I would very much appreciate it if we did a single plugin for both needs and joined hands on it.

bmachek commented 1 month ago

OK, I just realized lua-http is not available on Windows.

stumpigit commented 1 month ago

I think curl is the right way to go. A simple call, like in my Gist works:

curl --location --request PUT 'https://xxx.yyy/api/assets/{uid}/original' \
--header 'x-api-key: xxxxx' \
--form 'assetData=@"/C:/path/to/"' \
--form 'deviceAssetId="{assetid}"' \
--form 'deviceId="Lightroom Immich Plugin"' \
--form 'fileCreatedAt="2024-08-17"' \
--form 'fileModifiedAt="2024-08-17"'

It is important that the deviceAssetId is copied from the original, otherwise the app will upload the file again later.
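To call this from a plugin, the command line has to be assembled as a string first. The following Lua sketch shows one way to do that; `shellQuote` and `buildReplaceCommand` are hypothetical helpers, the host, key, and path in the example are placeholders, and the single-quote escaping assumes a POSIX shell (Windows cmd would need different quoting).

```lua
-- Quote a value for a POSIX shell by wrapping it in single quotes (hypothetical helper).
local function shellQuote(s)
    return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

-- Build a curl command for PUT /api/assets/{id}/original, mirroring the call above.
-- fields is a table of extra form fields such as deviceAssetId and deviceId.
local function buildReplaceCommand(baseUrl, apiKey, assetId, filePath, fields)
    local parts = {
        "curl --location --request PUT",
        shellQuote(baseUrl .. "/api/assets/" .. assetId .. "/original"),
        "--header", shellQuote("x-api-key: " .. apiKey),
        "--form", shellQuote('assetData=@"' .. filePath .. '"'),
    }
    -- Sort the field names so the generated command is deterministic.
    local names = {}
    for name in pairs(fields) do names[#names + 1] = name end
    table.sort(names)
    for _, name in ipairs(names) do
        parts[#parts + 1] = "--form"
        parts[#parts + 1] = shellQuote(name .. '="' .. fields[name] .. '"')
    end
    return table.concat(parts, " ")
end

-- Placeholder values for illustration only.
local cmd = buildReplaceCommand("https://immich.example", "SECRET", "1234", "/tmp/photo.jpg",
    { deviceAssetId = "42", deviceId = "Lightroom Immich Plugin" })
print(cmd)
```

The resulting string could then be handed to whatever mechanism the plugin uses to run external commands.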

bmachek commented 1 month ago

I just implemented a self-written Multipart PUT request method, which uses LrHttp.post and builds the multipart request body itself. And it seems to work, I just replaced some images at my host. Check out the latest ImmichAPI.lua...
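For readers who have not built a multipart request by hand, here is a minimal Lua sketch of what assembling such a body can look like. It is not the code from ImmichAPI.lua; the function name `buildMultipartBody`, the boundary, and the sample values are hypothetical, and actually sending the body (with a `Content-Type: multipart/form-data; boundary=...` header and a PUT method) is left out.

```lua
-- Sketch: assemble a multipart/form-data body from text fields plus one file part.
local function buildMultipartBody(boundary, fields, fileField, fileName, fileData)
    local crlf = "\r\n"
    local chunks = {}

    -- Sort the field names so the output is deterministic.
    local names = {}
    for name in pairs(fields) do names[#names + 1] = name end
    table.sort(names)

    -- One part per plain text field.
    for _, name in ipairs(names) do
        chunks[#chunks + 1] = "--" .. boundary .. crlf
            .. 'Content-Disposition: form-data; name="' .. name .. '"' .. crlf .. crlf
            .. fields[name] .. crlf
    end

    -- The file part carries a filename and a content type.
    chunks[#chunks + 1] = "--" .. boundary .. crlf
        .. 'Content-Disposition: form-data; name="' .. fileField
        .. '"; filename="' .. fileName .. '"' .. crlf
        .. "Content-Type: application/octet-stream" .. crlf .. crlf
        .. fileData .. crlf

    -- The closing delimiter has two extra dashes after the boundary.
    chunks[#chunks + 1] = "--" .. boundary .. "--" .. crlf
    return table.concat(chunks)
end

-- Placeholder values for illustration only.
local body = buildMultipartBody("XBOUNDARYX", { deviceAssetId = "42" },
    "assetData", "photo.jpg", "JPEGBYTES")
print(#body)
```

The boundary must not occur inside any field value or the file data, so a real implementation would normally generate a long random one.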

stumpigit commented 1 month ago

Many thanks and congratulations on the solution!

Unfortunately, it is not yet good for my workflow. In principle, I differentiate between two types in my workflow:

This way I have always synchronized the cell phone images immediately. I have seen that you use the "localIdentifier", i.e. the number from the Lightroom catalog, to test whether the image already exists? In my case, the images already exist in Immich and I would need the existing deviceAssetId. That's why I did this by comparing file names. But that's not a very clever solution. But what else could you do? And would it be possible to set in the dialog which criteria should be used to search whether the file is already online?

bmachek commented 4 weeks ago

If you take the cell phone images just from the file system, probably the only thing you can rely on is the SHA1 checksum, because no other metadata is accessible from the filesystem. In searchMetadata you can use the checksum parameter. But then again, this isn't very elegant and could get tricky once you edit an image multiple times and resync the edited file to Lightroom.

I never used a publish service in Lightroom. Would it be possible to download the cell phone images from Immich, edit them, and republish if we'd implement a Publish service for Immich? (I mean hypothetically, is this the way publish services work) If we do it this way, there are many more possibilities like storing the Immich asset Id or deviceAssetId somewhere in the Lightroom catalog.

stumpigit commented 4 weeks ago

I have committed a suggestion for an additional comparison with Immich as a PR (#14). This would be very practical for me and I think it would generally help to prevent duplicates. What do you think? I can also accept it if this is implemented later in a publish service, but it would be helpful for the current version.

bmachek commented 4 weeks ago

Just merged the pull request. :-)

bmachek commented 4 weeks ago

@anaxmedia Does this match your needs? I would then close the issue. As for the publish service, who's in to implement this? ;-)

anaxmedia commented 4 weeks ago

Looks great, we can close this issue. I'd be happy to help package everything into one solution. I used https://github.com/midzelis/mi.Immich.Publisher as a base for my plugin but rewrote a lot of it to handle things like dupes differently. I like your API class; it's cleaner, and not having to call curl externally should help performance a lot. How do you want to collaborate?

bmachek commented 3 weeks ago

Will close the issue now and open a discussion about the publishing part and collaboration.