RhetTbull / osxphotos

Python app to work with pictures and associated metadata from Apple Photos on macOS. Also includes a package to provide programmatic access to the Photos library, pictures, and metadata.
MIT License

osxphotos sync may get confused by two pics in the Library with different UUIDs but the same fingerprint (created by duplicating pics in Photos) #1641

Closed: oPromessa closed this issue 3 months ago

oPromessa commented 3 months ago

Before submitting a bug report, please ensure you are running the most recent version of osxphotos and that the bug is reproducible on the latest version.

Yes. In this case I'm using a development branch of 0.68.4, but the official 0.68.4 release should behave the same.

Describe the bug: Sync affects all pics with the same fingerprint, even if they have different UUIDs. Such pics may have been created via Duplicate and then edited in Mac Photos.

To Reproduce: Steps to reproduce the behavior:

  1. Use tests/Test-10.15.7.photoslibrary library.
  2. Select picture "Fritest.jpg" with '_uuid': 'A8266C97-9BAF-4AF4-99F3-0013832869B8'
  3. Add it to a new album, say "SyncIssue"
  4. Run osxphotos sync --export test.db
  5. Remove the album "SyncIssue"
  6. Run osxphotos sync --import test.db --set title,description,favorite,albums,location --merge keywords --verbose --verbose --timestamp --report import.sync.json
  7. Go into Photos and you'll see two pics in album "SyncIssue" instead of one. Although they have different UUIDs, they have the same fingerprint:
    • '_uuid': 'A8266C97-9BAF-4AF4-99F3-0013832869B8'
    • '_uuid': 'D1D4040D-D141-44E8-93EA-E403D9F63E07'
    • they both have the same 'masterFingerprint': 'AUxIqfurFEEy1m1SphGJRmxID+1g'

Expected behavior: I don't know what else osxphotos could do, or whether it should act any differently at all, since across libraries there is no information beyond the fingerprint that osxphotos could use to match or differentiate pics in this situation.

Screenshots: N/A

Desktop (please complete the following information):

$ sw_vers 
ProductName:        macOS
ProductVersion:     14.6.1
BuildVersion:       23G93
$ osxphotos --version
osxphotos, version 0.68.4
Python 3.11.4 (main, Jul  5 2023, 09:00:44) [Clang 14.0.6 ]
macOS 10.16.0, x86_64

Additional context: I'm adding support for the location field to sync.

RhetTbull commented 3 months ago

This is by design. UUIDs only apply to a specific library on a specific Mac. (And if you use the Photos library repair tool, UUIDs may change.) Because of this, UUIDs cannot be used to compare assets between libraries. osxphotos sync uses a signature (see photo_signature.py) which, in most cases, is the case-normalized filename + the fingerprint (a type of file hash used internally by Photos) to match photos. This is fairly reliable, but true duplicates will produce multiple matches.
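To make the matching rule concrete, here is a minimal Python sketch, assuming the signature is simply the lowercased filename joined to the fingerprint. The Photo class and signature() function are hypothetical stand-ins, not the actual code in photo_signature.py, and the sketch assumes the duplicate keeps the original's filename.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    """Hypothetical stand-in for a Photos asset (not osxphotos' PhotoInfo)."""

    uuid: str
    filename: str
    fingerprint: str


def signature(photo: Photo) -> str:
    """Assumed signature: lowercased filename joined with the fingerprint."""
    return f"{photo.filename.lower()}:{photo.fingerprint}"


FINGERPRINT = "AUxIqfurFEEy1m1SphGJRmxID+1g"
original = Photo("A8266C97-9BAF-4AF4-99F3-0013832869B8", "Fritest.jpg", FINGERPRINT)
# Assumption: the copy made with Duplicate keeps the same filename.
duplicate = Photo("D1D4040D-D141-44E8-93EA-E403D9F63E07", "Fritest.jpg", FINGERPRINT)

print(original.uuid == duplicate.uuid)  # False: UUIDs are library-local
print(signature(original) == signature(duplicate))  # True: indistinguishable to sync
```

Because the UUID never enters the signature, nothing in it can tell the two copies apart.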

In order to maximize the preservation of metadata during sync, if OSXPhotos finds duplicates during export, the metadata is merged:

https://github.com/RhetTbull/osxphotos/blob/9a07c29e50dd45c178eba5d890314729fa048997/osxphotos/cli/sync.py#L209-L219
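The effect of that merge can be sketched as follows. This is a hypothetical illustration, not the logic in sync.py: merge_duplicates() and the merge rules it applies (union of albums and keywords, favorite if any copy is a favorite, first non-empty title) are assumptions chosen to show how metadata from two duplicates collapses into one record.

```python
def merge_duplicates(records: list[dict]) -> dict[str, dict]:
    """Collapse exported records that share a signature into one entry.

    Hypothetical merge rules: albums and keywords are unioned, favorite is
    True if any duplicate is a favorite, and the first non-empty title wins.
    """
    merged: dict[str, dict] = {}
    for rec in records:
        sig = rec["signature"]
        if sig not in merged:
            # First record with this signature seeds the merged entry.
            merged[sig] = {
                "albums": set(rec["albums"]),
                "keywords": set(rec["keywords"]),
                "favorite": rec["favorite"],
                "title": rec["title"],
            }
            continue
        entry = merged[sig]
        entry["albums"] |= set(rec["albums"])
        entry["keywords"] |= set(rec["keywords"])
        entry["favorite"] = entry["favorite"] or rec["favorite"]
        entry["title"] = entry["title"] or rec["title"]
    return merged


SIG = "fritest.jpg:AUxIqfurFEEy1m1SphGJRmxID+1g"
exported = [
    # Original: in the "SyncIssue" album, no title.
    {"signature": SIG, "albums": ["SyncIssue"], "keywords": ["test"], "favorite": False, "title": ""},
    # Duplicate: in no album, but a favorite with a title.
    {"signature": SIG, "albums": [], "keywords": ["copy"], "favorite": True, "title": "Fritest"},
]
merged = merge_duplicates(exported)
print(len(merged))  # 1
print(merged[SIG]["albums"])  # {'SyncIssue'}
```

After the merge, the single exported record carries the "SyncIssue" album even though only one of the two copies was in it.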

During import, any photo whose signature matches that of a photo in the metadata db gets that metadata applied:

https://github.com/RhetTbull/osxphotos/blob/9a07c29e50dd45c178eba5d890314729fa048997/osxphotos/cli/sync.py#L303-L313
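The consequence for this report can be sketched using hypothetical LocalPhoto and apply_metadata() names rather than osxphotos internals. Every local photo whose signature matches an exported record receives its metadata, so both the original and the duplicate end up in "SyncIssue". Replacing the album set is an assumption meant to loosely mirror --set albums.

```python
from dataclasses import dataclass, field


@dataclass
class LocalPhoto:
    """Hypothetical photo in the destination library."""

    uuid: str
    signature: str
    albums: set[str] = field(default_factory=set)


def apply_metadata(photos: list[LocalPhoto], metadata_db: dict[str, dict]) -> list[str]:
    """Apply exported metadata to every local photo whose signature matches.

    Assumed `--set albums` behaviour: the album set is replaced, not merged.
    Returns the UUIDs of the photos that were updated.
    """
    updated = []
    for photo in photos:
        meta = metadata_db.get(photo.signature)
        if meta is None:
            continue  # no exported record for this photo
        photo.albums = set(meta["albums"])
        updated.append(photo.uuid)
    return updated


SIG = "fritest.jpg:AUxIqfurFEEy1m1SphGJRmxID+1g"
library = [
    LocalPhoto("A8266C97-9BAF-4AF4-99F3-0013832869B8", SIG),
    LocalPhoto("D1D4040D-D141-44E8-93EA-E403D9F63E07", SIG),
    LocalPhoto("unrelated-photo", "other.jpg:XYZ"),  # hypothetical non-match
]
updated = apply_metadata(library, {SIG: {"albums": ["SyncIssue"]}})
print(len(updated))  # 2: both copies land in "SyncIssue"
```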

Given two identical photos with the same name, there is no way OSXPhotos could know what the user intended when doing a sync, so the code maximizes preservation of data.

osxphotos import faces the same issue when looking for duplicates. For import, #1374 added the --signature option, which lets the user specify a custom signature template for duplicate comparison. The primary reason is that the photo_signature() method used in sync includes the filename in the signature, but when importing there may be duplicates with different filenames that the user would want to find. With --signature you can specify just the fingerprint or some other combination of metadata. I could add the same option to osxphotos sync, but it would not resolve this issue, and I'm not sure it's as useful in the context of sync: for sync purposes, the user likely thinks of a matching photo as one that has the same name and is binary-equivalent, not just "any possible duplicate".
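The difference between a strict and a loose signature can be sketched in Python. Here plain functions stand in for signature templates; name_and_fingerprint(), fingerprint_only(), and find_matches() are hypothetical helpers, not the osxphotos --signature template syntax, and the UUIDs are placeholders.

```python
from typing import Callable, NamedTuple


class Asset(NamedTuple):
    """Hypothetical minimal asset record."""

    uuid: str
    filename: str
    fingerprint: str


SignatureFn = Callable[[Asset], str]


def name_and_fingerprint(asset: Asset) -> str:
    """Sync-style signature (assumed): lowercased filename + fingerprint."""
    return f"{asset.filename.lower()}:{asset.fingerprint}"


def fingerprint_only(asset: Asset) -> str:
    """Looser signature: any binary-identical file matches, whatever its name."""
    return asset.fingerprint


def find_matches(candidate: Asset, library: list[Asset], signature: SignatureFn) -> list[str]:
    """Return UUIDs of library assets whose signature equals the candidate's."""
    target = signature(candidate)
    return [a.uuid for a in library if signature(a) == target]


FP = "AUxIqfurFEEy1m1SphGJRmxID+1g"
library = [
    Asset("uuid-original", "Fritest.jpg", FP),
    Asset("uuid-duplicate", "Fritest.jpg", FP),
    Asset("uuid-renamed", "Fritest copy.jpg", FP),
]
incoming = Asset("uuid-incoming", "Fritest.jpg", FP)

print(find_matches(incoming, library, name_and_fingerprint))  # ['uuid-original', 'uuid-duplicate']
print(find_matches(incoming, library, fingerprint_only))  # ['uuid-original', 'uuid-duplicate', 'uuid-renamed']
```

The looser signature finds the renamed copy, which is what import needs, but neither signature narrows the original/duplicate pair down to one photo.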