RhetTbull / osxphotos

Python app to work with pictures and associated metadata from Apple Photos on macOS. Also includes a package to provide programmatic access to the Photos library, pictures, and metadata.
MIT License
2.17k stars 100 forks source link

Filename collisions are not properly handled with non-ASCII characters in filename #515

Closed dssinger closed 3 years ago

dssinger commented 3 years ago

I'm exporting photos and setting the filename to the photo title. I don't use the --overwrite option, so if there are two (or more) photos with the same title, I expect the filename to be suffixed with (1) and so forth. This works if the title is all ASCII, but fails if the title includes (at least) the letter í - here are the error messages:

2021-09-12 17:19:07,679 - WARNING - photosdb.py - 100 - WARNING: This module has only been tested with macOS versions [10.12, 10.13, 10.14, 10.15, 10.16, 11.0, 11.1, 11.2, 11.3, 11.4]: you have Darwin, OS version: 11.5
Exporting 2 photos to /Users/david/Desktop/To_Forever...
  [##################------------------]   50%Error exporting photo (7ACA0BF2-2505-43E9-9250-284DBF5774D7: IMG_2332.jpeg) as Frítest.jpg: Error Domain=NSCocoaErrorDomain Code=516 "“7ACA0BF2-2505-43E9-9250-284DBF5774D7.jpeg” couldn’t be copied to “To_Forever” because an item with the same name already exists." UserInfo={NSSourceFilePathErrorKey=/Users/david/Desktop/test.photoslibrary/originals/7/7ACA0BF2-2505-43E9-9250-284DBF5774D7.jpeg, NSUserStringVariant=(
    Copy
), NSDestinationFilePath=/Users/david/Desktop/To_Forever/Frítest.jpg, NSFilePath=/Users/david/Desktop/test.photoslibrary/originals/7/7ACA0BF2-2505-43E9-9250-284DBF5774D7.jpeg, NSUnderlyingError=0x7f833c8b0630 {Error Domain=NSPOSIXErrorDomain Code=17 "File exists"}}
  [####################################]  100%
Processed: 2 photos, exported: 1, missing: 0, error: 1, touched date: 1
Elapsed time: 0.560 seconds

testlib.zip

I'm attaching a zip file with a small test library and the script I was using when I found the problem.

Thanks as always!

RhetTbull commented 3 years ago

Thanks for the sample library & test script -- very helpful! I've figured out where the problem is but can't figure out why this is occurring. It's likely a unicode translation issue but I'll keep searching.

RhetTbull commented 3 years ago

@dssinger I've figured out where the problem is but I'm not sure yet how to fix it. This issue does appear to be a unicode translation issue.

In your example, you're using the photo's title as part of the filename. The title of the sample images is Frítest. Internally, this is represented in Photos with the following unicode characters: Frítest: [70, 114, 237, 116, 101, 115, 116]

When exported as 'Frítest.jpg', the unicode characters from the file system are:

Frítest.jpg: [70, 114, 105, 769, 116, 101, 115, 116, 46, 106, 112, 103]
                      ^^^^^^^^^                     ^.jpg

You'll notice that the third character in the title is unicode 237 (Latin small letter i with acute) but in the filename, it's replaced with a digraph of unicode 105 (Latin small letter i) and 769 (combining acute accent).

The check for file collisions compares that a file with the same name doesn't already exist. In the case the names are different (though they are represented on the screen as identical) because they are comprised of different characters. I need to figure out where this is happening. I don't think the filesystem is doing it, for example:

~/Downloads/test
[I] ➜ touch Frítest

~/Downloads/test
[I] ➜ python
Python 3.9.5 (v3.9.5:0a7dcbdb13, May  3 2021, 13:17:02)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> path = pathlib.Path(".")
>>> files = path.glob("*")
>>> for f in files:
...     print(f"file={f}, {[ord(c) for c in f.name]}")
...
file=Frítest, [70, 114, 237, 116, 101, 115, 116]
RhetTbull commented 3 years ago

osxphotos has a function called normalize_unicode that is run on all strings to fix some previous issues with different unicode representations...I thought this might be the culprit but alas, it appears it is not:

[I] ➜ python
Python 3.9.5 (v3.9.5:0a7dcbdb13, May  3 2021, 13:17:02)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from osxphotos.utils import normalize_unicode
>>> test = "Frítest"
>>> [ord(c) for c in test]
[70, 114, 237, 116, 101, 115, 116]
>>> test = normalize_unicode(test)
>>> [ord(c) for c in test]
[70, 114, 237, 116, 101, 115, 116]
>>>
RhetTbull commented 3 years ago

I think the translation is occurring in call to CopyItemAtPath which I use to take advantage of copy-on-write (and thus greatly enhanced export speed) on APFS file systems. I don't know if the OS is doing this or it's happening somewhere in the python to Objective-C bridge (pyobjc).

RhetTbull commented 3 years ago

Osxphotos uses NFC composed Unicode internally. I may be able to force the file copy to do the same by passing it a composed string. See: https://developer.apple.com/documentation/foundation/nsstring/1412645-precomposedstringwithcanonicalma?language=objc

RhetTbull commented 3 years ago

See this: https://eclecticlight.co/2021/05/08/explainer-unicode-normalization-and-apfs/

and this: https://stackoverflow.com/questions/18137554/how-to-convert-path-to-mac-os-x-path-the-almost-nfd-normal-form

I think the solution will be to normalize all strings passed to CopyItemAtPath and also normalize all strings in findfiles (which does the comparison to see if a file of a certain name already exists)

RhetTbull commented 3 years ago

@dssinger I think I've fixed this! I just need to add tests then I'll push a new release. Do you mind if I include your test library attached to this issue as part of the test suite?

dssinger commented 3 years ago

Not at all - please use the test library; that’s why I sent it!

David Singer

On Sep 13, 2021 at 9:08:05 PM, Rhet Turnbull @.***> wrote:

@dssinger https://github.com/dssinger I think I've fixed this! I just need to add tests then I'll push a new release. Do you mind if I include your test library attached to this issue as part of the test suite?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/RhetTbull/osxphotos/issues/515#issuecomment-918782108, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN32L4L3A3HH7GRPZDNICDUB3DCLANCNFSM5D4VCSRQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

RhetTbull commented 3 years ago

@dssinger I believe this issue is fixed in v0.42.82. Let me know if you still have problems. Cheers!

./tryit
Exporting 2 photos to /Users/rhet/Desktop/To_Forever...
  [####################################]  100%
Processed: 2 photos, exported: 2, missing: 0, error: 0, touched date: 2
Elapsed time: 0.997 seconds
dssinger commented 3 years ago

It appears to be fixed for me in v0.48.82. Thanks!!

David