Closed dssinger closed 3 years ago
Thanks for the sample library & test script -- very helpful! I've figured out where the problem is but can't figure out why this is occurring. It's likely a unicode translation issue but I'll keep searching.
@dssinger I've figured out where the problem is but I'm not sure yet how to fix it. This issue does appear to be a unicode translation issue.
In your example, you're using the photo's title as part of the filename. The title of the sample images is Frítest. Internally, this is represented in Photos with the following Unicode code points:
Frítest: [70, 114, 237, 116, 101, 115, 116]
When exported as 'Frítest.jpg', the unicode characters from the file system are:
Frítest.jpg: [70, 114, 105, 769, 116, 101, 115, 116, 46, 106, 112, 103]
^^^^^^^^^ ^.jpg
You'll notice that the third character in the title is Unicode 237 (Latin small letter i with acute), but in the filename it's replaced with a decomposed two-character sequence: Unicode 105 (Latin small letter i) followed by 769 (combining acute accent).
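As a minimal illustration (not taken from osxphotos itself), the two representations can be reproduced with Python's standard unicodedata module. The variable names are made up for this sketch:

```python
import unicodedata

# Composed (NFC) form: the accented letter is a single code point, 237.
composed = "Fr\u00edtest"

# Decomposed (NFD) form: plain i (105) followed by a combining acute accent (769).
decomposed = unicodedata.normalize("NFD", composed)

print([ord(c) for c in composed])    # [70, 114, 237, 116, 101, 115, 116]
print([ord(c) for c in decomposed])  # [70, 114, 105, 769, 116, 101, 115, 116]

# The strings render identically but do not compare equal...
print(composed == decomposed)        # False

# ...until both are brought to the same normalization form.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```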
The check for file collisions verifies that a file with the same name doesn't already exist. In this case the names are different (though they look identical on screen) because they are composed of different characters. I need to figure out where this is happening. I don't think the filesystem is doing it, for example:
~/Downloads/test
[I] ➜ touch Frítest
~/Downloads/test
[I] ➜ python
Python 3.9.5 (v3.9.5:0a7dcbdb13, May 3 2021, 13:17:02)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> path = pathlib.Path(".")
>>> files = path.glob("*")
>>> for f in files:
...     print(f"file={f}, {[ord(c) for c in f.name]}")
...
file=Frítest, [70, 114, 237, 116, 101, 115, 116]
osxphotos has a function called normalize_unicode that is run on all strings to fix some previous issues with different Unicode representations. I thought this might be the culprit but alas, it appears it is not:
[I] ➜ python
Python 3.9.5 (v3.9.5:0a7dcbdb13, May 3 2021, 13:17:02)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from osxphotos.utils import normalize_unicode
>>> test = "Frítest"
>>> [ord(c) for c in test]
[70, 114, 237, 116, 101, 115, 116]
>>> test = normalize_unicode(test)
>>> [ord(c) for c in test]
[70, 114, 237, 116, 101, 115, 116]
>>>
I think the translation is occurring in the call to CopyItemAtPath, which I use to take advantage of copy-on-write (and thus greatly enhanced export speed) on APFS file systems. I don't know if the OS is doing this or if it's happening somewhere in the Python-to-Objective-C bridge (pyobjc).
osxphotos uses NFC (composed) Unicode internally. I may be able to force the file copy to do the same by passing it a composed string. See: https://developer.apple.com/documentation/foundation/nsstring/1412645-precomposedstringwithcanonicalma?language=objc
See this: https://eclecticlight.co/2021/05/08/explainer-unicode-normalization-and-apfs/
I think the solution will be to normalize all strings passed to CopyItemAtPath and also normalize all strings in findfiles (which does the comparison to see if a file of a certain name already exists).
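A rough sketch of that idea, assuming NFC as the common form. The helper names name_collides and next_free_name, and the "name (1).jpg" suffix format, are hypothetical and are not the actual osxphotos implementation; only unicodedata.normalize is a real standard-library call:

```python
import unicodedata


def normalize_nfc(name):
    """Return the NFC (composed) form of a name."""
    return unicodedata.normalize("NFC", name)


def name_collides(candidate, existing_names):
    """Hypothetical helper: True if candidate matches an existing name
    once both sides have been normalized to NFC."""
    target = normalize_nfc(candidate)
    return any(normalize_nfc(name) == target for name in existing_names)


def next_free_name(stem, suffix, existing_names):
    """Hypothetical helper: append ' (1)', ' (2)', ... to the stem until the
    name no longer collides. The suffix format is an assumption."""
    candidate = f"{stem}{suffix}"
    count = 0
    while name_collides(candidate, existing_names):
        count += 1
        candidate = f"{stem} ({count}){suffix}"
    return candidate


# The name already on disk is decomposed; the title from Photos is composed.
on_disk = ["Fri\u0301test.jpg"]
print(name_collides("Fr\u00edtest.jpg", on_disk))
print(next_free_name("Fr\u00edtest", ".jpg", on_disk))
```

Without the normalization step, the comparison would treat the composed title and the decomposed filename as different names and skip the increment.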
@dssinger I think I've fixed this! I just need to add tests then I'll push a new release. Do you mind if I include your test library attached to this issue as part of the test suite?
David Singer
@dssinger I believe this issue is fixed in v0.42.82. Let me know if you still have problems. Cheers!
./tryit
Exporting 2 photos to /Users/rhet/Desktop/To_Forever...
[####################################] 100%
Processed: 2 photos, exported: 2, missing: 0, error: 0, touched date: 2
Elapsed time: 0.997 seconds
It appears to be fixed for me in v0.42.82. Thanks!!
David
I'm exporting photos and setting the filename to the photo title. I don't use the --overwrite option, so if there are two (or more) photos with the same title, I expect the filename to be suffixed with (1) and so forth. This works if the title is all ASCII, but fails if the title includes (at least) the letter í - here are the error messages:
testlib.zip
I'm attaching a zip file with a small test library and the script I was using when I found the problem.
Thanks as always!