RhetTbull / osxphotos

Python app to work with pictures and associated metadata from Apple Photos on macOS. Also includes a package to provide programmatic access to the Photos library, pictures, and metadata.
MIT License

Exporting with --directory "{folder_album}" can't handle accented characters #410

Closed ifarfan closed 2 years ago

ifarfan commented 3 years ago

Hey guys,

While doing an export of all my albums I noticed that an Album named Kraków 2016 was being exported as _

So instead of creating /Volumes/photos/Kraków 2016 it was creating this /Volumes/photos/_

This is the command I issued:

osxphotos export /Volumes/photos \
    --directory "{folder_album}" \
    --exiftool \
    --exiftool-option '-m' \
    --person-keyword \
    --album-keyword \
    --skip-original-if-edited \
    --update \
    --overwrite \
    --current-name \
    --touch-file \
    --retry 2

Once I removed the accented character osxphotos was able to correctly create the album folder.

In addition, if I run osxphotos albums, the album Kraków 2016 shows up correctly.

Only issue I can think of is that I'm using a Synology NAS for my externally mounted volume, but I haven't had any issues creating/updating files/folders from my Mac using accented characters (and I use them all over the place).

Amazing tool, btw!

RhetTbull commented 3 years ago

I'll take a look at this. This is weird as the code goes to great lengths to preserve special characters and even has a test for this very case (where the album name has diacritics in it). The code that creates export directories sanitizes for illegal path characters and replaces them automatically. The "_" folder is used as a default when the template doesn't match (in this case "{folder_album}") but in your example, if the photos are in the album, they should not end up in the "_" folder.

Reference these test {folder_album} cases:

https://github.com/RhetTbull/osxphotos/blob/d17454772cebbd6edd5d8e0f04e80feecbdb2355/tests/test_cli.py#L278-L282

ifarfan commented 3 years ago

Thanks!

Also, I didn't mention that said album is also within a few other folders:

Screenshot_4_2_21__1_37_PM

Thus the final path would've been: /Volumes/photos/Travel/Europe/Kraków 2016 but instead all the photos ended up in /Volumes/photos/_

Just in case the filename escaping logic trips under a folder hierarchy

RhetTbull commented 3 years ago

I just did a test using your exact scenario and export command and was not able to replicate this.

I created an album called Kraków 2016 in folder tree Travel/Europe:

Screen Shot 2021-04-03 at 7 42 29 AM

Then ran the following export command:

osxphotos export ~/Desktop/export --directory "{folder_album}" \
    --exiftool \
    --exiftool-option '-m' \
    --person-keyword \
    --album-keyword \
    --skip-original-if-edited \
    --update \
    --overwrite \
    --current-name \
    --touch-file \
    --retry 2

The resulting Kraków 2016 folder was created as expected:

Screen Shot 2021-04-03 at 7 42 13 AM

Please try the following command and post the output here or send it to me at rturnbull+git@gmail.com so I can do some further debugging on your particular scenario.

osxphotos export ~/Desktop/export --album "Kraków 2016" --verbose --directory "{folder_album}" > debug.txt

Replace the ~/Desktop/export path with a temporary export path of your choosing.

ubrandes commented 3 years ago

Maybe this helps: Unfortunately there are many ways to express accented characters. And as I had to learn, OSX and Synology seem not always 100% compatible, even for simple classics like äöüéè etc. and when UTF-8 (the default) is set on both sides.

I found a Python script "nfcfn.py" here on GitHub to clean up ("normalize") these characters, which I've used to clean my UTF encoding to one single working model on my Synology NAS.

Commands like ls -d * | od -tax1 (display folder contents as raw hex bytes) had initially helped me debug and identify these issues between DSM and OSX (and iOS).

Background is that Unicode (and therefore UTF-8) offers multiple ways of expressing the same characters, and macOS and Synology seem to differ in the details. E.g. the German character "ü" can be encoded OSX-style as a 'u' followed by the two bytes "\xCC" and "\x88". These two bytes together make up the UTF-8 representation of U+0308, the "combining diaeresis" [i.e. two dots above the preceding character, which is called „Kombinierendes Trema“ in German].

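So as a result, we have at least two ways to express the umlaut 'ü' in UTF-8:

  • precomposed: the single code point U+00FC, bytes c3 bc
  • decomposed (OSX-style): 'u' followed by the combining diaeresis U+0308, bytes 75 cc 88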
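The two byte sequences for 'ü' can be reproduced with Python's standard unicodedata module. This is only a small illustration of the precomposed (NFC) and decomposed (NFD) forms, not code taken from osxphotos or nfcfn.py:

```python
import unicodedata

# Precomposed form: the single code point U+00FC.
composed = unicodedata.normalize("NFC", "u\u0308")
# Decomposed form: 'u' followed by the combining diaeresis U+0308.
decomposed = unicodedata.normalize("NFD", "\u00fc")

print(composed.encode("utf-8").hex(" "))    # c3 bc
print(decomposed.encode("utf-8").hex(" "))  # 75 cc 88

# The two strings look identical on screen but do not compare equal
# until both are brought to the same normalization form.
print(composed == decomposed)                                 # False
print(composed == unicodedata.normalize("NFC", decomposed))   # True
```

A byte-wise comparison, which is what many filesystems and network shares do, therefore treats the two spellings as different names.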

(BTW the most extreme case has been PhotoSync, my favorite app to export photos from iOS, where as default, "favorite" flags can optionally be exported as single character ❤️ in the filename => and if exported via FTP to my Synology — to note, all latest versions, all with default settings — the ❤️ arrives in the filenames as hex "c3 a2 c2 9d c2 a4 c3 af c2 b8 c2 8f" => 12 Bytes for a single unicode character — which DSM then has not been able to interpret correctly).
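The 12 bytes quoted above are consistent with UTF-8 that was encoded twice: the heart emoji (two code points, U+2764 plus the variation selector U+FE0F) is 6 bytes in UTF-8, and if those bytes are misread as Latin-1 and encoded to UTF-8 again, exactly the reported sequence results. Which component in the PhotoSync, FTP and DSM chain does the misreading is not known from this thread; the Latin-1 step below is an assumption that merely reproduces the bytes:

```python
# Heart emoji: HEAVY BLACK HEART (U+2764) + VARIATION SELECTOR-16 (U+FE0F).
heart = "\u2764\ufe0f"

utf8 = heart.encode("utf-8")
print(utf8.hex(" "))  # e2 9d a4 ef b8 8f

# Assumed failure mode: the UTF-8 bytes are interpreted as Latin-1 text
# and then encoded to UTF-8 a second time.
double_encoded = utf8.decode("latin-1").encode("utf-8")
print(double_encoded.hex(" "))  # c3 a2 c2 9d c2 a4 c3 af c2 b8 c2 8f

# Reversing the assumed steps recovers the original string.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == heart)  # True
```

If a filename arrives in this state, reversing the two steps as shown is usually enough to repair it, provided nothing else altered the bytes in between.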

RhetTbull commented 3 years ago

@ifarfan I've not been able to replicate this bug. If you try the following command and post the output here or send it to me at rturnbull+git@gmail.com so I can do some further debugging on your particular scenario that would be helpful.

osxphotos export ~/Desktop/export --album "Kraków 2016" --verbose --directory "{folder_album}" > debug.txt

Replace the ~/Desktop/export path with a temporary export path of your choosing.

RhetTbull commented 2 years ago

@ifarfan if you're still using osxphotos are you still having problems with this issue? I've made some changes to Unicode handling in osxphotos recently that might help. Try running with the latest version and let me know if you still have issues.

ifarfan commented 2 years ago

@RhetTbull thanks for the follow up! I manually renamed all the folders with foreign accents and haven't had issues again, it might've been an invalid hidden character and/or a linux-to-mac unicode hiccup during a folder rename/copy

I'm good now 👍

jotzet79 commented 4 months ago

@RhetTbull, I ran into this issue today when exporting via osxphotos and comparing the result with the original folder that I imported to Apple Photos years ago.

Original Import Folder
JotMac:Scripts jotzet$ echo -n "1999-02 - Bundesheer Lilienfeld D-Brückenbau" | od -A n -t x1
           31  39  39  39  2d  30  32  20  2d  20  42  75  6e  64  65  73
           68  65  65  72  20  4c  69  6c  69  65  6e  66  65  6c  64  20
           44  2d  42  72  75  cc  88  63  6b  65  6e  62  61  75        

OSXPhotos Export Folder
JotMac:Scripts jotzet$ echo -n "1999-02 - Bundesheer Lilienfeld D-Brückenbau" | od -A n -t x1
           31  39  39  39  2d  30  32  20  2d  20  42  75  6e  64  65  73
           68  65  65  72  20  4c  69  6c  69  65  6e  66  65  6c  64  20
           44  2d  42  72  c3  bc  63  6b  65  6e  62  61  75            

I also set the Locale Shell variable export LC_ALL="de_AT.UTF-8" and tried to do the report, but I had no luck.

Will look now into nfcfn.py as proposed by @ubrandes.
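For anyone who wants to check a folder before renaming anything, a dry run along the following lines lists the names that are not yet in a given normalization form. This is a minimal sketch using only the standard library; plan_renames is a hypothetical helper written for this illustration and is not how nfcfn.py works internally:

```python
import os
import unicodedata


def plan_renames(names, form="NFC"):
    """Map each name that is not in `form` to its normalized spelling."""
    return {
        name: unicodedata.normalize(form, name)
        for name in names
        if not unicodedata.is_normalized(form, name)  # Python 3.8+
    }


if __name__ == "__main__":
    # Dry run: only print what would change in the current directory.
    for old, new in plan_renames(os.listdir(".")).items():
        print(f"{old!r} -> {new!r}")
```

Printing with !r shows the escaped code points, which makes the difference between the two spellings visible even though they render identically.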

Thank you, Joachim

BTW, always those guys dealing with the german (or polish in that case) umlauts ;-) #208

oPromessa commented 4 months ago
  1. Are you in the latest osxphotos?
  2. How did you do the import and name the album?
    • via direct import on Photos
    • via osxphotos import?
    • did you name the album in Photos itself ?

Album names were Unicode 'normalized' on osxphotos import by RhetTbull in one of the latest versions (#1475); see also #1085 (with a very complete description of the wonders of "Unicode characters can take one of 4 different normalization forms: NFC, NFD, NFKC, NFKD").

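So I'd guess now:

  • exported album names are correct.
  • If you osxphotos import this folder and export it: it should also be correct.

But somehow not aligned with the original folder name in the file system from the moment it was imported.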

RhetTbull commented 4 months ago

@jotzet79 unicode is always tricky to deal with and it's entirely possible there's a bug in OSXPhotos. Here's what OSXPhotos does at the moment:

When comparing text, rendering templates, writing data to Photos (e.g. creating albums), etc., OSXPhotos always converts to NFC formatted unicode. This is consistent with what macOS does.

However, when creating filenames and directories, OSXPhotos will convert to NFD format if on macOS, otherwise NFC if on linux. This is consistent with the default behavior of the two operating systems.

That means the album name in Photos may be different than the folder name on disk though visually they will be the same. Internally they would use two different Unicode normalization forms.
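The behavior described above can be sketched as follows. The function names are hypothetical and this is not the actual osxphotos code, only an illustration of "NFC internally, NFD on disk on macOS, NFC elsewhere" using the standard unicodedata module:

```python
import sys
import unicodedata


def normalize_internal(text):
    """Form used for comparisons and template rendering (NFC)."""
    return unicodedata.normalize("NFC", text)


def normalize_for_filesystem(name, platform=sys.platform):
    """Form used for file and directory names: NFD on macOS, NFC otherwise."""
    form = "NFD" if platform == "darwin" else "NFC"
    return unicodedata.normalize(form, name)


def same_name(a, b):
    """Compare two names independently of their normalization form."""
    return normalize_internal(a) == normalize_internal(b)


album = "Krak\u00f3w 2016"  # precomposed 'ó' (U+00F3)
on_disk = normalize_for_filesystem(album, platform="darwin")

print(len(album), len(on_disk))   # 11 12
print(album == on_disk)           # False
print(same_name(album, on_disk))  # True
```

Comparing an exported folder with an older folder name is therefore only reliable after both names are normalized to the same form.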

I've considered in the past adding a unicode template that would convert text to a given encoding. For example:

{unicode.nfc:{folder_album}} or {folder_album|unicode.nfc}.

Internally this would take a fair bit of work because the template system normalizes everything. Another option is to specify the "internal" unicode format and the "external (on disk)" unicode format via options. This is much easier to implement as osxphotos already contains methods to globally adjust this in the code. For example:

osxphotos export --directory {folder_album} --unicode-filesystem NFC --unicode-internal NFD

I'll open a new issue for this.

jotzet79 commented 4 months ago

@oPromessa to answer your questions inline...

  1. Are you in the latest osxphotos?

Yes, absolutely

JotMac:Pictures jotzet$ osxphotos --version
osxphotos, version 0.68.1
Python 3.12.3 (main, Apr  9 2024, 16:54:45) [Clang 14.0.0 (clang-1400.0.29.202)]
macOS 13.6.7, x86_64
  2. How did you do the import and name the album?

    • via direct import on Photos

This: Long time ago I did photos management via Files'n'Folders. During Corona all my legacy pics (=non Smartphone) were geotagged, and then I imported all my folders (=albums) via DnD to Apple Photos.

  • via osxphotos import?

Nope, as this didn't exist back then.

  • did you name the album in Photos itself ?

Nope, the naming stems from the initial folders on the filesystem

Album names were Unicode 'normalized' on osxphotos import by RhetTbull in one of the latest versions (#1475); see also #1085 (with a very complete description of the wonders of "Unicode characters can take one of 4 different normalization forms: NFC, NFD, NFKC, NFKD").

Yep, in the meantime I have also written a folder "translation" script in Python using the unicodedata module.

So I'd guess now:

  • exported album names are correct.
  • If you osxphotos import this folder and export it: it should also be correct.

Yet to be verified...

But somehow not aligned with the original folder name in the file system from the moment it was imported.

jotzet79 commented 4 months ago

@RhetTbull: Thank you for your prompt response!

Oddly, when I tested entering data to create albums and folders anew, it always resulted in an NFD-based Unicode representation (see above). But honestly I might be completely wrong: these character representations and encodings are really driving me nuts... 😃

But be aware that I really don't consider this issue high priority - it's probably an edge case, and others don't have this problem anyway.

Again, I really enjoy using osxphotos (especially the inspect feature is really sexy) - Thank you!

Kind regards, Joachim

PS: If you are really "freaky enough" , you might try out macOS's Keyboard Viewer in combination with German (or even Austrian) Keyboard Settings to be able to reproduce this mess. Maybe other locale settings behave differently, who knows? Screenshot 2024-06-13 at 23 32 17