RhetTbull / osxphotos

Python app to work with pictures and associated metadata from Apple Photos on macOS. Also includes a package to provide programmatic access to the Photos library, pictures, and metadata.
MIT License
2.04k stars 96 forks source link

--use-photokit crashes on Monterey #625

Open RhetTbull opened 2 years ago

RhetTbull commented 2 years ago

Ok, so the behaviour is more inconsistent that I thought. I did try to repair my library before I read your last message, as I had read that could sort issues with regular export of photos.

I tried downloading a different album where my code had resulted in an empty folder. This time some of the photos/videos were downloaded (maybe 10 out of 125). I tried again, and the count went up to ~30.

As for your second suggestion, using --use-photokit fails with:

Error exporting photo (CED06217-BEC1-47CD-A539-3E36017C24FC: IMG_0107.JPG) as IMG_0107.JPG: Could not get authorizaton to use Photos: auth_status = 0

Add: python actually crashes when using --use-photokit. log attached. osxphotos_crash.txt

Originally posted by @moffat in https://github.com/RhetTbull/osxphotos/discussions/624#discussioncomment-2127306

RhetTbull commented 2 years ago

Can't fix this until I get access to a Monterey machine. Not sure if this happens on both Intel and M1 or just M1.

Terminal.app should ask for authorization as it does on other Mac OS versions. Don't know why it didn't.

This may not be fixable on Monterey.

moffat commented 2 years ago

So I use iTerm, not Terminal.app. I switched to Terminal.app to test again and indeed it didn't crash (it just returned errors) and at the same time it asked for permission. Once granted the errors went away (although again, it had started trying to download/export before waiting for the permission to be granted so there were a bunch of errors out by then).

With PhotoKit I'm seeing some new photos being downloaded (although hard to know if this would happened without it anyway) but at least one photo was not exported (it's still running).

RhetTbull commented 2 years ago

@moffat I use iTerm as well and I've noticed that sometimes it fails to ask for authorization but this is not easily reproducible. Unfortunately, testing the PhotoKit authorization is difficult because once it's granted, the OS never asks again and it's not easy to reset the authorization for testing.

(although again, it had started trying to download/export before waiting for the permission to be granted so there were a bunch of errors out by then).

I've at various times tried to put in code to request authorization before downloading but this is unreliable and I've not found a good way to test.

o3bvv commented 1 year ago

For iTerm, it's possible to go to System Preferences > Security & Privacy > Photos, and grant access to iTerm:

image
RhetTbull commented 1 year ago

Thanks. I use iTerm as well for osxphotos development but some versions of iTerm appeared to cause problems even when permission was granted. I've not seen any reports of issues in quite a while so this may be fixed now. I'll update the osxphotos docs.

o3bvv commented 1 year ago

I've just discovered your project recently and the tool was the first thing I tried to access the Photos lib programmatically. There were "permission denied" issues for any operation, not just for the export with --use-photokit flag, i.e., even trying to list albums crashed due to that. I knew that the Photos lib was a directory, but any attempt to list or measure its contents failed using cmd tools. Then I've decided to open the Photos app and run inspect and for a big surprise, it worked somehow. Then, an execution of some other subcommand invoked a confirmation dialog for granting permissions. And thereafter, no auth issues were observed. Hence, it does not seem like the issue is related to the --use-photokit flag.

However, the export really segfaults. I've been trying to get around that for a couple of days and looked through the implementation. From the first look, your code does not seem to have something able to cause segfaults. I would really suspect implementations in pyobjc for fetching photos and for running text detections, especially the latter one. Hence, the only quick way to fix that currently is to run the command inside the shell loop and use the --update flag to rerun the operation if it crashes and hope nothing will be lost.

It would be nice to split the whole export task into batches, run them in subprocesses, and retry if needed. But due to the current architecture, it's not possible, as the job of actual data moving and record keeping is interwoven, and the latter one does not allow usage of multiple processes because of sqlite. It's almost possible to implement as subtasks, as shown in the concurrent export example, but there are certain discrepancies between the API of bare PhotoInfo.export and PhotoExporter.export, which is used by the export command.

Nevertheless, I would like to thank you for all your efforts and dedication, that's really terrific! As for “self-taught hobbyist”, your project and the tool are a pleasure to deal with compared with the majority of projects I've seen people are doing professionally.

RhetTbull commented 1 year ago

Sounds like whichever Terminal app you're using didn't correctly ask for permissions to access Photos. Osxphotos needs permission to script Photos and permission to access the Photos directory. Check the privacy settings for your terminal app and ensure it shows these permissions.

Are you using Monterey? It has been the most problematic of all versions but I do know some users are successfully using osxphotos on Monterey.

Also there is a --limit option that if used with --update will allow you to export in batches. See --limit

o3bvv commented 1 year ago

Yes, I'm using 12.6.5, Python 3.11.3. Just run a 23k export as the following:

until osxphotos export /path/to/media/ \
  --exportdb /path/to/export.db \
  --report /path/to/export.csv \
  --post-function post.py::post_function \
  --export-by-date \
  --skip-original-if-edited \
  --jpeg-ext jpg \
  --download-missing \
  --use-photokit \
  --retry 10 \
  --touch-file \
  --update \
  --verbose \
; do sleep 5; done

It finished in a single 10h run without segfaults or other issues, which is a huge relief:

Processed: 22962 photos, exported: 22923, updated: 0, skipped: 0, updated EXIF data: 0, missing: 60, error: 0, touched date: 22923
Elapsed time: 10:04:30

The post-function can be found at post.py.

This time I've excluded text detection, as images still will be processed by large models. That is the only difference from the previous runs, which segfaulted after several hours, and before they did, they threw 2 types of errors. One is definitely related to text detection:

Error Domain=com.apple.vis Code=13 "CRImage Reader Detector was given zero-dimensioned image (0 x 0)" UserInfo={NSLocalizedDescription=CRImage Reader Detector was given zero-dimensioned image (0 x 0)}

I belive these can be avoided by checking the dimensions before calling text detection by API user.

But there were also errors of a second kind. I cannot remember the exact message, but it was too generic to be useful, something like “access denied” or similar.

Anyway, not using Apple's text detection resulted in a clean run even with --use-photokit flag enabled.

Thank you for giving a hint of the --limit flag, but for batches I was trying to say that it could be possible to implement a multiprocessing export by processing batches in parallel and additionally restart tasks in case they panic. However, this would require implementation of an additional layer of task scheduling and coordination. Not sure, how valuable that can be and how many users can benefit from it, but I'd be glad to take some time towards making it possible, if needed.

RhetTbull commented 1 year ago

@oblalex Glad you got it working. Yes, I'd love to get parallel/concurrent export working. I've got a branch where I've done some work towards this and recently did a fair amount of refactoring of the code to make it easier. However, the current architecture of the code makes it difficult to do this easily. For example, osxphotos reads the entire Photos library into memory and de-normalizes it into some Python data structures. This is the cause of the slow startup before export actually happens. This PhotosDB object cannot be easily shared between processes. I've got a design that would avoid this and would just read directly from the Photos database as needed but don't have time at the moment to implement it. I'd also have to parallelize things like exiftool (by creating a process pool of exiftool runners), etc.

As I use osxphotos primarily for backing my library in a nightly run, I don't care if it takes 30 minutes or 5 hours and this is thus low priority for me. At the moment I'm spending my very limited free time focused on a couple of other projects. When I have time I'll come back to the concurrent branch and keep tinkering with it.

RhetTbull commented 1 year ago

If you have one of the zero dimension images you can share I can do some tests and try to catch the text detection error.

o3bvv commented 1 year ago

Got it, thank you! I will try to share my thoughts and results of zero-dimention investigations tomorrow due to the same issue with limited time 😀

o3bvv commented 1 year ago

@RhetTbull, I have reviewed the branch and if I correctly understand, the core idea of changes are the following:

  1. A thread pool is used to process each photo separately. The list of photos is prepared upfront, as in the standard version.
  2. ExifTool is run in a pool of subprocesses, and the pool acts a sort like a global object. I'm not sure what for the threading lock is used in ExifTool, as its instances are used as context managers, so they are not shared between other threads, and the lock inside of it is used only once.

In general, using the thread pool could be expected to lead to increased performance, however, in practice it can cause the opposite, because of 2 things: 1) coordination of execution of many small tasks can eat performance gains, and 2) Python uses GIL and does not have such threads as would be expected. So, if tasks are not merely IO-bound, results can be worse than in the case of a single thread. It's hard to estimate, if threads can help here, as there are many things beyond just copying are done in the code.

As for sharing the PhotosDB between the processes, if I correctly understand how export works, there can be no need for that. The data in that object is a real treasure, the slow startup caused by it is not that bad, however, there is always a room for improvement. From the code it seems like that object converts the db of the library into a list of PhotoInfo objects. I would consider serialization of that objects into several independent files, and then running export jobs in several processes, feeding each file to each process as a task queue, and avoiding communication between processes as much as possible.

Also, as the export is a disk-heavy operation, there might be no gains from parallelization for certain types of devices, like HDDs, which do not support parallel accesses.

It would be nice to hear back your considerations if possible. And, maybe, it would make sense to move such conversations to a separate ticket, as it got a bit off-topic.

chengguangnan commented 8 months ago

--use-photokit crashes on macOS 14.2 with iTerm2. Changing to Terminal.app resolves the problem.

RhetTbull commented 8 months ago

Yes, iTerm doesn't request the necessary permissions. I have filed an issue with iTerm which has been fixed in the nightly build.