RhetTbull / osxphotos

Python app to work with pictures and associated metadata from Apple Photos on macOS. Also includes a package to provide programmatic access to the Photos library, pictures, and metadata.

`detected_text` causes `segmentation fault` and crashes python runtime #1081

Open ces3001 opened 1 year ago

ces3001 commented 1 year ago

**Describe the bug**
Using `detected_text`, whether via the `{detected_text}` template substitution on the command line or from Python code such as `dtext = p.detected_text(confidence_threshold=0.5)`, causes the Python runtime to crash with a segmentation fault. No further error information is available, and it doesn't crash in the same place every time.
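For reference, a minimal sketch of the kind of loop that triggers the crash (assuming the default system Photos library):

```python
# Sketch of the crashing pattern: calling detected_text() over many photos
# eventually segfaults at some point during the run.
import osxphotos

photosdb = osxphotos.PhotosDB()
for p in photosdb.photos():
    # detected_text() runs text detection on the photo and returns
    # a list of (text, confidence) tuples
    dtext = p.detected_text(confidence_threshold=0.5)
    print(p.original_filename, dtext)
```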

**To Reproduce**
Steps to reproduce the behavior:

**Expected behavior**
Get the detected text without crashing. It works most of the time, but when run over hundreds or thousands of photos it always crashes at some point.

**Desktop (please complete the following information):**

RhetTbull commented 1 year ago

This sounds like it could be a memory leak in the `detected_text` code, which is actually written in Objective-C and called from Python. I will take a look when I get a chance.
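If the leak is in autoreleased Objective-C objects, one common mitigation on the Python side (a hypothesis here, not a confirmed fix for this bug) is to drain an autorelease pool around each call using pyobjc's `objc.autorelease_pool()` context manager:

```python
# Hypothetical mitigation sketch: drain an autorelease pool per photo so that
# autoreleased Objective-C objects created during text detection are freed
# promptly instead of accumulating across the whole run.
import objc
import osxphotos

photosdb = osxphotos.PhotosDB()
for p in photosdb.photos():
    with objc.autorelease_pool():  # objects autoreleased inside are freed on exit
        dtext = p.detected_text(confidence_threshold=0.5)
```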

RhetTbull commented 1 year ago

@ces3001 I'll look at this bug, but if you're running on Ventura you can skip osxphotos' detected text feature and just access the text that Photos has already detected. Results won't be 100% identical, as Photos uses a higher confidence level, but it should be close. Change your keyword template to:

```
--keyword-template "{photo.search_info.detected_text?text:{photo.search_info.detected_text},}"
```

This accesses the `search_info` property of a photo to get the detected text directly from the Photos database. It will also be much faster, as text detection won't need to run at export time.
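The same data is reachable from Python; a minimal sketch, assuming a Ventura-or-later library where Photos has already stored its detected text:

```python
# Read the text Photos itself detected, straight from the library database,
# instead of re-running detection with detected_text().
import osxphotos

photosdb = osxphotos.PhotosDB()
for p in photosdb.photos():
    if p.search_info and p.search_info.detected_text:
        print(p.original_filename, p.search_info.detected_text)
```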

RhetTbull commented 1 year ago

@all-contributors please add @ces3001 for bug

allcontributors[bot] commented 1 year ago

@RhetTbull

I've put up a pull request to add @ces3001! :tada:

ces3001 commented 1 year ago

> @ces3001 I'll look at this bug, but if you're running on Ventura you can skip osxphotos' detected text feature and just access the text that Photos has already detected. Results won't be 100% identical, as Photos uses a higher confidence level, but it should be close. Change your keyword template to:
>
> `--keyword-template "{photo.search_info.detected_text?text:{photo.search_info.detected_text},}"`
>
> This accesses the `search_info` property of a photo to get the detected text directly from the Photos database. It will also be much faster, as text detection won't need to run at export time.

Thanks, this is essentially what I ended up doing in Python with your library. I moved from embedding all the metadata in the exported files to exporting images with only the necessary metadata and reading Photos.app's detected text and other metadata from Python. Thank you!
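A rough sketch of that workflow (the destination path is hypothetical; `PhotoInfo.export()` writes the image file, and `search_info` supplies the text Photos already detected):

```python
# Sketch of the workflow described above: export images, then pull detected
# text (and other metadata) from the Photos database instead of embedding it.
import osxphotos

photosdb = osxphotos.PhotosDB()
for p in photosdb.photos():
    exported = p.export("/tmp/export")  # hypothetical destination directory
    text = p.search_info.detected_text if p.search_info else []
    print(exported, text)
```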