Closed bwagner closed 2 years ago
Good suggestion! I'm not sure how to monitor the pasteboard for changes without polling so will have to look into that. But here's a proof of concept that will read the image off the clipboard and detect the text (using code from Textinator). So this is possible to implement if I can figure out a reliable way to detect when an image is on the clipboard.
This article shows how to use a timer to poll the clipboard for changes.
"""Proof of concept to detect text in image on clipboard using Vision framework"""

from typing import List, Optional

import objc
import Quartz
import Vision
from AppKit import NSPasteboard, NSPasteboardTypeTIFF
from Foundation import NSURL, NSData, NSDictionary, NSLog

APP_NAME = "Textinator"


def detect_text(
    img_data: NSData,
    orientation: Optional[int] = None,
    languages: Optional[List[str]] = None,
) -> List:
    """process image data with VNRecognizeTextRequest and return list of results

    This code originally developed for https://github.com/RhetTbull/osxphotos

    Args:
        img_data: image data as NSData (here, the TIFF data read from the pasteboard)
        orientation: optional EXIF orientation (if known, passing orientation may improve quality of results)
        languages: optional languages to use for text detection as list of ISO language code strings; default is ["en-US"]

    Returns:
        list of [text, confidence] pairs, one per recognized text observation
    """
    with objc.autorelease_pool():
        # create a CIImage from the image data as that's what Vision wants
        input_image = Quartz.CIImage.imageWithData_(img_data)

        vision_options = NSDictionary.dictionaryWithDictionary_({})
        if orientation is None:
            vision_handler = (
                Vision.VNImageRequestHandler.alloc().initWithCIImage_options_(
                    input_image, vision_options
                )
            )
        elif 1 <= orientation <= 8:
            vision_handler = Vision.VNImageRequestHandler.alloc().initWithCIImage_orientation_options_(
                input_image, orientation, vision_options
            )
        else:
            raise ValueError("orientation must be between 1 and 8")

        # the completion handler appends [text, confidence] pairs to results
        results = []
        handler = make_request_handler(results)
        vision_request = (
            Vision.VNRecognizeTextRequest.alloc().initWithCompletionHandler_(handler)
        )
        languages = languages or ["en-US"]
        vision_request.setRecognitionLanguages_(languages)
        vision_request.setUsesLanguageCorrection_(True)
        success, error = vision_handler.performRequests_error_([vision_request], None)
        if not success:
            raise ValueError(f"Vision request failed: {error}")

        # convert the recognized text to a Python str
        for result in results:
            result[0] = str(result[0])

        return results


def make_request_handler(results):
    """results: list to store results"""
    if not isinstance(results, list):
        raise ValueError("results must be a list")

    def handler(request, error):
        if error:
            NSLog(f"{APP_NAME} Error! {error}")
        else:
            observations = request.results()
            for text_observation in observations:
                # keep only the best candidate for each observation
                recognized_text = text_observation.topCandidates_(1)[0]
                results.append([recognized_text.string(), recognized_text.confidence()])

    return handler


if __name__ == "__main__":
    pb = NSPasteboard.generalPasteboard()
    if img_data := pb.dataForType_(NSPasteboardTypeTIFF):
        results = detect_text(img_data)
        print(results)
    else:
        print("No image on clipboard")
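
Since detect_text returns a list of [text, confidence] pairs, a small post-processing step could turn that into plain text. The following is only a sketch: the helper name text_from_results and the 0.5 default threshold are assumptions for illustration, not part of the proof of concept above.

```python
from typing import List, Sequence


def text_from_results(results: Sequence[Sequence], min_confidence: float = 0.5) -> str:
    """Join recognized lines whose confidence is at least min_confidence.

    results is expected in the shape detect_text returns: [[text, confidence], ...].
    Both this helper and the default threshold are hypothetical.
    """
    lines: List[str] = [
        str(text) for text, confidence in results if confidence >= min_confidence
    ]
    return "\n".join(lines)
```

For example, text_from_results(detect_text(img_data)) would give a newline-separated string that could be placed back on the clipboard.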
Ensure that pause status is checked when processing clipboard
Should this feature be always on (unless paused) or a separate "Detect text in clipboard images" option?
Ensure that pause status is checked when processing clipboard
This could be assured with an object-oriented design pattern, i.e. there's an abstract definition of the operation recognize_text_in_image, and there are two implementations so far (recognize_text_in_screenshot, recognize_text_in_clipboard). The method calling the operation recognize_text_in_image makes sure that the pause status is checked, no matter which implementation (recognize_text_in_screenshot, recognize_text_in_clipboard) is instantiated at a particular moment. When the text is going to be recognized in audio (🤯) it will be another implementation which will profit from this architecture.
Should this feature be always on (unless paused) or a separate "Detect text in clipboard images" option?
I could imagine both. But possibly start out without a separate option and see how it goes. Experience will show whether an additional option is worth the trouble (more clutter, more cognitive load).
@bwagner I think I have this feature working well. I've created a beta version that can be downloaded here. If you have time I would appreciate you testing this to see if it works as expected.
Looks great, @RhetTbull ! Thanks a lot for implementing it! Notifications were not working, but that might be a faulty configuration on my side.
Is it implemented using polling? How much of a performance impact is polling (if it can be estimated roughly)?
Yes, the clipboard is checked via polling. Currently, I've got it set to poll the clipboard for changes every 2 seconds. On my old MacBook, Textinator is using 0% CPU so it doesn't appear to be a big impact. The NSPasteboard API, which provides the clipboard (pasteboard) interface, provides a means to check whether another app has written to the clipboard, and that's what Textinator checks. Thus Textinator does not have to constantly inspect the clipboard contents: the contents are only checked for the presence of an image if there's an indication of a change.
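
Roughly, the change check can be sketched as below. This is not Textinator's actual implementation: the PasteboardWatcher class and its callbacks are hypothetical, and the change counter is injected as a plain callable so the logic can be read (and run) without PyObjC. On macOS the counter would presumably be NSPasteboard's changeCount.

```python
from typing import Callable


class PasteboardWatcher:
    """Calls on_change only when the pasteboard's change counter has moved."""

    def __init__(self, get_change_count: Callable[[], int], on_change: Callable[[], None]):
        self._get_change_count = get_change_count
        self._on_change = on_change
        # remember the starting value so existing clipboard content is not reprocessed
        self._last_count = get_change_count()

    def poll(self) -> bool:
        """Run once per timer tick; returns True if a change was handled."""
        count = self._get_change_count()
        if count == self._last_count:
            return False  # nothing new was written, so the contents are not read
        self._last_count = count
        self._on_change()
        return True


# Assumed wiring on macOS with PyObjC (not executed here):
#   pb = NSPasteboard.generalPasteboard()
#   watcher = PasteboardWatcher(pb.changeCount, process_clipboard_image)
#   ...then call watcher.poll() from a timer, e.g. every 2 seconds.
# process_clipboard_image is a hypothetical function that reads the image and detects text.
```

Comparing an integer on each tick is cheap, which fits the observation that polling every 2 seconds has no visible CPU cost.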
Terrific. Thanks a lot!
When you use Ctrl-Cmd-Shift-4 to copy a section of the screen, it is not saved as a screenshot file, but is merely kept in the clipboard. Often I don't need to have a screenshot as a file when capturing it just for the sake of text recognition. It would be useful to have Textinator also handle this case and perform text recognition on a picture in the clipboard.