RhetTbull / textinator

Simple MacOS StatusBar / Menu Bar app to automatically detect text in screenshots
MIT License
179 stars, 8 forks

Feature: Screenshot in clipboard #4

Closed bwagner closed 2 years ago

bwagner commented 2 years ago

When you use Ctrl-Cmd-Shift-4 to capture a section of the screen, the capture is not saved as a screenshot file but only kept in the clipboard. Often I don't need the screenshot as a file when I'm capturing it just for text recognition. It would be useful for Textinator to also react to this case and perform text recognition on an image in the clipboard.

RhetTbull commented 2 years ago

Good suggestion! I'm not sure how to monitor the pasteboard for changes without polling, so I'll have to look into that. But here's a proof of concept that reads the image off the clipboard and detects the text (using code from Textinator). So this is possible to implement if I can figure out a reliable way to detect when an image is on the clipboard.

This article shows how to use a timer to poll the clipboard for changes.

"""Proof of concept to detect text in image on clipboard using Vision framework"""
from typing import List, Optional

import objc
import Quartz
import Vision
from AppKit import NSPasteboard, NSPasteboardTypeTIFF
from Foundation import NSData, NSDictionary, NSLog

APP_NAME = "Textinator"

def detect_text(
    img_data: NSData,
    orientation: Optional[int] = None,
    languages: Optional[List[str]] = None,
) -> List:
    """process image data with VNRecognizeTextRequest and return list of results

    This code originally developed for https://github.com/RhetTbull/osxphotos

    Args:
        img_data: image data as NSData (e.g. TIFF data read from the pasteboard)
        orientation: optional EXIF orientation (if known, passing orientation may improve quality of results)
        languages: optional languages to use for text detection as list of ISO language code strings; default is ["en-US"]

    Returns:
        list of [text, confidence] pairs, one per recognized text observation
    """
    with objc.autorelease_pool():
        # create a CIImage from the image data as that's what Vision wants
        input_image = Quartz.CIImage.imageWithData_(img_data)
        vision_options = NSDictionary.dictionaryWithDictionary_({})
        if orientation is None:
            vision_handler = (
                Vision.VNImageRequestHandler.alloc().initWithCIImage_options_(
                    input_image, vision_options
                )
            )
        elif 1 <= orientation <= 8:
            vision_handler = Vision.VNImageRequestHandler.alloc().initWithCIImage_orientation_options_(
                input_image, orientation, vision_options
            )
        else:
            raise ValueError("orientation must be between 1 and 8")
        results = []
        handler = make_request_handler(results)
        vision_request = (
            Vision.VNRecognizeTextRequest.alloc().initWithCompletionHandler_(handler)
        )
        languages = languages or ["en-US"]
        vision_request.setRecognitionLanguages_(languages)
        vision_request.setUsesLanguageCorrection_(True)
        success, error = vision_handler.performRequests_error_([vision_request], None)
        if not success:
            raise ValueError(f"Vision request failed: {error}")

        for result in results:
            result[0] = str(result[0])

        return results

def make_request_handler(results):
    """results: list to store results"""
    if not isinstance(results, list):
        raise ValueError("results must be a list")

    def handler(request, error):
        if error:
            NSLog(f"{APP_NAME} Error! {error}")
        else:
            observations = request.results()
            for text_observation in observations:
                recognized_text = text_observation.topCandidates_(1)[0]
                results.append([recognized_text.string(), recognized_text.confidence()])

    return handler

if __name__ == "__main__":
    pb = NSPasteboard.generalPasteboard()
    if img_data := pb.dataForType_(NSPasteboardTypeTIFF):
        results = detect_text(img_data)
        print(results)
    else:
        print("No image on clipboard")

RhetTbull commented 2 years ago

Ensure that pause status is checked when processing clipboard

RhetTbull commented 2 years ago

Should this feature be always on (unless paused) or a separate "Detect text in clipboard images" option?

bwagner commented 2 years ago

> Ensure that pause status is checked when processing clipboard

This could be ensured with an object-oriented design: define an abstract operation recognize_text_in_image with, so far, two implementations (recognize_text_in_screenshot and recognize_text_in_clipboard). The method that calls recognize_text_in_image checks the pause status, no matter which implementation is in use at that moment. If text is ever recognized in audio (🤯), that would be just another implementation that benefits from the same architecture.
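A minimal sketch of the design described above, to make the idea concrete. It is not Textinator's actual code: the class names (TextSource, ScreenshotSource, ClipboardSource), the app_state dictionary, and the stubbed image data and detection are all hypothetical stand-ins. The only point it illustrates is that the pause check lives in one place, in the shared method, so no implementation can skip it.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class TextSource(ABC):
    """Hypothetical base class: one subclass per place an image can come from."""

    def __init__(self, app_state: dict):
        # app_state is an assumed stand-in for the app's shared settings
        self.app_state = app_state

    def recognize_text_in_image(self) -> Optional[List[str]]:
        """Shared entry point: the pause check happens here for every source."""
        if self.app_state.get("paused", False):
            return None
        image = self.fetch_image()
        if image is None:
            return None
        return self.detect(image)

    @abstractmethod
    def fetch_image(self) -> Optional[bytes]:
        """Return raw image data, or None if this source has nothing new."""

    def detect(self, image: bytes) -> List[str]:
        # Placeholder for real detection (e.g. the Vision-based detect_text
        # from the proof of concept); it only reports the data size.
        return [f"<text from {len(image)} bytes>"]


class ScreenshotSource(TextSource):
    def fetch_image(self) -> Optional[bytes]:
        return b"screenshot-bytes"  # stub data for illustration


class ClipboardSource(TextSource):
    def fetch_image(self) -> Optional[bytes]:
        return b"clipboard-bytes"  # stub data for illustration
```

With this shape, adding a new source means writing only fetch_image; callers always go through recognize_text_in_image, which returns None while the app is paused.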

bwagner commented 2 years ago

> Should this feature be always on (unless paused) or a separate "Detect text in clipboard images" option?

I could imagine both. But possibly start out without a separate option and see how it goes. Experience will show whether an additional option is worth the trouble (more clutter, more cognitive load).

RhetTbull commented 2 years ago

@bwagner I think I have this feature working well. I've created a beta version that can be downloaded here. If you have time I would appreciate you testing this to see if it works as expected.

bwagner commented 2 years ago

Looks great, @RhetTbull! Thanks a lot for implementing it! Notifications were not working, but that might be a faulty configuration on my side.

Is it implemented using polling? How much of a performance impact is polling (if it can be estimated roughly)?

RhetTbull commented 2 years ago

Yes, the clipboard is checked via polling. Currently it's set to poll for changes every 2 seconds. On my old MacBook, Textinator shows 0% CPU usage, so the impact doesn't appear to be significant. The NSPasteboard API, which provides the clipboard (pasteboard) interface, offers a way to check whether another app has written to the clipboard, and that is what Textinator checks on each poll. Textinator therefore doesn't have to read the clipboard contents every time; the contents are checked for an image only when there's an indication of a change.
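The polling scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Textinator's actual implementation: ClipboardWatcher and its two callables are hypothetical names, and the pasteboard is abstracted behind plain functions so the logic can be run without macOS. The idea is that each poll compares a cheap change counter and only reads the clipboard contents when the counter has moved.

```python
from typing import Callable, Optional


class ClipboardWatcher:
    """Hypothetical helper: reads the clipboard only when its change counter moves."""

    def __init__(
        self,
        get_change_count: Callable[[], int],
        read_image: Callable[[], Optional[bytes]],
    ):
        self._get_change_count = get_change_count
        self._read_image = read_image
        # Remember the counter at startup so pre-existing contents are ignored.
        self._last_count = get_change_count()

    def poll(self) -> Optional[bytes]:
        """Call on a timer. Returns image data only after a clipboard change."""
        count = self._get_change_count()
        if count == self._last_count:
            return None  # nothing was written since the last poll
        self._last_count = count
        # The clipboard changed; it may or may not hold an image now.
        return self._read_image()
```

On macOS with PyObjC, the counter would presumably be NSPasteboard's changeCount, so the wiring might look like ClipboardWatcher(pb.changeCount, lambda: pb.dataForType_(NSPasteboardTypeTIFF)) with pb = NSPasteboard.generalPasteboard(), and poll() called from a timer every 2 seconds. That wiring is an assumption based on the comment above, which does not name the exact call.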

bwagner commented 2 years ago

Terrific. Thanks a lot!