JoeyEamigh / react-native-text-recognition

MIT License

Automatically detect language by default in text-recognition. New params. #12

Open spetrey opened 1 year ago

spetrey commented 1 year ago

This PR was kind of born out of #11:

if we can set automaticallyDetectsLanguage, recognitionLanguages, customWords, usesLanguageCorrection, and recognitionLevel in JS, it would be nice.

This PR makes automatic language detection the default, while still letting callers set recognitionLanguages manually. It also expands some types for TypeScript sanity.
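For a concrete picture, here is a minimal sketch of how the new options would be passed from JS, using the option keys the Swift side reads in this PR (the file path and custom word list are hypothetical):

import TextRecognition from 'react-native-text-recognition';

// Pin recognition to specific languages instead of autodetecting (the default),
// and drop any candidate below a 0.5 confidence threshold.
async function scanImage(imagePath: string) {
  return await TextRecognition.recognize(imagePath, {
    automaticallyDetectLanguage: false,
    recognitionLanguages: ['en-US', 'ko-KR'],
    customWords: ['Duolingo'],
    usesLanguageCorrection: true,
    recognitionLevel: 'accurate',
    visionIgnoreThreshold: 0.5,
  });
}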

I will admit that Swift is not my primary lang so I would love a second pair of eyes to proof-read my work :) Shoutout to @ElicaInc who wrote most of the boilerplate Swift. I added a bit more to support adding a languageCode string to identify the result output.

Totally open to any and all feedback. Would love to hear your thoughts @JoeyEamigh as well πŸ˜„

ElicaInc commented 1 year ago

good work!

ElicaInc commented 1 year ago

I'm new to React Native and Swift. Mind reviewing the code?

1. Added @available(iOS 13.0, *), since VNRecognizeTextRequest is a class introduced in the Vision framework in iOS 13.0.
2. Added @objc static func requiresMainQueueSetup() -> Bool { return true }, because the module needs to be initialized on the main thread.
3. Added text position and size props.
4. Updated the condition settings for iOS 16 and above.

-- TextRecognition.swift --

import NaturalLanguage
import UIKit // for UIImage
import Vision

extension String {
  func stripPrefix(_ prefix: String) -> String {
    guard hasPrefix(prefix) else { return self }
    return String(dropFirst(prefix.count))
  }
}

@available(iOS 13.0, *) //VNRecognizeTextRequest is a class introduced in the Vision framework in iOS 13.0
@objc(TextRecognition)
class TextRecognition: NSObject {
    @objc static func requiresMainQueueSetup() -> Bool { return true } // return true so React Native initializes this module on the main thread
    @objc(recognize:withOptions:withResolver:withRejecter:)
    func recognize(imgPath: String, options: [String: Any], resolve: @escaping RCTPromiseResolveBlock, reject: @escaping RCTPromiseRejectBlock) {
        guard !imgPath.isEmpty else { reject("ERR", "You must include the image path", nil); return }

        let formattedImgPath = imgPath.stripPrefix("file://")
        var threshold: Float = 0.0

        var languages = ["en-US"]
        var autoDetectLanguage = true
        var customWords: [String] = []
        var usesLanguageCorrection = false
        var recognitionLevel: VNRequestTextRecognitionLevel = .accurate

        if let ignoreThreshold = options["visionIgnoreThreshold"] as? Float, !ignoreThreshold.isZero {
            threshold = ignoreThreshold
        }

        if let automaticallyDetectLanguage = options["automaticallyDetectLanguage"] as? Bool {
            autoDetectLanguage = automaticallyDetectLanguage
        }

        if let recognitionLanguages = options["recognitionLanguages"] as? [String] {
            languages = recognitionLanguages
        }

        if let words = options["customWords"] as? [String] {
            customWords = words
        }

        if let usesCorrection = options["usesLanguageCorrection"] as? Bool {
            usesLanguageCorrection = usesCorrection
        }

        if let level = options["recognitionLevel"] as? String, level == "fast" {
            recognitionLevel = .fast
        }

        do {
            let imgData = try Data(contentsOf: URL(fileURLWithPath: formattedImgPath))
            let image = UIImage(data: imgData)

            guard let cgImage = image?.cgImage else { reject("ERR", "Could not decode the image at the given path", nil); return }

            let requestHandler = VNImageRequestHandler(cgImage: cgImage)

            let ocrRequest = VNRecognizeTextRequest { (request: VNRequest, error: Error?) in
                self.recognizeTextHandler(request: request, threshold: threshold, error: error, resolve: resolve, reject: reject)
            }

            /* Revision 3, .accurate, iOS 16 and higher
             ["en-US", "fr-FR", "it-IT", "de-DE", "es-ES", "pt-BR", "zh-Hans", "zh-Hant", "yue-Hans", "yue-Hant", "ko-KR", "ja-JP", "ru-RU", "uk-UA"]
             */

            /* Revision 3, .fast, iOS 16 and higher
             ["en-US", "fr-FR", "it-IT", "de-DE", "es-ES", "pt-BR"]
             */

            /* Revision 2, .accurate, iOS 14 and higher
             ["en-US", "fr-FR", "it-IT", "de-DE", "es-ES", "pt-BR", "zh-Hans", "zh-Hant"]
             */

            /* Revision 2, .fast iOS, 14 and higher
             ["en-US", "fr-FR", "it-IT", "de-DE", "es-ES", "pt-BR"]
             */

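            // automaticallyDetectsLanguage is only available on iOS 16+; earlier versions fall back to the explicit language list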
            if #available(iOS 16.0, *) {
                if autoDetectLanguage {
                    ocrRequest.automaticallyDetectsLanguage = true
                } else {
                    ocrRequest.recognitionLanguages = languages
                }
            } else {
                ocrRequest.recognitionLanguages = languages
            }

            ocrRequest.customWords = customWords
            ocrRequest.usesLanguageCorrection = usesLanguageCorrection
            ocrRequest.recognitionLevel = recognitionLevel

            try requestHandler.perform([ocrRequest])
        } catch {
            print(error)
            reject("ERR", error.localizedDescription, nil)
        }
    }

    func recognizeTextHandler(request: VNRequest, threshold: Float, error _: Error?, resolve: @escaping RCTPromiseResolveBlock, reject: @escaping RCTPromiseRejectBlock) {
        guard let observations = request.results as? [VNRecognizedTextObservation] else { reject("ERR", "No text recognized.", nil); return }

        let recognizedStrings = observations.compactMap { observation -> [String: Any]? in
            guard let topCandidate = observation.topCandidates(1).first,
                  topCandidate.confidence >= threshold else { return nil }

            let recognizedText = topCandidate.string

            // NLLanguageRecognizer.dominantLanguage(for:) is a static helper, so no instance is needed
            let languageCode = NLLanguageRecognizer.dominantLanguage(for: recognizedText)?.rawValue

            return ["text": recognizedText,
                    "languageCode": languageCode ?? "unknown",
                    "confidence": topCandidate.confidence as Any,
                    //"x": observation.boundingBox.origin.x as Any,
                    //"y": observation.boundingBox.origin.y as Any,
                     "leftX": observation.boundingBox.minX as Any,
                     "middleX": observation.boundingBox.midX as Any,
                     "rightX": observation.boundingBox.maxX as Any,
                     "bottomY": 1 - observation.boundingBox.minY as Any,
                     "middleY": 1 - observation.boundingBox.midY as Any,
                     "topY": 1 - observation.boundingBox.maxY as Any,
                     "width": observation.boundingBox.width as Any,
                     "height": observation.boundingBox.height as Any,
            ]
        }

        // Debug
        // print(recognizedStrings)
        resolve(recognizedStrings)
    }
}

-- index.tsx --

import { NativeModules } from 'react-native';

export type ResultBlock = {
  text: string // "Hello World", "μ•ˆλ…•ν•˜μ„Έμš” 세계", etc
  languageCode: string // e.g. "en-US", "ko-KR", etc
  confidence: number // the Swift side also returns the top candidate's confidence
  leftX: number
  middleX: number
  rightX: number
  bottomY: number
  middleY: number
  topY: number
  width: number
  height: number
}

export type TextRecognitionResult = ResultBlock[]

export type TextRecognitionOptions = {
  visionIgnoreThreshold?: number,
  automaticallyDetectLanguage?: boolean,
  recognitionLanguages?: SupportedLanguages[],
  usesLanguageCorrection?: boolean,
  recognitionLevel?: RecognitionLevel,
  customWords?: string[],
};

type TextRecognitionType = {
  recognize(
    imagePath: string,
    options?: TextRecognitionOptions
  ): Promise<TextRecognitionResult>;
};

const { TextRecognition } = NativeModules;

/**
 * @iOS16 and higher: Revision 3 .accurate
 * ["en-US", "fr-FR", "it-IT", "de-DE", "es-ES", "pt-BR", "zh-Hans", "zh-Hant", "yue-Hans", "yue-Hant", "ko-KR", "ja-JP", "ru-RU", "uk-UA"]
 *
 * @iOS16 and higher: Revision 3 .fast
 * ["en-US", "fr-FR", "it-IT", "de-DE", "es-ES", "pt-BR"]
 *
 * @iOS14 and higher: Revision 2 .accurate
 * ["en-US", "fr-FR", "it-IT", "de-DE", "es-ES", "pt-BR", "zh-Hans", "zh-Hant"]
 *
 * @iOS14 and higher: Revision 2 .fast
 *  ["en-US", "fr-FR", "it-IT", "de-DE", "es-ES", "pt-BR"
 *
 */
type SupportedLanguages = "en-US" | "fr-FR" | "it-IT" | "de-DE" | "es-ES" | "pt-BR" | "zh-Hans" | "zh-Hant" | "yue-Hans" | "yue-Hant" | "ko-KR" | "ja-JP" | "ru-RU" | "uk-UA"
type RecognitionLevel = "fast" | "accurate";

async function recognize(
  imagePath: string,
  options?: TextRecognitionOptions
): Promise<TextRecognitionResult> {
  return await TextRecognition.recognize(imagePath, options || {});
}

export default { recognize } as TextRecognitionType;
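Not part of the PR, but to make the coordinate convention above concrete, here is a small sketch that scales a block's normalized, top-left-origin coordinates to pixels for a known image size (the helper name is my own illustration):

import TextRecognition from 'react-native-text-recognition';

// Hypothetical helper: convert each block's normalized (0-1) coordinates to pixel
// rects. The Swift side already flipped y to a top-left origin, so no flip is needed.
async function blockRectsInPixels(imagePath: string, imageWidth: number, imageHeight: number) {
  const blocks = await TextRecognition.recognize(imagePath);
  return blocks.map((block) => ({
    text: block.text,
    x: block.leftX * imageWidth,
    y: block.topY * imageHeight,
    width: block.width * imageWidth,
    height: block.height * imageHeight,
  }));
}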
spetrey commented 1 year ago

@ElicaInc thanks! So far this is looking great. Adding the observation.boundingBox support would be next-level. I'll take a closer look this weekend and can push these proposed changes to this branch πŸ‘

ElicaInc commented 1 year ago

I'm interested in incorporating a function to obtain image properties like height and width. Would you mind taking a look at the code and letting me know your opinion?

TextRecognition.swift

@objc(recognize:withOptions:withResolver:withRejecter:)
func recognize(imgPath: String, options: [String: Any], resolve: @escaping RCTPromiseResolveBlock, reject: @escaping RCTPromiseRejectBlock) {
    guard !imgPath.isEmpty else { reject("ERR", "You must include the image path", nil); return }

    β€’β€’β€’β€’β€’

    do {
        β€’β€’β€’β€’β€’

        // Get the image properties (avoid force-unwrapping the optional image source)
        var imageProperties: CFDictionary?
        if let imageSource = CGImageSourceCreateWithData(imgData as CFData, nil) {
            imageProperties = CGImageSourceCopyPropertiesAtIndex(imageSource, 0, nil)
        }

        β€’β€’β€’β€’β€’

        let ocrRequest = VNRecognizeTextRequest { (request: VNRequest, error: Error?) in
            self.recognizeTextHandler(request: request, threshold: threshold, error: error, imageProperties: imageProperties, resolve: resolve, reject: reject)

        β€’β€’β€’β€’β€’

        try requestHandler.perform([ocrRequest])
    } catch {

        β€’β€’β€’β€’β€’

    }
}

func recognizeTextHandler(request: VNRequest, threshold: Float, error _: Error?, imageProperties: CFDictionary?, resolve: @escaping RCTPromiseResolveBlock, reject: @escaping RCTPromiseRejectBlock) {
    guard let observations = request.results as? [VNRecognizedTextObservation] else { reject("ERR", "No text recognized.", nil); return }

    let recognizedStrings = observations.compactMap { observation -> [String: Any]? in

 β€’β€’β€’β€’β€’

    }

    // resolve with the recognized strings plus the image's own properties
    let recognizedImageProperties = imageProperties as? [String: Any] ?? [:]
    resolve(["stringProperties": recognizedStrings, "imageProperties": recognizedImageProperties])

}

index.tsx

β€’β€’β€’β€’β€’
export type StringProperties = {
  text: string // "Hello World", "μ•ˆλ…•ν•˜μ„Έμš” 세계", etc
  languageCode: string // e.g. "en-US", "ko-KR", etc
  confidence: number
  leftX: number
  middleX: number
  rightX: number
  bottomY: number
  middleY: number
  topY: number
  width: number
  height: number
}

export type ImageProperties = {
  ColorModel: string
  Depth: number
  Orientation: number
  PixelHeight: number
  PixelWidth: number
  ProfileName: string
  "{Exif}": Exif
  "{JFIF}": Jfif
  "{TIFF}": Tiff
  //{"ColorModel": "RGB", "Depth": 8, "Orientation": 1, "PixelHeight": 3769, "PixelWidth": 1710, "ProfileName": "sRGB IEC61966-2.1", "{Exif}": {"ColorSpace": 1, "PixelXDimension": 1710, "PixelYDimension": 3769}, "{JFIF}": {"DensityUnit": 0, "JFIFVersion": [1, 0, 1], "XDensity": 72, "YDensity": 72}, "{TIFF}": {"Orientation": 1}}
}

export type Exif = {
  ColorSpace: number
  PixelXDimension: number
  PixelYDimension: number
}
export type Jfif = {
  DensityUnit: number
  JFIFVersion: number[]
  XDensity: number
  YDensity: number
}

export type Tiff = {
  Orientation: number
}

export type TextRecognitionResult = {stringProperties: StringProperties[], imageProperties: ImageProperties}
β€’β€’β€’β€’β€’
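As an illustration of why the nested shape is useful (my own sketch, not code from this PR): PixelWidth and PixelHeight from imageProperties let a caller scale the normalized stringProperties coordinates to pixels without knowing the image size up front, using the TextRecognitionResult type defined just above.

function toPixelRects(result: TextRecognitionResult) {
  const { PixelWidth, PixelHeight } = result.imageProperties;
  return result.stringProperties.map((s) => ({
    text: s.text,
    x: s.leftX * PixelWidth,
    y: s.topY * PixelHeight,
    width: s.width * PixelWidth,
    height: s.height * PixelHeight,
  }));
}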
spetrey commented 1 year ago

@ElicaInc I tested the text-recognition output on some PNG screenshots from Duolingo. While I was at it, I added support for those PNG properties in 95c7f9356b489dffa68e1d6d6525f840f397fe48; the output looks pretty sweet. Great recommendation to separate out imageProperties and stringProperties:

[
    {
        "imageProperties": {
            "{Exif}": {
                "PixelXDimension": 1500,
                "PixelYDimension": 1500
            },
            "Depth": 8,
            "{TIFF}": {
                "XResolution": 216,
                "YResolution": 216,
                "ResolutionUnit": 2
            },
            "{PNG}": {
                "InterlaceType": 0,
                "XPixelsPerMeter": 8504,
                "YPixelsPerMeter": 8504
            },
            "ProfileName": "Display P3",
            "ColorModel": "RGB",
            "PixelHeight": 1500,
            "HasAlpha": true,
            "DPIWidth": 216,
            "PixelWidth": 1500,
            "DPIHeight": 216
        },
        "stringProperties": [
            {
                "leftX": 0.11206896603107452,
                "rightX": 0.5086206793785095,
                "confidence": 0.5,
                "bottomY": 0.36000001430511475,
                "width": 0.3965517282485962,
                "middleY": 0.32762932777404785,
                "languageCode": "ko",
                "topY": 0.29525861144065857,
                "middleX": 0.3103448152542114,
                "height": 0.06474137306213379,
                "text": "제 μŒμ•…μ€ μ΄λž˜μš”."
            },
            {
                "width": 0.4418103098869324,
                "rightX": 0.5603448152542114,
                "text": "My music is like this.",
                "middleX": 0.33943966031074524,
                "middleY": 0.4159482717514038,
                "languageCode": "en",
                "topY": 0.3943965435028076,
                "height": 0.04310344532132149,
                "leftX": 0.11853450536727905,
                "bottomY": 0.4375,
                "confidence": 0.5
            },
            {
                "rightX": 0.31494826078414917,
                "leftX": 0.04927587881684303,
                "width": 0.26567238569259644,
                "topY": 0.8844611644744873,
                "bottomY": 0.9474353790283203,
                "confidence": 1,
                "middleX": 0.18211206793785095,
                "languageCode": "it",
                "middleY": 0.9159482717514038,
                "text": "duolingo",
                "height": 0.06297419965267181
            }
        ]
    }
]
ElicaInc commented 1 year ago

v1.0.0 output: resolve(recognizedStrings)

The requested version's output: resolve(["stringProperties": recognizedStrings as Any, "imageProperties": recognizedImageProperties as Any])

If we make this change, it may no longer be compatible with the old version or with Android. I am not currently working on Android, and I'm not sure whether this change is feasible there. Maintaining compatibility is crucial, so we should proceed with caution. I respect @JoeyEamigh's judgment. This has been a great discussion; I think we should bookmark it and come back to #11 for now. What are your thoughts? Do you have any concerns?

spetrey commented 1 year ago

If we make this change, it may no longer be compatible with the old version or with Android. I am not currently working on Android, and I'm not sure whether this change is feasible there. Maintaining compatibility is crucial, so we should proceed with caution.

A fair point. I'm not sure whether this is feasible to support on Android either. If we want to go back to 1.0.0-compatible output, we simply need to revert f5bcaab and 95c7f93.

JoeyEamigh commented 1 year ago

hey all!! thank you for all your work! sorry for going missing for a bit (exam szn) - going to take a look at this PR and attempt to decide what to do. I am not opposed to a breaking change for this level of value-add; however, maintaining Android compatibility is crucial.

Apple's VisionKit has gotten better since I first wrote this library - I ended up using the ml branch (Google APIs for all) in my personal project since the output of the iOS and Android versions of Google MLKit was much more consistent.

that being said, it seems the appeal of the library is the fact that each platform uses native APIs. I will continue to keep them compatible, but I do think these features can coexist between libraries with a bit of work.

ElicaInc commented 1 year ago

I hope this library helps even more people get their creative juices flowing!

JoeyEamigh commented 1 year ago

ok, i absorbed this PR into the v2-exp branch. iOS is working well, but my Android test device currently doesn't have Google Services installed, so I can't test if the android side is working. it should be, but not sure. Either of y'all have a way to test it?

spetrey commented 1 year ago

Hey @JoeyEamigh no trouble on the delay! Totally understandable. Exams should take priority ☺️

iOS is working well, but my Android test device currently doesn't have Google Services installed, so I can't test if the android side is working. it should be, but not sure. Either of y'all have a way to test it?

Nice! Unfortunately, I don't own any Android devices to test the v2-exp branch out. I wish I could help here.

ElicaInc commented 1 year ago

ok, i absorbed this PR into the v2-exp branch

Super! I also don't have any Android devices at the moment, so I won't be able to test it out either. But I hope you find someone who can help you with the testing.

ElicaInc commented 1 year ago

I encountered an error while installing the library. The error message reads: TypeError: Cannot read property 'recognize' of undefined.

What I did: To install the library, I made three adjustments. Firstly, I added pod 'react-native-text-recognition', :path => '../node_modules/react-native-text-recognition' to my Podfile. Secondly, I cloned the library's repository using the v2-exp branch and saved it to /path/to/myApp/node_modules/react-native-text-recognition. Finally, I ran pod install to finalize the installation.

Terminal output:

Downloading dependencies
Installing react-native-text-recognition 0.1.0 (was 1.0.0)
Generating Pods project

I'm unsure if the library has any additional requirements that would require extra steps during the installation process. I'm not very experienced with React Native and Swift. Can you provide guidance on how to install the v2-exp branch?

JoeyEamigh commented 1 year ago

Forgot to bump the version - whoops. I also may have changed the way it is exported. Check the example/ directory on the v2-exp branch to see how it's doing the import maybe?

ElicaInc commented 1 year ago

Works fine! Thank you so much! I tested it on a fresh installation and it's working perfectly. The error was caused by my own configuration settings. I'm sorry for inconveniencing you.

spetrey commented 1 year ago

Ditto @JoeyEamigh! The v2-exp branch seems to be working as expected on my end πŸ‘