a7medev / react-native-ml-kit

React Native On-Device Machine Learning w/ Google ML Kit
MIT License
425 stars, 60 forks

Frame positioning for IOS is messed up #59

Open hurnell opened 3 weeks ago

hurnell commented 3 weeks ago

What happened?

It turns out that the frame calculation for OCR on photos is messed up on iOS in @react-native-ml-kit/text-recognition.

I have found that the reported frame is misinterpreted. The correct values should be:

top: their left (top is misinterpreted as left).
height and width: switched.
left: screen width - their top - their height.
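The remapping described above can be sketched as a small helper. This is only an illustration: `Frame` here is a local interface assumed from the fields used in this issue (`top`, `left`, `width`, `height`), and `correctFrame` and `containerWidth` are hypothetical names, not part of @react-native-ml-kit/text-recognition.

```typescript
// Frame shape assumed from the fields used in this issue.
interface Frame {
  top: number;
  left: number;
  width: number;
  height: number;
}

// Hypothetical helper, not part of the library.
// containerWidth is the width of the element the image is displayed in
// ("screen width" in the description above).
function correctFrame(frame: Frame, containerWidth: number): Frame {
  return {
    top: frame.left, // the reported left is the real top
    left: containerWidth - frame.top - frame.height,
    width: frame.height, // width and height are switched
    height: frame.width,
  };
}
```

Setting `left` this way should be equivalent to the `right: frame.top` positioning used for goodDiv in the reproduction steps, provided the container is exactly as wide as the displayed image.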

Version

@react-native-ml-kit/barcode-scanning: version
@react-native-ml-kit/face-detection: version
@react-native-ml-kit/identify-languages: version
@react-native-ml-kit/image-labeling: version
@react-native-ml-kit/text-recognition: version
@react-native-ml-kit/translate-text: version

Which ML Kit packages do you use?

What platforms are you seeing this issue on?

System Information

RN 0.73.6
System:
  OS: macOS 14.6.1
  CPU: (8) arm64 Apple M3
  Memory: 84.88 MB / 16.00 GB
  Shell:
    version: "5.9"
    path: /bin/zsh
Binaries:
  Node:
    version: 18.20.0
    path: ~/.nvm/versions/node/v18.20.0/bin/node
  Yarn:
    version: 1.22.22
    path: /opt/homebrew/bin/yarn
  npm:
    version: 10.5.0
    path: ~/.nvm/versions/node/v18.20.0/bin/npm
  Watchman:
    version: 2024.09.02.00
    path: /opt/homebrew/bin/watchman
Managers:
  CocoaPods:
    version: 1.15.2
    path: /opt/homebrew/bin/pod
SDKs:
  iOS SDK:
    Platforms:

Steps to Reproduce

const {blocks} = await TextRecognition.recognize(`file://${photo.path}`);

Expand the lines, then filter the blocks somehow (or don't).

Use "react-native-fs" to copy the original image to somewhere on the phone and save the JSON-encoded blocks to a JSON file on the phone.

Transfer files to PC.

Use a small HTML/JS/CSS file to position the frames on top of the original image.

badDiv.style.cssText = `top: ${value.frame.top}px; left: ${value.frame.left}px; height: ${value.frame.height}px; width: ${value.frame.width}px`;

goodDiv.style.cssText = `top: ${value.frame.left}px; right: ${value.frame.top}px; height: ${value.frame.width}px; width: ${value.frame.height}px`;

Screenshot 2024-11-05 at 08 43 01

The red bordered blocks (badDiv) are messed up.

The green bordered blocks (goodDiv) with the properties switched around are in the correct positions.

a7medev commented 1 day ago

@hurnell The values returned are the same as the ones provided by Google's ML Kit Text Recognition iOS framework, so there's not much to change on my side. My assumption is that this is caused by the image's orientation: the image is in landscape, and it looks like ML Kit analyzed it in portrait mode. The orientation is detected automatically using UIImage.imageOrientation.
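As a rough check of the orientation hypothesis, the sketch below rotates a frame by 90 degrees clockwise in screen coordinates (y grows downward). Under that assumption it produces the same swap the reporter found by hand (top from left, width and height exchanged, left measured from the far edge). `Rect`, `rotatePointClockwise`, `rotateRectClockwise` and `analyzedHeight` are hypothetical names used only for illustration. The sketch does not confirm what ML Kit or UIImage.imageOrientation actually did with this photo.

```typescript
// Rectangle shape assumed from the frame fields used in this issue.
interface Rect {
  top: number;
  left: number;
  width: number;
  height: number;
}

// Rotate one point 90 degrees clockwise in screen coordinates.
// analyzedHeight is the height of the image as it was analyzed; after the
// rotation it becomes the width of the displayed image.
function rotatePointClockwise(
  x: number,
  y: number,
  analyzedHeight: number
): [number, number] {
  return [analyzedHeight - y, x];
}

// Rotate two opposite corners and take their bounding box.
function rotateRectClockwise(rect: Rect, analyzedHeight: number): Rect {
  const [x1, y1] = rotatePointClockwise(rect.left, rect.top, analyzedHeight);
  const [x2, y2] = rotatePointClockwise(
    rect.left + rect.width,
    rect.top + rect.height,
    analyzedHeight
  );
  return {
    top: Math.min(y1, y2),
    left: Math.min(x1, x2),
    width: Math.abs(x2 - x1),
    height: Math.abs(y2 - y1),
  };
}
```

If the rotated frames line up with the text in the landscape photo, that would be consistent with the image having been analyzed in portrait mode. A rotation in the other direction would need a different point mapping.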