a7medev / react-native-ml-kit

React Native On-Device Machine Learning w/ Google ML Kit
MIT License
396 stars 58 forks

Object detection implementation for single image - iOS and android #21

Open andrewjb123 opened 1 year ago

lennartschoch commented 10 months ago

Hey @a7medev, I'd love to see this feature in the project :) Is there any way we can move this PR forward? Is there any way I can help? It would be highly appreciated 🙏

Ravim-addweb commented 3 months ago

Hi @andrewjb123, thanks for taking the time to write the object detection module. I implemented it in my RN project.

I went ahead and added a custom object detection method and passed in my model.

However, for both the default and the custom object detection cases, the coordinates returned for detected objects are pretty odd. Basically, I want to superimpose a sticker/video at the given coordinates when an object is detected. My custom TensorFlow model detects the object correctly, but the coordinates are way off. Could you give it a shot with the default object detection, please? I was getting weird coordinates for that as well. Thanks a ton.

andrewjb123 commented 3 months ago

Hi, a couple of things you can check,

Ravim-addweb commented 3 months ago

@andrewjb123 thanks for the quick response, appreciate it.

Is there a way to get the X and Y coordinates for the scanned image? Basically, I am looking to build something like this, kindly check.

Android doc reference which says the return type will be Rect.

In my local application, I am able to detect the Photo object using my custom TensorFlow model, but I am getting a Rect as the response from the CustomObjectDetection SDK. I would like to have the XY coordinates of all 4 corner points; only then will I be able to superimpose the video. Right now my video looks like a sticker.

Thanks.

andrewjb123 commented 3 months ago

Hi,

The interface is giving you a rect: framex, framey, frameWidth, frameHeight.

What I'm suggesting is that your display image component (e.g. <Image/> in your React Native app) is doing some form of scaling, centering or rotation which you are unable to see, or haven't turned off. For example, say you give an image of 1024x1024 to the object detection library, and it gives you the correct x, y, width, height of the rect for image dimensions of 1024x1024. If your component library has scaled your image to, say, 512x512, the rect will be positioned incorrectly on the 512x512 scaled image unless you divide x, y, width, height by 2 to compensate for the scale change applied by the image component.
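To make the scaling point concrete, here is a minimal sketch in TypeScript. It assumes the image is displayed with a uniform, centered "contain"-style fit and no rotation; the `Rect` and `Size` types and the function names `mapRectToDisplay` and `rectCorners` are hypothetical and are not part of this module's API. It also shows how to get the four corners of the mapped rect, with the caveat that an axis-aligned rect cannot describe a tilted or perspective-distorted object.

```typescript
// Hypothetical types for illustration; not the module's actual interface.
interface Rect { x: number; y: number; width: number; height: number; }
interface Size { width: number; height: number; }
interface Point { x: number; y: number; }

// Maps a rect from source-image pixels to on-screen coordinates, assuming the
// image is scaled uniformly to fit inside the display area and then centered.
function mapRectToDisplay(rect: Rect, source: Size, display: Size): Rect {
  const scale = Math.min(
    display.width / source.width,
    display.height / source.height,
  );
  // Empty margin left on each side when the aspect ratios differ.
  const offsetX = (display.width - source.width * scale) / 2;
  const offsetY = (display.height - source.height * scale) / 2;
  return {
    x: rect.x * scale + offsetX,
    y: rect.y * scale + offsetY,
    width: rect.width * scale,
    height: rect.height * scale,
  };
}

// Four corners of an axis-aligned rect, clockwise starting at the top-left.
function rectCorners(rect: Rect): Point[] {
  const right = rect.x + rect.width;
  const bottom = rect.y + rect.height;
  return [
    { x: rect.x, y: rect.y },
    { x: right, y: rect.y },
    { x: right, y: bottom },
    { x: rect.x, y: bottom },
  ];
}

// The 1024x1024 image shown at 512x512 from the example above:
const mapped = mapRectToDisplay(
  { x: 200, y: 100, width: 400, height: 300 },
  { width: 1024, height: 1024 },
  { width: 512, height: 512 },
);
// mapped is { x: 100, y: 50, width: 200, height: 150 }
const corners = rectCorners(mapped);
```

If the component uses a different fit mode (for example one that crops or stretches the image), the scale and offsets would need to be computed differently.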

If you can, give the test app a try. If it's not working there, please upload the image you are using and the tensor model and I can take a look.

andrewjb123 commented 3 months ago

I've taken a quick look at the video you posted. I think you're probably using the wrong technology to achieve a video overlay that tracks a placeholder being moved by a hand.

These are the steps I think you are trying to achieve:

For this you would be better off using an augmented reality library like react-viro.

https://youtu.be/Waqb0zTMSDY?si=-FcGtWZNr9kXJHki

https://youtu.be/2pGCnipzl3c?si=mNue84X3asBFW3NM

If you used the imageMarker and video components from that library, I think you would achieve the effect you want.

https://viro-community.readme.io/docs/image-recognition

https://viro-community.readme.io/docs/video

I've provided you with some links to the components you could use.

Ravim-addweb commented 3 months ago

Hi @andrewjb123, sorry for the late reply.

I am actually able to detect the custom object, taking inspiration from the DetectObject module you wrote for Android. It gives me a Rect back. I am now using another TensorFlow model to detect the coordinates of the detected object, training it on both images and annotations.

I will let you know once I am close to a solution. Thanks again for answering. Appreciate it!

andrewjb123 commented 2 months ago

I appreciate that you may use what I've provided, but you'll never get the performance you need with it, because it goes through a bridge and takes an image as input. The interface provided only allows single images, and on a mobile device it won't be fast enough for the realtime processing you're trying to do.

Ravim-addweb commented 2 months ago

Hey, thanks for getting back. I have decided to go with a 3rd party service called Vuforia, which accurately detects a cloud image target and lets you nicely overlay an image, video or 3D augmentation on it. Since neither Google's SDK nor any other solution gave me accurate XY coordinates for the image, we had to go with the remote solution. Thanks.

BoavistaLudwig commented 2 months ago

@a7medev Is there any way we can move this PR forward?