facebookresearch / playtorch

PlayTorch is a framework for rapidly creating mobile AI experiences.
https://playtorch.dev/

feat: Add VisionCamera integration (`imageFromFrame`) #199

Open mrousavy opened 1 year ago

mrousavy commented 1 year ago

Summary

Targets VisionCamera V3 (https://github.com/mrousavy/react-native-vision-camera/pull/1466) so you can use PyTorch Core inside a Frame Processor. Example: https://github.com/mrousavy/react-native-vision-camera/pull/1485

This makes it possible to use models (face detection, object classification, etc.) inside a VisionCamera Frame Processor straight from JS, without touching native code at all (no native Frame Processor Plugins!) 🚀

cc @raedle

Still WIP - proof of concept. iOS has a small memory issue and Android isn't tested yet.

WIP - Current TODOS:

  1. Figure out how to get Context in the Android implementation. We're in a static method right now.
  2. Figure out how we want to link VisionCamera on Android. We probably have to add a conditional include to the build.gradle + CMake setup that requires it as a prefab if the user has VisionCamera installed.
  3. Figure out why the image isn't deleted after I call release() - this blocks the Camera pipeline.
  4. Test it :)

Changelog

[CATEGORY] [TYPE] - Message

Test Plan

EDIT: I got it working!

This is the code I used:

import { media } from 'react-native-pytorch-core';
import { Camera, useFrameProcessor } from 'react-native-vision-camera';

const frameProcessor = useFrameProcessor((frame) => {
  'worklet';
  console.log(`Frame width: ${frame.width}`);

  const image = media.imageFromFrame(frame);
  console.log(`Converted! Image width: ${image.getWidth()}`);
  // we can run some ML models here :)

  // release the native image so the camera can reuse the buffer
  image.release();
}, []);

return <Camera frameProcessor={frameProcessor} />;

This runs ~10 times (the max buffer size), but then stops because the image.release() call doesn't properly release all resources. I have no idea how ref counting works in this repo, so I'd appreciate some pointers here.

But in theory it works, so this could be a pretty cool integration :)

vercel[bot] commented 1 year ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name      | Status              | Updated
playtorch | ❌ Failed (Inspect) | Feb 21, 2023 at 5:03 AM (UTC)
raedle commented 1 year ago

That's pretty exciting!

I solved the memory leak--at least partially, since there seem to be other resources that are not deallocated.

There were two issues that I fixed, and a BE task to eventually clean up legacy code:

  1. The JavaScript GC isn't kicking in fast or frequently enough to clean up unreferenced host objects. I made a change that sets nullptr for the Image on the ImageHostObject, letting the smart pointer do its thing.
  2. Same as above for BlobHostObject and a release equivalent for the TensorHostObject.
  3. Not necessarily a big issue, but cleaner: the first release of PlayTorch (v0.1) sent "image references" across the React Native bridge (in the codebase, this is called NativeJSRef). Basically, images were put in a hash map with a UUID as key and the image object as value. This allowed sending just the UUID, which could then be used in JavaScript to refer to the native object when calling functions on it or drawing it on the canvas (see the sketch after this list). For backward compatibility, PlayTorch v0.2 was additive and made the new API compatible with NativeJSRef. A possible future with react-native-vision-camera and react-native-skia would make the PlayTorch Camera and Canvas obsolete and allow removing NativeJSRef entirely.
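
For illustration, here's a minimal TypeScript sketch of the NativeJSRef pattern described in item 3 (conceptual only; the map, helper names, and UUID stand-in are made up, and the actual ref shape in the codebase may differ):

// The native side keeps the real objects; JS only ever holds a UUID wrapper.
type NativeJSRef = { ID: string };

const nativeObjects = new Map<string, unknown>();
const generateUuid = () => Math.random().toString(36).slice(2); // stand-in for a real UUID

function wrapNativeObject(obj: unknown): NativeJSRef {
  const id = generateUuid();
  nativeObjects.set(id, obj);
  return { ID: id }; // only the key crosses the bridge
}

function resolveNativeObject(ref: NativeJSRef): unknown {
  return nativeObjects.get(ref.ID); // look the object back up when JS calls into native
}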

Answering TODOs:

  1. Ideally, no reference to the ApplicationContext is needed. This will require removing it from the ImageProxyImage constructor and eventually doing the yuv420ToBitmap conversion directly in C++ instead of via RenderScript.
  2. If it simplifies the approach, we can probably upgrade to RN 0.71 w/o backward compatibility
  3. See above
  4. 👍

this could be a pretty cool integration :)

Agree!

Additional TODOs:

  1. I got a step further converting the image to a tensor. However, there is still a bottleneck somewhere that drops the frame rate from 60fps to 20fps. My hunch is that there are unnecessary memcpys and conversions. To be investigated.
  2. I tried to load an image classification model, but I'm not sure how to (1) download the model async, (2) load it into memory with torch.jit._loadForMobile or torch.jit._loadForMobileSync, and then (3) call its forwardSync or forward function in the useFrameProcessor hook:
// ...

import { useRef } from 'react';
import { torch, media, torchvision } from 'react-native-pytorch-core';
import type { Module, Tensor } from 'react-native-pytorch-core';

const T = torchvision.transforms;
const resizeTensor = T.resize([224, 224]);
const normalizeTensor = T.normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]);

// ...

const countRef = useRef(0);

// ...

const frameProcessor = useFrameProcessor((frame) => {
  'worklet';

  // Increasing counter to see at what frame the frame processor
  // stops processing
  countRef.current += 1;

  console.log(`Size: ${frame.width}/${frame.height}`);
  const imageHighRes = media.imageFromFrame(frame);
  const image = imageHighRes.scale(0.25, 0.25);
  imageHighRes.release();
  const width = image.getWidth();
  const height = image.getHeight();
  console.log(`Converted ${countRef.current}! ${width}/${height}`);

  const blob = media.toBlob(image);
  let tensor = torch.fromBlob(blob, [height, width, 3]);
  image.release();
  blob.release();

  // Helper function to release input tensor before returning newly
  // constructed tensor. This is for testing purposes, and will need
  // to change to an API transparent to the developer.
  function applyAndFreeTensor(inputTensor: Tensor, func: (tensor: Tensor) => Tensor): Tensor {
    const newTensor = func(inputTensor);
    inputTensor.release();
    return newTensor;
  }

  tensor = applyAndFreeTensor(tensor, (tensor) => tensor.permute([2, 0, 1]));
  tensor = applyAndFreeTensor(tensor, (tensor) => tensor.div(255));
  tensor = applyAndFreeTensor(tensor, (tensor) => {
    const centerCrop = T.centerCrop(Math.min(width, height));
    return centerCrop(tensor);
  });
  tensor = applyAndFreeTensor(tensor, resizeTensor);
  tensor = applyAndFreeTensor(tensor, normalizeTensor);
  tensor = applyAndFreeTensor(tensor, tensor => tensor.unsqueeze(0));

  console.log('shape', tensor.shape);

  // // How to load "model" (i.e., ModuleHostObject)?
  // const output = model.forwardSync<[Tensor], Tensor>(tensor);
  // console.log('output', output.shape);
  // output.release();

  tensor.release();
}, [countRef]);
  3. Most on-device models expect a relatively low-resolution input image. I worked around this by scaling the image down. Is there a way to pick a lower resolution on the camera directly?
mrousavy commented 1 year ago

Thanks for your feedback and the rapid GC change, @raedle!

If it simplifies the approach, we can probably upgrade to RN 0.71 w/o backward compatibility

Yes, VisionCamera V3 will require RN 0.71 due to the much simpler buildscript.

I got a step further converting the image to a tensor. However, there is still a bottleneck somewhere that drops the frame rate from 60 to 20fps. My hunch is that there are unnecessary memcpy and conversions. To be investigated.

Yea, I mean CMSampleBuffer itself is a GPU buffer, so converting that to a UIImage copies it to the CPU which can be slow. Is there a way to create a Tensor purely from a GPU buffer? Not entirely sure how PyTorch works here under the hood...

I tried to load an image classification model, but I'm not sure how to (1) download the model async, (2) load it into memory with torch.jit._loadForMobile or torch.jit._loadForMobileSync, and then call its forwardSync or forward function in the useFameProcessor hook

I think this fully relies on the understanding of Worklets.

We want to move as much stuff outside of useFrameProcessor as possible, as this is the hot code path (called for every frame).

So loading the model (asynchronous) has to be done outside the Worklet.
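
For example, here's a minimal sketch of loading the model outside the Worklet (the hook and local file path are made up; this assumes torch.jit._loadForMobile resolves to a Module, and that the Module host object can then be captured by the frame processor Worklet):

import { useEffect, useState } from 'react';
import { torch } from 'react-native-pytorch-core';
import type { Module } from 'react-native-pytorch-core';

// Hypothetical hook: load the model once, outside the hot code path.
function useModel(filePath: string): Module | null {
  const [model, setModel] = useState<Module | null>(null);
  useEffect(() => {
    let cancelled = false;
    // async loading is fine here, because we're not inside a Worklet
    torch.jit._loadForMobile(filePath).then((m) => {
      if (!cancelled) setModel(m);
    });
    return () => {
      cancelled = true;
    };
  }, [filePath]);
  return model;
}

Then, inside the component, something like const model = useModel('/local/path/to/model.ptl') (hypothetical path), and inside the Worklet, skip frames with if (model == null) return; before calling model.forwardSync.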

Also, Worklets have some limitations;

  1. You can use "outside" values inside a Worklet, but if you also want to write to them, it has to be a SharedValue (the latest RN Worklets version introduced the useSharedValue hook, check it out! Also, don't use REA's useSharedValue here).
  2. You cannot use async/await inside a Worklet.
  3. You cannot call JS functions inside a Worklet. It has to be either a C++ JSI function (HostObject/HostFunction) or another Worklet (i.e., a JS function with the "worklet" directive). If you want to call back to JS, use Worklets.createRunInJsFn to wrap the JS function outside, then call that wrapped function inside the Worklet (see the sketch after this list).
  4. Callbacks are a bit tricky. If they go straight to a C++ JSI func (HostObject/HostFunction), it shouldn't be a problem; if they go to a Worklet, it shouldn't be much of a problem either. We'd have to test that.
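
A minimal sketch of the createRunInJsFn pattern from point 3 (the import path follows the react-native-worklets package mentioned below; onDetection is a made-up callback):

import { Worklets } from 'react-native-worklets';
import { useFrameProcessor } from 'react-native-vision-camera';

// Plain JS callback, defined outside the Worklet.
const onDetection = (label: string) => {
  console.log('Detected:', label);
};

// Wrap it once, outside the Worklet...
const onDetectionJS = Worklets.createRunInJsFn(onDetection);

const frameProcessor = useFrameProcessor((frame) => {
  'worklet';
  // ...run the model on the frame, then call back to JS:
  onDetectionJS('cat');
}, [onDetectionJS]);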

Looking at your code:

useRef

Needs to be useSharedValue from react-native-worklets.
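
For example (a sketch, assuming the react-native-worklets useSharedValue exposes a .value property the way REA's does):

import { useSharedValue } from 'react-native-worklets';
import { useFrameProcessor } from 'react-native-vision-camera';

const count = useSharedValue(0); // writable from inside the Worklet

const frameProcessor = useFrameProcessor((frame) => {
  'worklet';
  count.value += 1;
  console.log(`Processed frame #${count.value}`);
}, [count]);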

const output = model.forwardSync<[Tensor], Tensor>(tensor);

Is that sync? If it's sync, it should work. If it's async/awaitable, it's not gonna work and shouldn't be part of useFrameProcessor, as that gets called every frame.
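
To illustrate the distinction (a sketch; model and tensor stand in for the loaded Module and the preprocessed input from the earlier snippets):

import type { Module, Tensor } from 'react-native-pytorch-core';

function runInference(model: Module, tensor: Tensor) {
  'worklet';
  // Synchronous inference can run inside a Worklet:
  const output = model.forwardSync<[Tensor], Tensor>(tensor);
  console.log('output', output.shape);
  output.release();
  // The async variant cannot, since Worklets don't support async/await:
  // const output = await model.forward<[Tensor], Tensor>(tensor); // ❌
}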


Also, in the latest RN Worklets lib we made some fixes to identify HostObjects correctly - so maybe try the latest commit from master instead of the currently pinned version :)