[Need help] Optimizing Legacy Holistic in Worker Thread

hohoaisan commented 8 months ago

Currently I'm making a simple Live2D Mocap tool that runs only on web.

I got the mocap feature working, but it blocks the main thread, so the UI becomes very laggy whenever motion tracking is enabled.

I followed this discussion and was able to move the Holistic code to a worker thread. The main thread now runs at a full 60 FPS: https://github.com/google/mediapipe/issues/2506

Below is my simple implementation: (Full draft code)

// worker.js
self.importScripts("holistic.js");

// Holistic instance, plus a flag that flips once the model has finished loading
let hModel = null;
let hModelInit = false;
async function init() {
  hModel = new Holistic({
    locateFile: function (file) {
      const path = `/holistic/${file}`;
      return path;
    }
  });
  hModel.setOptions({
    modelComplexity: 0,
    smoothLandmarks: true,
    minDetectionConfidence: 0.5,
    minTrackingConfidence: 0.5,
    refineFaceLandmarks: true,
  });
  hModel.onResults(function (results) {
    postMessage(JSON.stringify({
      time: new Date().getTime(),
      faceLandmarks: results.faceLandmarks,
      poseLandmarks: results.poseLandmarks,
      za: results.za
    }))

  });
  await hModel.initialize();
  console.log("holistic worker initialization!");
  hModelInit = true;
}
init();

onmessage = async e => {
  if (hModelInit && e.data) {

    const timestamp = performance.now()
    const now = Date.now()
    const prev = e.data.now;
    const image = e.data.bitmap
    console.log("delayed message sent to worker: ", now - prev);
    await hModel.send({ image }, timestamp);
    const timestamp2 = performance.now()
    console.log("predict time: ", timestamp2 - timestamp);
  }
}

// live2d.ts

const defaultWidth = 320;
const defaultHeight = 240;

const holisticWorker = new Worker('/holistic/worker.js');

// Camera source putting stream into videoElement
const videoElement = document.createElement('video');

// Camera class is a modified version of @mediapipe/camera_utils with manual FPS set
const camera = new Camera(videoElement, {
  // onFrame fires 24 times per second while the camera is active
  onFrame: async () => {
    const now = Date.now();
    createImageBitmap(videoElement).then((bitmap) => {
      holisticWorker.postMessage({ now, bitmap }, [bitmap]); // transferable
    });
  },
  frameRate: 24,
  width: defaultWidth,
  height: defaultHeight,
});

camera.start()

holisticWorker.onmessage = (e) => {
  // do something with the mocap results
}

But I ran into another issue: the mocap in the worker thread is much slower than what I had built on the main thread. While debugging what is slowing the code down,

I saw that the messaging delay between the main thread and the worker gradually increases over time.

I have tried reducing the frame rate (which is bad), decreasing the videoElement width/height, and setting the model complexity to lite, but the delay is still very noticeable.

However, XR_Animator, which also uses legacy Holistic, does not have this issue, and its message transfer time is quite short (less than 100ms). I looked into SA_system_emulation.min.js, and it uses the same createImageBitmap call I am using above, even keeping the original 1280x720 resolution.

Is there any step I got wrong in my simple implementation above? I even copied the same Holistic model that XRAnimator is using.

ButzYung commented 8 months ago

When you want to send a message from the main thread to the worker, make sure the worker thread has finished any existing task and is idle before actually sending the message. Judging from your code, you are sending a message to the worker unconditionally at a constant 24 fps (the camera's fps), but there is no guarantee that the worker thread can finish the Holistic processing within one frame. Any frame sent while the worker is still busy waits in its message queue, which is why the delay keeps growing.
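A minimal sketch of this kind of gating, assuming the worker posts a message back after every frame it finishes (as the onResults callback in the code above already does). `createFrameGate`, `trySend` and `release` are hypothetical names used only for illustration; they are not part of MediaPipe.

```javascript
// Hypothetical helper: lets at most one frame be in flight to the worker.
// `send` is whatever actually posts the frame (e.g. a worker.postMessage call).
function createFrameGate(send) {
  let busy = false;
  return {
    // Call from the camera's onFrame; returns false when the frame is dropped.
    trySend(frame) {
      if (busy) return false; // worker is still processing: skip this frame
      busy = true;
      send(frame);
      return true;
    },
    // Call from worker.onmessage once the result for the last frame arrives.
    release() {
      busy = false;
    },
  };
}

// Usage sketch with a fake "worker" that only records the frames it receives.
const received = [];
const gate = createFrameGate((frame) => received.push(frame));
gate.trySend("frame-1"); // sent
gate.trySend("frame-2"); // dropped, the worker is still busy
gate.release();          // the worker reported a result
gate.trySend("frame-3"); // sent
```

In the real page, `send` would wrap `holisticWorker.postMessage({ now, bitmap }, [bitmap])` and `release()` would be called from `holisticWorker.onmessage`. If the worker can fail without replying, it should post an error message as well, otherwise the gate never reopens.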

hohoaisan commented 8 months ago

Thanks! I was able to resolve the delayed messages to the worker by introducing a running state. Now the frame rate is better.

But it still drops frequently because Holistic takes 80ms to 120ms per inference (about 8 to 12 FPS), which feels very laggy when the Live2D model moves its head/body. Meanwhile the 3D model in XR Animator feels very smooth at the same inference frame rate. Any idea how to overcome this issue?

ButzYung commented 8 months ago

What is the spec of your PC? Holistic should run above 15fps on an average PC.

XR Animator does frame interpolation, so even if the detection fps is just 10fps, the actual animation will still look smooth. You can use the detection fps to determine how much you should interpolate the animation frames.
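One possible way to turn the detection rate into an interpolation amount is sketched below. This is not XR Animator's actual code: `createBlendClock`, the 100ms starting interval and the 0.8/0.2 smoothing weights are all assumptions made for illustration.

```javascript
// Hypothetical helper: tracks when detections arrive and reports how far (0..1)
// the animation should have blended from the previous detection to the latest.
function createBlendClock(initialIntervalMs = 100) {
  let intervalMs = initialIntervalMs; // estimated time between detections
  let lastArrivalMs = null;
  return {
    // Call whenever a new detection result arrives from the worker.
    onDetection(nowMs) {
      if (lastArrivalMs !== null) {
        // Smooth the estimate so a single slow frame does not cause a jump.
        intervalMs = 0.8 * intervalMs + 0.2 * (nowMs - lastArrivalMs);
      }
      lastArrivalMs = nowMs;
    },
    // Call on every animation frame; returns the blend factor in [0, 1].
    blendFactor(nowMs) {
      if (lastArrivalMs === null) return 0;
      return Math.min(1, Math.max(0, (nowMs - lastArrivalMs) / intervalMs));
    },
  };
}

// Usage sketch with made-up timestamps in milliseconds.
const clock = createBlendClock(100);
clock.onDetection(1000);
const early = clock.blendFactor(1025); // a quarter of the way to the new pose
const late = clock.blendFactor(1300);  // clamped once the interval has elapsed
```

Blending from the previous detection toward the latest one means the animation trails the tracking by roughly one detection interval. That is the price of the smoothness in this scheme.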

hohoaisan commented 8 months ago

Mine is an i5-7300HQ 2.5GHz (4 CPUs) with Intel HD Graphics 630 and an Nvidia GTX 1050 Mobile. Holistic mocap seems to run only on the Intel GPU.

My Live2D playground was deployed here https://live2d-playground-git-feature-holistic-worker-hohoaisan.vercel.app/

Is there any guidance on frame interpolation? Thank you!

ButzYung commented 8 months ago

Unfortunately you can't decide which GPU Holistic (or any WebGL app in general) uses in a normal web browser.

Let's say Holistic runs at 10fps and the animation runs at 60fps. On the animation frames in between, you interpolate the frame data (be it a float value, a vector position or a quaternion rotation) between the previous detection and the current one, using a lerp function.
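For reference, the three kinds of interpolation mentioned here could look roughly like the sketch below. The function names and the `{x, y, z}` / `{x, y, z, w}` object shapes are assumptions for illustration. For rotations the sketch uses spherical interpolation (slerp) instead of a plain lerp, which keeps the rotation speed constant along the arc.

```javascript
// Linear interpolation between two scalars; t is expected in [0, 1].
function lerp(a, b, t) {
  return a + (b - a) * t;
}

// Component-wise lerp for positions stored as {x, y, z}.
function lerpVec3(a, b, t) {
  return { x: lerp(a.x, b.x, t), y: lerp(a.y, b.y, t), z: lerp(a.z, b.z, t) };
}

// Spherical interpolation for unit quaternions stored as {x, y, z, w}.
function slerp(a, b, t) {
  let dot = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
  // q and -q encode the same rotation; flip b to travel along the shorter arc.
  const sign = dot < 0 ? -1 : 1;
  dot *= sign;
  let wa;
  let wb;
  if (dot > 0.9995) {
    // Nearly identical rotations: use lerp weights to avoid dividing by ~0.
    wa = 1 - t;
    wb = t;
  } else {
    const theta = Math.acos(dot);
    const sinTheta = Math.sin(theta);
    wa = Math.sin((1 - t) * theta) / sinTheta;
    wb = Math.sin(t * theta) / sinTheta;
  }
  wb *= sign;
  const q = {
    x: wa * a.x + wb * b.x,
    y: wa * a.y + wb * b.y,
    z: wa * a.z + wb * b.z,
    w: wa * a.w + wb * b.w,
  };
  // Renormalize to absorb rounding error and the lerp fallback.
  const len = Math.hypot(q.x, q.y, q.z, q.w);
  return { x: q.x / len, y: q.y / len, z: q.z / len, w: q.w / len };
}

// Usage sketch: blend halfway between two head positions and two rotations.
const midPosition = lerpVec3({ x: 0, y: 0, z: 0 }, { x: 2, y: 4, z: 6 }, 0.5);
const identity = { x: 0, y: 0, z: 0, w: 1 };
const quarterTurnZ = { x: 0, y: 0, z: Math.SQRT1_2, w: Math.SQRT1_2 }; // 90 degrees about Z
const midRotation = slerp(identity, quarterTurnZ, 0.5); // 45 degrees about Z
```

With 10fps detection and 60fps animation, `t` would step through roughly 1/6, 2/6, and so on up to 1 between two consecutive detections.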

hohoaisan commented 8 months ago

I tried Kalidokit's lerp, but the result did not improve much. So I decided to give up on Holistic and use Facemesh + Movenet combined instead. Pretty good inference time!

ButzYung / SystemAnimatorOnline

[Need help] Optimizing Legacy Holistic in Worker Thread #71