google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://mediapipe.dev
Apache License 2.0

facelandmarker vision_wasm_internal crash #5152

Closed: sewonjun closed this issue 3 months ago

sewonjun commented 6 months ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

Mac (Apple M1), macOS 14.3, Chrome 120.0.6099.109

MediaPipe Tasks SDK version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

face landmark detection

Programming Language and version (e.g. C++, Python, Java)

JavaScript

Describe the actual behavior

vision_wasm_internal logs "Graph successfully started running." but the face mask is never drawn

Describe the expected behaviour

MediaPipe face landmark detection works and the face mask is drawn

Standalone code/steps you may have used to try to get what you need

When I was testing on localhost, it worked only intermittently, and when I served the app from my IP address it did not work at all. It had also worked fine until Saturday, 17 Feb 2024.
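// mediaPipe/initMediaPipe.ts (file name inferred from the import in the component below)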

import vision from "@mediapipe/tasks-vision";
const { FaceLandmarker, FilesetResolver } = vision;

async function initFaceLandmarker() {
  // Resolve the WASM fileset from the CDN. Note that "@latest" floats to
  // whatever version was most recently published.
  const fileSetResolver = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
  );

  // Create the landmarker: GPU delegate, video mode, blendshapes enabled
  // (the blendshapes feed the emotion prediction in the component below).
  const faceLandmarkerInstance = await FaceLandmarker.createFromOptions(
    fileSetResolver,
    {
      baseOptions: {
        modelAssetPath:
          "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task",
        delegate: "GPU",
      },
      outputFaceBlendshapes: true,
      runningMode: "VIDEO",
      numFaces: 1,
    }
  );

  return faceLandmarkerInstance;
}

export default initFaceLandmarker;
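
// Second file (presumably FaceDetection.tsx): the React component that uses the landmarker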
import { useEffect, useRef, useState } from "react";
import { FaceLandmarker, DrawingUtils } from "@mediapipe/tasks-vision";
import { v4 as uuidv4 } from "uuid";

import initMediaPipe from "../../../mediaPipe/initMediaPipe";
import predictHappiness from "../../../util/predictHappiness";
import drawFaceMask from "../../../util/drawFaceMask";
import CapturedImage from "../CapturedImage";
import LoadingBtn from "../LoadingBtn";
import emotionPredictionModel from "../../../util/emotionPredictionModel";

type Emotion = "happy" | "unhappy" | "neutral" | null;

interface FaceBlendShape {
  index: number;
  score: number;
  categoryName: string;
  displayName: string;
}

interface ImageRef {
  capturedPicture: string;
  faceBlendShape: FaceBlendShape[];
}

const FaceDetection = () => {
  const videoRef = useRef<HTMLVideoElement>(null);
  const canvasRef = useRef<HTMLCanvasElement>(null);
  const captureRef = useRef<HTMLCanvasElement>(null);
  const lastTime = useRef<number>(0);
  const imgRef = useRef<ImageRef[]>([]);
  const videoContainerRef = useRef<HTMLDivElement>(null);
  const [faceLandmarker, setFaceLandmarker] = useState<FaceLandmarker | null>(
    null
  );
  const [isMobile, setIsMobile] = useState<boolean>(false);
  const [webcamRunning, setWebcamRunning] = useState<boolean>(false);
  const [videoDetect, setVideoDetect] = useState<boolean>(false);
  const [animationId, setAnimationId] = useState<number | null>(null);
  const [errorMessage, setErrorMessage] = useState<string>("");
  const [model, setModel] = useState<unknown>(null);
  const [emotion, setEmotion] = useState<Emotion>(null);
  const [isLoading, setIsLoading] = useState<boolean | null>(null);
  let imgRefNumber = 0;
  const runningMode = "VIDEO";

  useEffect(() => {
    async function createFaceLandmarker() {
      const faceLandmarkerInstance = await initMediaPipe();
      setFaceLandmarker(faceLandmarkerInstance);
    }

    async function loadModel() {
      const model = await emotionPredictionModel();
      if (model) {
        setModel(model);
      }
    }

    createFaceLandmarker();
    loadModel();
  }, []);

  useEffect(() => {
    if (errorMessage) {
      window.scrollTo(0, 0);
    }
  }, [errorMessage]);

  useEffect(() => {
    const checkMobile = () => {
      const ua = navigator.userAgent;
      if (/Mobi|Android|iPhone/i.test(ua)) {
        setIsMobile(true);
      }
    };
    checkMobile();
    // Note: "checkMobile" is not a real DOM event, so this listener never
    // fired; "resize" is presumably what was intended.
    window.addEventListener("resize", checkMobile);

    return () => window.removeEventListener("resize", checkMobile);
  }, []);

  function handleWebCamRunning() {
    setWebcamRunning(prev => !prev);
  }

  function handleErrorBtn() {
    predictWebcam();
    setErrorMessage("");
  }

  async function handleFaceMask() {
    if (videoDetect && animationId) {
      window.cancelAnimationFrame(animationId);
      videoRef.current!.removeEventListener("loadeddata", predictWebcam);

      setWebcamRunning(false);
      setVideoDetect(false);
      setAnimationId(null);

      return;
    }

    if (faceLandmarker && !videoDetect) {
      setVideoDetect(true);

      if (navigator.mediaDevices) {
        enableCam();
      } else {
        alert("Camera is not supported on your device.");
      }
    }
  }

  async function enableCam() {
    if (!faceLandmarker) {
      alert("Wait! faceLandmarker not loaded yet.");
      return;
    }

    setIsLoading(true);

    const constraints = {
      video: true,
    };

    const openMediaDevices = async (constraint: MediaStreamConstraints) => {
      return await navigator.mediaDevices.getUserMedia(constraint);
    };

    try {
      const stream = await openMediaDevices(constraints);

      if (videoRef.current) {
        videoRef.current.srcObject = stream;
        videoRef.current.addEventListener("loadeddata", predictWebcam);
        videoRef.current.onloadeddata = () => {
          setIsLoading(false);
          videoRef.current!.play();
        };
      }
      if (runningMode === "VIDEO") {
        await faceLandmarker.setOptions({ runningMode: runningMode });
      }
    } catch (error) {
      setErrorMessage("Your device is not available for this service");
    }
  }

  async function predictWebcam() {
    const canvas = canvasRef.current;
    const video = videoRef.current;

    if (
      canvas === null ||
      webcamRunning === false ||
      video === null ||
      faceLandmarker === null
    )
      return;

    const videoRect = video.getBoundingClientRect();
    const startTimeMs = performance.now();
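    // Note: detectForVideo needs a frame with non-zero dimensions. If it runs
    // before the video has dimensions, the graph fails with the
    // "ROI width and height must be > 0" error shown in the logs below.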
    const results = await faceLandmarker.detectForVideo(video, startTimeMs);

    if (!results.faceLandmarks.length) {
      setErrorMessage("Face Detection Failed");
    }

    canvas.setAttribute("width", videoRect.width.toString());
    canvas.setAttribute("height", videoRect.height.toString());
    canvas.style.width = videoRect.width + "px";
    canvas.style.height = videoRect.height + "px";
    const canvasCtx = canvas.getContext("2d");

    if (canvasCtx) {
      canvasCtx.clearRect(0, 0, canvas.width, canvas.height);
      const drawingUtils = new DrawingUtils(canvasCtx);
      drawFaceMask(results, drawingUtils, FaceLandmarker);
    }

    const currentTime = performance.now();
    const delay = 500;

    if (!lastTime.current || currentTime - lastTime.current >= delay) {
      lastTime.current = currentTime;

      const capture = captureRef.current;
      const videoContainerRect =
        videoContainerRef.current?.getBoundingClientRect();

      if (capture && video) {
        captureRef.current.setAttribute("width", videoRect.width.toString());
        captureRef.current.setAttribute("height", videoRect.height.toString());
        captureRef.current.style.left = videoRect.x + "px";
        captureRef.current.style.top = videoRect.y + "px";
        captureRef.current.style.width = videoContainerRect!.width + "px";
        captureRef.current.style.height = videoContainerRect!.height + "px";
        const captureCtx = captureRef.current.getContext("2d");

        if (captureCtx) {
          captureCtx.clearRect(0, 0, canvas.width, canvas.height);
          captureCtx.drawImage(video, 0, 0, videoRect.width, videoRect.height);
        }

        capture.toBlob(async blob => {
          const faceBlendShape = results.faceBlendshapes[0]?.categories;
          const emotionResult = await predictHappiness(faceBlendShape, model);
          setEmotion(emotionResult);

          if (emotionResult === "happy" && blob !== null) {
            const capturedPicture = URL.createObjectURL(blob);

            imgRef.current[imgRefNumber] = {
              capturedPicture,
              faceBlendShape,
            };

            imgRefNumber++;
          }

          if (webcamRunning) {
            const animationFrameId =
              window.requestAnimationFrame(predictWebcam);
            setAnimationId(animationFrameId);
          }
        }, "image/png");
      }
    } else {
      if (webcamRunning) {
        const animationFrameId = window.requestAnimationFrame(predictWebcam);
        setAnimationId(animationFrameId);
      }
    }
  }

  return (
    <>
      {errorMessage ? (
        <div className="flex flex-col justify-center items-center mt-10 h-auto w-6/12 m-auto bg-red-500">
          <h1 className="text-base text-gray-50 decoration-solid mb-1">
            Error: {errorMessage}
          </h1>
          <button
            onClick={handleErrorBtn}
            type="button"
            className="text-gray-50 bg-black h-auto p-1 m-1"
          >
            Restart
          </button>
        </div>
      ) : (
        <></>
      )}
      {webcamRunning ? (
        <>
          <div className="flex flex-col h-screen items-center">
            <div className="flex grow-0 flex-row h-auto w-auto bg-stone-200 border-2 border-stone-900 ring-offset-0 p-2 m-1  rounded-3xl justify-around">
              <div
                className={`text-4xl p-3 m-2 ${
                  emotion === "unhappy" ? "bg-red-600 " : "bg-stone-300"
                } rounded-full border-4 border-stone-900 shadow-md`}
              >
                🙁
              </div>
              <div
                className={`text-4xl p-3 m-2 ${
                  emotion === "neutral"
                    ? "bg-yellow-400 shadow-md"
                    : "bg-stone-300"
                } rounded-full border-4 border-stone-900 shadow-md`}
              >
                😐
              </div>
              <div
                className={`text-4xl p-3 m-2 ${
                  emotion === "happy" ? "bg-lime-400" : "bg-stone-300"
                } rounded-full border-4 border-stone-900 shadow-md`}
              >
                🙂
              </div>
            </div>
            {isLoading && (
              <div className="loading-container">Loading...</div> // 로딩 화면
            )}
            <div
              className={`
                flex flex-col justify-center items-center ${
                  isMobile ? "w-10/12" : "w-6/12"
                } h-4/5 border-2 max-w-md bg-stone-800`}
            >
              <div className="grid grid-rows-4 w-full h-full m-10">
                <div
                  className="relative block row-span-3"
                  ref={videoContainerRef}
                >
                  <video
                    ref={videoRef}
                    autoPlay
                    playsInline
                    className="absolute block w-full h-full"
                  ></video>
                  <canvas
                    ref={canvasRef}
                    className="absolute block w-full h-full"
                  />
                  <canvas ref={captureRef} className="hidden" />
                </div>
                <div className="block cursor-pointer text-center items-center row-span-1 justify-center my-5 py-5">
                  <button
                    type="button"
                    onClick={handleFaceMask}
                    className="cursor-pointer z-10 bg-amber-400 hover:bg-white hover:text-amber-400 text-white font-bold py-2 px-4 border rounded text-2xl"
                  >
                    {videoDetect ? "Stop" : "Start"}
                  </button>
                </div>
              </div>
            </div>
          </div>
        </>
      ) : (
        <LoadingBtn
          faceLandmarker={faceLandmarker}
          handleWebCamRunning={handleWebCamRunning}
        />
      )}
      <div className="flex justify-center align-middle text-center text-2xl py-5">
        {imgRef.current.length ? "Select one picture to make a polaroid" : ""}
      </div>
      <div className="flex flex-col justify-center align-middle text-center">
        {imgRef.current
          .slice(-5)
          .map(imgData =>
            imgData ? (
              <CapturedImage
                imgRefCurrent={imgData.capturedPicture}
                faceBlendShape={imgData.faceBlendShape}
                key={uuidv4()}
              />
            ) : (
              <></>
            )
          )}
      </div>
    </>
  );
};

export default FaceDetection;

Other info / Complete Logs

vision_wasm_internal.js:9 W0219 11:33:28.137000 1883440 face_landmarker_graph.cc:180] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
custom_dbg @ vision_wasm_internal.js:9
vision_wasm_internal.js:9 I0219 11:33:28.179000 1883440 gl_context.cc:361] GL version: 3.0 (OpenGL ES 3.0 (WebGL 2.0 (OpenGL ES 3.0 Chromium))), renderer: WebKit WebGL
vision_wasm_internal.js:9 W0219 11:33:28.180000 1883440 gl_context.cc:1004] OpenGL error checking is disabled
custom_dbg @ vision_wasm_internal.js:9
vision_wasm_internal.js:9 Graph successfully started running.
index-3b985e55.js:88 WebGL: INVALID_VALUE: texImage2D: no video
bindTextureToStream @ index-3b985e55.js:88
(anonymous) @ index-3b985e55.js:88
wrapStringPtr @ index-3b985e55.js:88
addGpuBufferAsImageToStream @ index-3b985e55.js:88
process @ index-3b985e55.js:88
processVideoData @ index-3b985e55.js:88
detectForVideo @ index-3b985e55.js:88
_t @ index-3b985e55.js:89
yt @ index-3b985e55.js:89
Nb @ index-3b985e55.js:37
Tb @ index-3b985e55.js:37
Ub @ index-3b985e55.js:37
nf @ index-3b985e55.js:37
se @ index-3b985e55.js:37
(anonymous) @ index-3b985e55.js:37
Rk @ index-3b985e55.js:40
Jb @ index-3b985e55.js:37
hd @ index-3b985e55.js:37
fd @ index-3b985e55.js:37
ed @ index-3b985e55.js:37
vision_wasm_internal.js:9 INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
put_char @ vision_wasm_internal.js:9
write @ vision_wasm_internal.js:9
write @ vision_wasm_internal.js:9
doWritev @ vision_wasm_internal.js:9
_fd_write @ vision_wasm_internal.js:9
$func14768 @ vision_wasm_internal.wasm:0x749391
$func3527 @ vision_wasm_internal.wasm:0x1cc1c0
$func4212 @ vision_wasm_internal.wasm:0x25cadc
$func1285 @ vision_wasm_internal.wasm:0x7cd25
$func6077 @ vision_wasm_internal.wasm:0x39f75d
$func12429 @ vision_wasm_internal.wasm:0x65db40
$func12043 @ vision_wasm_internal.wasm:0x612d02
$func10038 @ vision_wasm_internal.wasm:0x5664b9
$func3217 @ vision_wasm_internal.wasm:0x1a0233
$func3309 @ vision_wasm_internal.wasm:0x1b322a
$ze @ vision_wasm_internal.wasm:0x3feb38
Module._waitUntilIdle @ vision_wasm_internal.js:9
finishProcessing @ index-3b985e55.js:88
finishProcessing @ index-3b985e55.js:88
process @ index-3b985e55.js:88
processVideoData @ index-3b985e55.js:88
detectForVideo @ index-3b985e55.js:88
_t @ index-3b985e55.js:89
yt @ index-3b985e55.js:89
Nb @ index-3b985e55.js:37
Tb @ index-3b985e55.js:37
Ub @ index-3b985e55.js:37
nf @ index-3b985e55.js:37
se @ index-3b985e55.js:37
(anonymous) @ index-3b985e55.js:37
Rk @ index-3b985e55.js:40
Jb @ index-3b985e55.js:37
hd @ index-3b985e55.js:37
fd @ index-3b985e55.js:37
ed @ index-3b985e55.js:37
vision_wasm_internal.js:9 E0219 11:33:37.262000 1883440 image_to_tensor_utils.cc:56] INTERNAL: RET_CHECK failure (third_party/mediapipe/calculators/tensor/image_to_tensor_utils.cc:56) roi->width > 0 && roi->height > 0 ROI width and height must be > 0.
Stack trace:
_emscripten_errn @ vision_wasm_internal.js:9
$func12953 @ vision_wasm_internal.wasm:0x6bf6ab
$func5855 @ vision_wasm_internal.wasm:0x382bff
$func3466 @ vision_wasm_internal.wasm:0x1c25d7
$func283 @ vision_wasm_internal.wasm:0xfb61
$func1552 @ vision_wasm_internal.wasm:0xa449e
$func234 @ vision_wasm_internal.wasm:0xc37f
$func878 @ vision_wasm_internal.wasm:0x50b4d
$func12433 @ vision_wasm_internal.wasm:0x65f71c
$func12043 @ vision_wasm_internal.wasm:0x613f94
$func10038 @ vision_wasm_internal.wasm:0x5664b9
$func3217 @ vision_wasm_internal.wasm:0x1a0233
$func3309 @ vision_wasm_internal.wasm:0x1b322a
$ze @ vision_wasm_internal.wasm:0x3feb38
Module._waitUntilIdle @ vision_wasm_internal.js:9
finishProcessing @ index-3b985e55.js:88
finishProcessing @ index-3b985e55.js:88
process @ index-3b985e55.js:88
processVideoData @ index-3b985e55.js:88
detectForVideo @ index-3b985e55.js:88
_t @ index-3b985e55.js:89
yt @ index-3b985e55.js:89
Nb @ index-3b985e55.js:37
Tb @ index-3b985e55.js:37
Ub @ index-3b985e55.js:37
nf @ index-3b985e55.js:37
se @ index-3b985e55.js:37
(anonymous) @ index-3b985e55.js:37
Rk @ index-3b985e55.js:40
Jb @ index-3b985e55.js:37
hd @ index-3b985e55.js:37
fd @ index-3b985e55.js:37
ed @ index-3b985e55.js:37
vision_wasm_internal.js:9 E0219 11:33:37.264000 1883440 calculator_graph.cc:881] INTERNAL: CalculatorGraph::Run() failed: 
Calculator::Process() for node "mediapipe_tasks_vision_face_landmarker_facelandmarkergraph__mediapipe_tasks_vision_face_detector_facedetectorgraph__mediapipe_tasks_components_processors_imagepreprocessinggraph__ImageToTensorCalculator" failed: RET_CHECK failure (third_party/mediapipe/calculators/tensor/image_to_tensor_utils.cc:56) roi->width > 0 && roi->height > 0 ROI width and height must be > 0.
=== Source Location Trace: ===
third_party/mediapipe/calculators/tensor/image_to_tensor_utils.cc:56
third_party/mediapipe/calculators/tensor/image_to_tensor_calculator.cc:224
third_party/mediapipe/framework/calculator_node.cc:950
_emscripten_errn @ vision_wasm_internal.js:9
$func12953 @ vision_wasm_internal.wasm:0x6bf6ab
$func5855 @ vision_wasm_internal.wasm:0x382bff
$func3466 @ vision_wasm_internal.wasm:0x1c25d7
$func283 @ vision_wasm_internal.wasm:0xfb61
$func3309 @ vision_wasm_internal.wasm:0x1b33aa
$ze @ vision_wasm_internal.wasm:0x3feb38
Module._waitUntilIdle @ vision_wasm_internal.js:9
finishProcessing @ index-3b985e55.js:88
finishProcessing @ index-3b985e55.js:88
process @ index-3b985e55.js:88
processVideoData @ index-3b985e55.js:88
detectForVideo @ index-3b985e55.js:88
_t @ index-3b985e55.js:89
yt @ index-3b985e55.js:89
Nb @ index-3b985e55.js:37
Tb @ index-3b985e55.js:37
Ub @ index-3b985e55.js:37
nf @ index-3b985e55.js:37
se @ index-3b985e55.js:37
(anonymous) @ index-3b985e55.js:37
Rk @ index-3b985e55.js:40
Jb @ index-3b985e55.js:37
hd @ index-3b985e55.js:37
fd @ index-3b985e55.js:37
ed @ index-3b985e55.js:37
index-3b985e55.js:88 Uncaught (in promise) Error: INTERNAL: CalculatorGraph::Run() failed: 
Calculator::Process() for node "mediapipe_tasks_vision_face_landmarker_facelandmarkergraph__mediapipe_tasks_vision_face_detector_facedetectorgraph__mediapipe_tasks_components_processors_imagepreprocessinggraph__ImageToTensorCalculator" failed: RET_CHECK failure (third_party/mediapipe/calculators/tensor/image_to_tensor_utils.cc:56) roi->width > 0 && roi->height > 0 ROI width and height must be > 0.; WaitUntilIdle failed
=== Source Location Trace: ===
third_party/mediapipe/calculators/tensor/image_to_tensor_utils.cc:56
third_party/mediapipe/calculators/tensor/image_to_tensor_calculator.cc:224
third_party/mediapipe/framework/calculator_node.cc:950
research/drishti/app/pursuit/wasm/graph_utils.cc:187

    at pt.handleErrors (index-3b985e55.js:88:325571)
    at pt.finishProcessing (index-3b985e55.js:88:325330)
    at pt.process (index-3b985e55.js:88:331231)
    at pt.processVideoData (index-3b985e55.js:88:329821)
    at pt.detectForVideo (index-3b985e55.js:88:466753)
    at _t (index-3b985e55.js:89:2805)
    at yt (index-3b985e55.js:89:1947)
    at Object.Nb (index-3b985e55.js:37:9858)
    at Tb (index-3b985e55.js:37:10014)
    at Ub (index-3b985e55.js:37:10071)
handleErrors @ index-3b985e55.js:88
finishProcessing @ index-3b985e55.js:88
process @ index-3b985e55.js:88
processVideoData @ index-3b985e55.js:88
detectForVideo @ index-3b985e55.js:88
_t @ index-3b985e55.js:89
yt @ index-3b985e55.js:89
Nb @ index-3b985e55.js:37
Tb @ index-3b985e55.js:37
Ub @ index-3b985e55.js:37
nf @ index-3b985e55.js:37
se @ index-3b985e55.js:37
(anonymous) @ index-3b985e55.js:37
Rk @ index-3b985e55.js:40
Jb @ index-3b985e55.js:37
hd @ index-3b985e55.js:37
fd @ index-3b985e55.js:37
ed @ index-3b985e55.js:37
kolorfilm commented 6 months ago

I have had the same problem since yesterday. It started with version 0.10.10; "@mediapipe/tasks-vision": "^0.10.9" works without any problems.

With version 0.10.10 the WASM files were updated. See the changelog:

> Update WASM files for 0.10.10 release

It probably has something to do with that.
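
If that is the cause, one workaround is to pin both the npm package and the WASM CDN path to a known-good version instead of "@latest" (a sketch, using 0.10.9 from above as the known-good version):

// package.json: "@mediapipe/tasks-vision": "0.10.9" (exact pin, no caret)
import { FilesetResolver } from "@mediapipe/tasks-vision";

// Fetch the WASM for the same pinned version, so the installed JS API and
// the CDN-served WASM binary cannot drift apart the way "@latest" allows.
const fileSet = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.9/wasm"
);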

kuaashish commented 6 months ago

Hi @schmidt-sebastian,

Could you please have a look into this issue?

Thank you!!

kuaashish commented 5 months ago

Hi @sewonjun,

We have released an updated version, 0.10.12 (https://www.npmjs.com/package/@mediapipe/tasks-vision/v/0.10.12). Could you please retry and inform us if the issue persists?

Thank you!!

kolorfilm commented 5 months ago

> Hi @sewonjun,
>
> We have released an updated version, 0.10.12 (https://www.npmjs.com/package/@mediapipe/tasks-vision/v/0.10.12). Could you please retry and inform us if the issue persists?
>
> Thank you!!

I tried version 0.10.12 locally. Unfortunately, I still have the problem.

(screenshot attached: Bildschirmfoto 2024-04-16 um 12 52 54)
kolorfilm commented 4 months ago

Any update here? 👀

sewonjun commented 4 months ago

Version 0.10.12 works for me!! Thanks!
macOS 14.4, Chrome 122.0.6261.94

leskodan commented 4 months ago

Working for me with 0.10.12 as well, though I was running into the same issue. In my case it was because I was calling predictWebcam before the HTMLVideoElement had fully loaded. I see you're listening for the loadeddata event; is there a loadedmetadata event you can listen for, to ensure the height and width of the video element are not zero? I ended up fixing my issue by setting up the video element in a separate step from calling predictWebcam, but I was going to look into waiting on the metadata next if that didn't work. Not 100% sure this is the ticket, but hopefully it helps!
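
For reference, that check might look roughly like this (a sketch, where video is the HTMLVideoElement; loadedmetadata is the standard media event that fires once the video's intrinsic dimensions are known):

video.addEventListener("loadedmetadata", () => {
  // videoWidth/videoHeight stay 0 until metadata has loaded; a 0x0 frame is
  // exactly what trips the "ROI width and height must be > 0" check above.
  if (video.videoWidth > 0 && video.videoHeight > 0) {
    predictWebcam();
  }
});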

kolorfilm commented 4 months ago

> predictWebcam

I do not use predictWebcam in my code, just a plain HTML5 video element that is passed to MediaPipe. It worked before with the old version, and I'm still wondering why it no longer works with 0.10.12.

Update: I tried it again with the latest version and, strangely, it worked this time. I did a clean installation (removed node_modules and so on), made a few attempts, and everything worked. Thanks! 🚀

google-ml-butler[bot] commented 3 months ago

Are you satisfied with the resolution of your issue?