google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

Input rotated video to Face Landmarks model #5176

Closed Daniel-Nicolae closed 7 months ago

Daniel-Nicolae commented 7 months ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

Windows 10 web

MediaPipe Tasks SDK version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

Face Landmarks Detection

Programming Language and version (e.g. C++, Python, Java)

Typescript

Describe the actual behavior

I'm rotating the HTMLVideoElement using video.style.setProperty("transform", "rotate(90deg)"), and the video does appear rotated on the webpage. However, when I pass it to the estimateFaces() function, it returns the same landmarks as when the video was not rotated.

Describe the expected behaviour

I expected the x and y coordinates of the predicted landmark positions to rotate when I apply the rotation to the video.

Standalone code/steps you may have used to try to get what you need

const handleVideoLoad = async (videoNode: SyntheticEvent) => {
    const video = videoNode.target as HTMLVideoElement
    video.style.setProperty("transform", "rotate(90deg)")
    if (video.readyState !== 4) return
    const canvas = document.getElementById("faceMeshCanvas") as HTMLCanvasElement
    const mirrored = true
    loopRef.current = await runDetector(video, canvas, mirrored, landmarksRef, meshActiveCallback) 
}

<Webcam 
    videoConstraints={{
        width: window.innerWidth*0.25,
        aspectRatio: 4/3,
        deviceId: cameraIds[currentCamera]}}
    onLoadedData={handleVideoLoad}
    mirrored={true}
/>

The model function:

const runDetector = async (video: HTMLVideoElement, canvas: HTMLCanvasElement, mirrored: boolean,
                                  landmarksRef: React.MutableRefObject<faceLandmarksDetection.Keypoint[]>, 
                                  meshActiveCallback: () => boolean) => {
    const model = faceLandmarksDetection.SupportedModels.MediaPipeFaceMesh
    const detectorConfig: MediaPipeFaceMeshMediaPipeModelConfig = {
        runtime: "mediapipe",
        solutionPath: 'https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh',
        refineLandmarks: false
    }
    const detector = await faceLandmarksDetection.createDetector(model, detectorConfig)

    const detect = async (detector: FaceLandmarksDetector) => {
        if (video) {
            const faces = await detector.estimateFaces(video, {flipHorizontal: mirrored, staticImageMode: false})
            if (faces.length !== 0) {
                landmarksRef.current = faces[0].keypoints
                const ctx = canvas.getContext('2d')
                ctx!.clearRect(0, 0, canvas.width, canvas.height)
                if (meshActiveCallback()) drawFaceMesh(canvas, faces[0].keypoints)
            }
        }
    }
    const modelLoop = setInterval(detect, 20, detector)
    return modelLoop
}

And drawFaceMesh only draws the landmarks wireframe on a canvas that is overlaid on the webcam preview.

Other info / Complete Logs

No response

ayushgdev commented 7 months ago

Hello @Daniel-Nicolae There are a couple of clarifications which would help us serve your request better.

  1. The code looks like it is a React app. Is the assumption correct?
  2. From the CDN used, it looks like you are using the legacy version of MediaPipe and not the latest FaceLandmarker.
  3. None of our APIs contain a method named detector.estimateFaces(). Would you please point to the documentation you are referring to?

If possible, can you please provide a minimal yet complete code in a GitHub repo so that we can take a look at the complete picture of the dataflow, constructs used, the method/imports, etc.?

Daniel-Nicolae commented 7 months ago

Hi Ayush,

Sorry for not making this more clear. The app is a React web app, yes.

I'm not sure whether this version of MediaPipe is legacy or not; I've just followed the instructions from here: https://github.com/tensorflow/tfjs-models/tree/master/face-landmarks-detection/src/mediapipe

If it helps, in my package.json I have:

"@mediapipe/face_mesh": "^0.4.1633559619",
"@tensorflow-models/face-landmarks-detection": "^1.0.5",
"@tensorflow/tfjs": "^4.17.0",
"@tensorflow/tfjs-converter": "^4.17.0",
"@tensorflow/tfjs-core": "^4.17.0"

Also, the code does call detector.estimateFaces(); it's inside the async function detect(). It comes from the @tensorflow-models/face-landmarks-detection package.

I recreated the error in this simple repo: https://github.com/Daniel-Nicolae/FaceLandmarker-video-issue. It inputs the webcam video to the model and, in a setInterval, draws the wireframe on top. You will see that even when the video is rotated, the x and y predictions remain unchanged, and so does the wireframe drawing.

I would like the model to use the rotated video for inference.

ButzYung commented 7 months ago

The CSS transform property doesn't actually change the pixel data of the video element. If you need to actually transform the pixels, copy the video frame to a canvas first and then perform the transformation there.
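A minimal sketch of that approach in TypeScript (the helper names and the 90° clockwise choice are illustrative, not from the original code; drawImage accepts a video element, and the resulting canvas can then be passed to estimateFaces() in place of the video):

```typescript
// Output dimensions for a frame rotated by 90 degrees: width and height swap.
function rotatedSize(width: number, height: number): [number, number] {
    return [height, width]
}

// Copy the current video frame into a canvas, rotated 90 degrees clockwise,
// so the pixel data itself is transformed (a CSS transform only changes how
// the element is displayed, not the pixels the model reads).
function drawRotatedFrame(video: HTMLVideoElement, canvas: HTMLCanvasElement): HTMLCanvasElement {
    const [w, h] = rotatedSize(video.videoWidth, video.videoHeight)
    canvas.width = w
    canvas.height = h
    const ctx = canvas.getContext("2d")!
    ctx.save()
    // Rotate about the canvas centre, then draw the frame centred on it.
    ctx.translate(w / 2, h / 2)
    ctx.rotate(Math.PI / 2)
    ctx.drawImage(video, -video.videoWidth / 2, -video.videoHeight / 2)
    ctx.restore()
    return canvas
}
```

Each tick of the detect loop would then call something like detector.estimateFaces(drawRotatedFrame(video, offscreenCanvas), ...) instead of passing the video element directly.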

ayushgdev commented 7 months ago

Hello @Daniel-Nicolae Thanks for the details. We can confirm that the APIs you are referring to are legacy APIs and are now outdated. TF-JS is still using the old APIs in its demo, something we have highlighted to the TF-JS team as well. Meanwhile, we would highly recommend shifting to the newer Tasks API, since support for the legacy API has been discontinued. The newer APIs are also much easier to use and more resilient. You can get started quickly by following the FaceLandmarker documentation here

Daniel-Nicolae commented 7 months ago

Thanks! I got started with it, and indeed the new API is simpler. It also has an input preprocessing parameter which applies rotations, which is exactly what I needed.
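A rough sketch of the option in question. The interface shapes below are local stand-ins, assumed from the Tasks API FaceLandmarker docs (detectForVideo taking an image-processing options object with rotationDegrees), so verify the names against the current documentation rather than treating this as the definitive API:

```typescript
// Assumed shapes for the @mediapipe/tasks-vision FaceLandmarker API,
// declared locally so this sketch is self-contained.
interface ImageProcessingOptions {
    rotationDegrees?: number
}
interface FaceLandmarkerLike {
    detectForVideo(
        video: HTMLVideoElement,
        timestampMs: number,
        options?: ImageProcessingOptions
    ): { faceLandmarks: { x: number; y: number; z: number }[][] }
}

// Hypothetical convenience helper (not part of the API): snap an angle to a
// multiple of 90 and normalize it into [0, 360), as the rotation option
// expects quarter-turn rotations.
function toRotationDegrees(angle: number): number {
    const snapped = Math.round(angle / 90) * 90
    return ((snapped % 360) + 360) % 360
}

// Run one detection on the current frame, rotating the input before
// inference instead of rotating the element with CSS.
function detectRotated(landmarker: FaceLandmarkerLike, video: HTMLVideoElement) {
    return landmarker.detectForVideo(video, performance.now(), {
        rotationDegrees: toRotationDegrees(90),
    })
}
```

Because the rotation happens in preprocessing, the returned landmark coordinates refer to the rotated frame, which is the behaviour the original report was expecting.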

As an aside, I see this new API doesn't have the flipHorizontal option, which the old one had. That would be useful to add back.

google-ml-butler[bot] commented 7 months ago

Are you satisfied with the resolution of your issue?