WebAR-rocks / WebAR.rocks.face

WebAR.rocks face detection and tracking JavaScript library
https://webar.rocks

Using a single output canvas or mediastream #8

Closed jtestard closed 3 years ago

jtestard commented 3 years ago

At the moment, the camera output and the filter output are displayed on different canvases. Is there a way to use a single output canvas or MediaStream? For example, to send to other participants in a video conferencing application.

xavierjs commented 3 years ago

The 1-canvas approach is worse than the 2-canvas approach. The WebGL context has a very persistent state and multiple execution paths (WebGL1, WebGL2, with or without extensions). Using a WebGL context shared with THREE.js will cause device-specific, hard-to-solve bugs linked to context state.

Because THREE assumes the context is in a known state, the WebAR.rocks library will change the context, and THREE will still believe the context is unchanged when it is not. Resetting the context state in THREE is not advised since you will lose performance, and this feature is often broken by THREE updates.

Moreover, there are some weird bugs with antialiasing and render-to-texture to float textures on WebGL2/iOS. So it is not possible to run WebAR.rocks libraries with antialiasing on WebGL2/iOS devices. If you use one context for both, you won't have antialiasing on THREE.js either.

I think the best way to implement what you want is to use a third canvas for the compositing. It also lets you lower the compositing canvas resolution to optimize the mediastream.

This issue is related to: https://github.com/jeeliz/jeelizFaceFilter#1-or-2-canvas Many Jeeliz FaceFilter demos still run on 1 canvas, but they use deprecated versions of THREE.js. With the new THREE.js versions, that code will break.

jtestard commented 3 years ago

Ok, to clarify what we're trying to do:

We want to send the webcam video with the mask applied onto a MediaStream, to be sent to the other peers in the context of a video conferencing application where each peer could apply a Jeeliz mask. Each peer only needs to render a single mask, since copies are streamed to the other users.

So the third canvas is the approach you recommend. My follow-up questions:

xavierjs commented 3 years ago

To do the compositing operation, I think that using canvas2D and ctx.drawImage twice with the 2 canvases (first the video, then the THREE one) should do the job. Canvas2D is hardware accelerated, so I am not sure doing it with WebGL would be really faster. If, after profiling, too much time is spent in ctx.drawImage, it would be worth doing the operation with WebGL.
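For reference, a minimal sketch of that compositing loop, assuming `videoCanvas` holds the webcam feed and `threeCanvas` holds the transparent THREE.js render (the names and the resolution are placeholders):

```js
// Third canvas used only for compositing; its resolution can be lower than
// the source canvases to reduce the bitrate of the outgoing mediastream.
const compositeCanvas = document.createElement('canvas');
compositeCanvas.width = 640;
compositeCanvas.height = 480;
const ctx = compositeCanvas.getContext('2d');

function compositeLoop() {
  // Draw the webcam video first, then the THREE.js render on top:
  ctx.drawImage(videoCanvas, 0, 0, compositeCanvas.width, compositeCanvas.height);
  ctx.drawImage(threeCanvas, 0, 0, compositeCanvas.width, compositeCanvas.height);
  requestAnimationFrame(compositeLoop);
}
compositeLoop();
```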

Indeed, to extract the video stream from the compositing canvas, you need the captureStream API. It even seems to be supported on Safari: https://caniuse.com/mdn-api_htmlcanvaselement_capturestream Only IE does not support it. In any case, the audio stream should still be sent separately.
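A hedged sketch of how the outgoing stream could be assembled, reusing `compositeCanvas` from the snippet above and adding the microphone audio as a separate track (`peerConnection` is a placeholder for your WebRTC connection):

```js
// 30 fps video track captured from the compositing canvas:
const videoStream = compositeCanvas.captureStream(30);

navigator.mediaDevices.getUserMedia({ audio: true }).then((audioStream) => {
  // Combine the canvas video track and the microphone audio track:
  const outStream = new MediaStream([
    ...videoStream.getVideoTracks(),
    ...audioStream.getAudioTracks()
  ]);
  // Then attach it to the peer connection, for example:
  outStream.getTracks().forEach((track) => peerConnection.addTrack(track, outStream));
});
```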

One alternative for your case would be to stream the face poses using WebSockets. Each peer would then do the face detection for 1 video, and the rendering for all videos. But I think this solution is worse than sending the video with the mask on.
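Purely as an illustration of that alternative, a hypothetical sketch of pushing the pose over a WebSocket; `detectState` stands for whatever pose data the tracking callback provides (rotation, translation, scale...), and both the URL and the message shape are assumptions, not part of the library API:

```js
const socket = new WebSocket('wss://example.com/poses'); // placeholder endpoint

// Called on every tracking update with the current face pose:
function onPoseUpdate(detectState) {
  if (socket.readyState === WebSocket.OPEN) {
    socket.send(JSON.stringify({
      peerId: 'me',      // placeholder identifier for this participant
      pose: detectState  // remote peers re-render the mask from this pose
    }));
  }
}
```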

jtestard commented 3 years ago

Sending the video with the mask on means the mask is only computed once, and ensures the face detection is applied as early as possible.