google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://mediapipe.dev
Apache License 2.0

What is difference between @mediapipe/tasks-vision and @mediapipe/selfie_segmentation on npm? #4251

Closed: hongruzhu closed this issue 1 year ago

hongruzhu commented 1 year ago

I am working on a video conference project in JavaScript, and I want to run real-time selfie segmentation on the webcam feed using the MediaPipe npm packages. However, I noticed that there are two different packages available on npm. The MediaPipe website uses the @mediapipe/tasks-vision npm package to demonstrate image segmentation and provides demo code (available at https://codepen.io/mediapipe-preview/pen/xxJNjbN), but that webcam demo is a little laggy and not very smooth. On the other hand, I found another demo (https://github.com/ayushgdev/MediaPipeCodeSamples/blob/main/Vanilla%20JS/selfie%20segmentation%20with%20bg%20blur.html) that uses the @mediapipe/selfie_segmentation npm package to blur the background around the segmented selfie. This demo is much smoother than the former.

My question is: what is the difference between @mediapipe/tasks-vision and @mediapipe/selfie_segmentation on npm? Is using @mediapipe/selfie_segmentation a better and smoother option compared to using @mediapipe/tasks-vision?

Thank you very much!

Neilblaze commented 1 year ago

@hongruzhu good question! Google Meet uses a proprietary model for real-time selfie segmentation on the input video feed, which unfortunately is not open-sourced. You can refer to this issue for more details.

Coming back to your question, both @mediapipe/tasks-vision and @mediapipe/selfie_segmentation are capable of performing real-time segmentation on the input feed. In terms of performance and accuracy, both solutions are based on state-of-the-art machine learning models (assuming you're using up-to-date versions) and can provide high-quality results. The choice between them will depend on the specific requirements of your project and the system configuration of your device (since both run on-device).

But in general, @mediapipe/tasks-vision is a more general-purpose solution for performing various computer vision tasks, including object detection, face detection, and segmentation. It allows for greater customization and flexibility, but may require more effort to implement for specific use cases.
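
For reference, the setup with the new API looks roughly like this (a minimal sketch, not the exact CodePen code; the CDN path and model URL are illustrative, and `videoElement` is assumed to be a `<video>` playing the webcam stream):

```js
import { FilesetResolver, ImageSegmenter } from "@mediapipe/tasks-vision";

// Load the WASM runtime for the vision tasks (CDN path is illustrative;
// pin a specific version in production).
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm"
);

// Create a segmenter in VIDEO mode for per-frame webcam processing.
const segmenter = await ImageSegmenter.createFromOptions(vision, {
  baseOptions: {
    // The CodePen demo points this at the deeplab_v3 model (URL illustrative).
    modelAssetPath:
      "https://storage.googleapis.com/mediapipe-models/image_segmenter/deeplab_v3/float32/latest/deeplab_v3.tflite",
    delegate: "GPU",
  },
  runningMode: "VIDEO",
  outputCategoryMask: true,
});

// In the render loop, segment the current frame and use the category mask
// to composite the foreground over a blurred background.
function onFrame() {
  segmenter.segmentForVideo(videoElement, performance.now(), (result) => {
    const mask = result.categoryMask; // MPMask
    // ... draw videoElement + mask to a canvas ...
    mask.close(); // release the underlying buffer
  });
  requestAnimationFrame(onFrame);
}
requestAnimationFrame(onFrame);
```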

ayushgdev commented 1 year ago

Hello @hongruzhu. We are building a set of new, improved MediaPipe Solutions to help you more easily build and customize ML solutions for your applications. These new solutions will provide a superset of the capabilities available in the legacy solutions, and we ask the MediaPipe developer community to help us uncover issues and make the APIs more resilient. As part of the new APIs, we ship the computer-vision-related solutions as @mediapipe/tasks-vision in the JavaScript world. These tasks include image segmentation (which includes selfie segmentation), object detection, hand gesture recognition, etc.

The @mediapipe/selfie_segmentation package is the legacy solution that provided the selfie segmentation capability. However, we are ending support for the MediaPipe Legacy Solutions and upgrading the others. The libraries, documentation, and source code for all the MediaPipe Legacy Solutions will continue to be available in our GitHub repository and through library distribution services such as Maven and npm.
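
For comparison, the legacy package exposes a callback-style API along these lines (a rough sketch; the locateFile CDN path is illustrative):

```js
import { SelfieSegmentation } from "@mediapipe/selfie_segmentation";

const selfieSegmentation = new SelfieSegmentation({
  // Resolve the WASM/model assets from a CDN (path is illustrative).
  locateFile: (file) =>
    `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`,
});

// modelSelection: 0 = general model, 1 = landscape (faster) model.
selfieSegmentation.setOptions({ modelSelection: 1 });

selfieSegmentation.onResults((results) => {
  // results.segmentationMask and results.image can be composited on a
  // canvas (e.g. with globalCompositeOperation) to blur the background.
});

// Feed webcam frames, e.g. from a <video> element in a render loop.
await selfieSegmentation.send({ image: videoElement });
```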

You can continue to use those legacy solutions in your applications if you choose, though we would encourage you to check out the new MediaPipe Solutions.

google-ml-butler[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 1 year ago

Closing as stale. Please reopen if you'd like to work on this further.


bhargey-s commented 1 year ago

@hongruzhu First, about the CodePen: the model implemented there is deeplab_v3, but the documentation only mentions the Selfie Segmentation model; I don't know why that is.

But when I replaced the deeplab_v3 URL with the selfie_segmenter model, "https://storage.googleapis.com/mediapipe-models/image_segmenter/selfie_segmenter/float16/latest/selfie_segmenter.tflite?v=aljali.mediapipestudio_20230621_1811_RC00", I got very good results. The lag you described is not there with this model.

Now, about the difference between @mediapipe/tasks-vision and @mediapipe/selfie_segmentation: after updating the model URL as described above, there is not much difference between the two; I only observed that the edges were sharper with the tasks-vision model than with the other one.
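
Concretely, the only change is the modelAssetPath in the segmenter options (a sketch; `vision` here is the FilesetResolver handle created as in the CodePen, and everything else stays the same):

```js
const segmenter = await ImageSegmenter.createFromOptions(vision, {
  baseOptions: {
    // Swap the deeplab_v3 URL from the CodePen for the selfie_segmenter model.
    modelAssetPath:
      "https://storage.googleapis.com/mediapipe-models/image_segmenter/selfie_segmenter/float16/latest/selfie_segmenter.tflite",
    delegate: "GPU",
  },
  runningMode: "VIDEO",
  outputCategoryMask: true,
});
```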

aljoscha-s commented 10 months ago

It seems that @mediapipe/selfie_segmentation cannot be started from a Web Worker; at least, I had this issue in my use case. It's also mentioned here: https://github.com/google/mediapipe/issues/3659#issuecomment-1289300054

In contrast, @mediapipe/tasks-vision should work fine for this purpose, as mentioned here: https://github.com/google/mediapipe/issues/3659#issuecomment-1705136767

I haven't checked this myself yet, but it's confirmed here: https://github.com/google/mediapipe/issues/3659#issuecomment-1718467419
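
For anyone who wants to try it, a rough sketch of driving the tasks-vision segmenter from a module worker (untested; the message protocol and the `modelUrl` field are made up for illustration):

```js
// worker.js — run with: new Worker("worker.js", { type: "module" })
import { FilesetResolver, ImageSegmenter } from "@mediapipe/tasks-vision";

let segmenter;

self.onmessage = async ({ data }) => {
  if (data.type === "init") {
    const vision = await FilesetResolver.forVisionTasks(
      "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm"
    );
    segmenter = await ImageSegmenter.createFromOptions(vision, {
      baseOptions: { modelAssetPath: data.modelUrl, delegate: "GPU" },
      runningMode: "VIDEO",
      outputCategoryMask: true,
    });
    self.postMessage({ type: "ready" });
  } else if (data.type === "frame" && segmenter) {
    // data.bitmap is an ImageBitmap transferred from the main thread.
    segmenter.segmentForVideo(data.bitmap, data.timestamp, (result) => {
      const mask = result.categoryMask;
      // Copy the mask out before closing it, since the underlying
      // buffer is owned by the MPMask.
      self.postMessage({ type: "mask", mask: mask.getAsUint8Array().slice() });
      mask.close();
      data.bitmap.close();
    });
  }
};
```

The main thread would grab frames with `createImageBitmap(videoElement)` and transfer them via `worker.postMessage({ type: "frame", bitmap, timestamp: performance.now() }, [bitmap])`.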