Mikubill / sd-webui-controlnet

WebUI extension for ControlNet
GNU General Public License v3.0

[Feature]: Integration with human software #1209

Closed ArtaFakhari closed 1 year ago

ArtaFakhari commented 1 year ago

'human' is written by @vladmandic. It's an amazing tool for:

AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation

Combining its cool features with the power of ControlNet would be a game-changer.

I asked Vladimir about it here, and he says he's totally ready to do this in cooperation with your team.

huchenlei commented 1 year ago

The linked issue mentions that you want to use human as a ControlNet preprocessor, but no existing ControlNet model accepts the pose/face format that human generates. Are you perhaps proposing a new ControlNet model trained on the human library's output?

lllyasviel commented 1 year ago

we already imported google mediapipe https://developers.google.com/mediapipe/solutions

vladmandic commented 1 year ago

mediapipe as default is decent. human supports multiple backend models (including mediapipe), so you can get either even better precision (e.g. movenet is typically better than mediapipe) or things like multi-person pose detection. but would i consider that a high priority item for controlnet? not really. plus human is primarily used client-side while controlnet runs server-side (human can run server-side, but for pure server-side there are better solutions).

for real sota, i'd suggest something like metrabs. i've used it in my previous projects and it's amazing - take a look at https://github.com/vladmandic/body-pose

https://github.com/vladmandic/body-pose/blob/main/assets/screenshot-dance.jpg

lllyasviel commented 1 year ago

And even importing the relatively "easy-to-install" mediapipe has already given us at least 4 issues and many "fail-to-install" posts. Even today, many Mac users still get errors about mediapipe, and we had to change the pip install into a "try pip install". Our future integration of external packages will be extremely careful. Note that CN has made significant efforts to reduce the dependencies of each preprocessor and avoid mmcv/detectron/etc., and we do not even need scipy or scikit-image. I really want to use scikit-image, but I am not doing that.
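
To illustrate the "try pip install" idea mentioned above, here is a minimal Python sketch of a guarded optional-dependency install. It is not the extension's actual install script. The helper names (`is_installed`, `try_install`, `available_preprocessors`) and the preprocessor-to-package mapping are hypothetical and exist only for this example.

```python
import importlib
import importlib.util
import subprocess
import sys
from typing import Callable, Dict, List


def is_installed(package: str) -> bool:
    """Return True if the top-level `package` is importable here."""
    return importlib.util.find_spec(package) is not None


def try_install(package: str) -> bool:
    """Attempt `pip install <package>`; warn and return False on failure
    instead of raising, so one bad dependency does not break startup."""
    if is_installed(package):
        return True
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    except (subprocess.CalledProcessError, OSError) as exc:
        print(f"Warning: could not install {package}: {exc}")
        return False
    # Make a freshly installed package visible to the import system.
    importlib.invalidate_caches()
    return is_installed(package)


def available_preprocessors(
    optional: Dict[str, str],
    installer: Callable[[str], bool] = try_install,
) -> List[str]:
    """Given a hypothetical {preprocessor name: package} mapping, return
    the preprocessors whose package is usable after a best-effort install."""
    return [name for name, package in optional.items() if installer(package)]
```

Calling `available_preprocessors({"mediapipe_face": "mediapipe"})` would then return an empty list on a platform where the install fails, and the caller can hide that preprocessor rather than crash.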

vladmandic commented 1 year ago

@lllyasviel mediapipe is quite proprietary and its binary engine is not fully cross-platform. if you went with any pure torch model, you'd have probably avoided those issues - but water under the bridge.

ArtaFakhari commented 1 year ago

> The linked issue mentions that you want to use human as a ControlNet preprocessor, but no existing ControlNet model accepts the pose/face format that human generates. Are you perhaps proposing a new ControlNet model trained on the human library's output?

Yes and no! Actually, that could also be a good idea.

ArtaFakhari commented 1 year ago

> mediapipe as default is decent. human supports multiple backend models (including mediapipe), so you can get either even better precision (e.g. movenet is typically better than mediapipe) or things like multi-person pose detection. but would i consider that a high priority item for controlnet? not really. plus human is primarily used client-side while controlnet runs server-side (human can run server-side, but for pure server-side there are better solutions).
>
> for real sota, i'd suggest something like metrabs. i've used it in my previous projects and it's amazing - take a look at https://github.com/vladmandic/body-pose
>
> https://github.com/vladmandic/body-pose/blob/main/assets/screenshot-dance.jpg

I believe "Human" could be used as middleware inside CN. What if "Human" could also support Luxonis/DepthAI, so that depth camera images could produce more accurate human pose or object movement predictions? It would also be useful for producing real-time SD scenes. @lllyasviel @vladmandic @luxonis