Closed ArtaFakhari closed 1 year ago
The linked issue mentions that you want to use human as a controlnet preprocessor, but there is no existing controlnet model that can accept the pose/face format generated by human. Maybe you are proposing a new controlnet model on the human library's output?
we already imported google mediapipe https://developers.google.com/mediapipe/solutions
mediapipe as default is decent. human supports multiple backend models (including mediapipe), so you can get either better precision (e.g. movenet is typically better than mediapipe) or things like multi-person pose detection.
but would i consider that a high priority item for controlnet? not really. plus, human is primarily used client-side while controlnet runs server-side (human can run server-side, but for pure server-side there are better solutions).
for real sota, i'd suggest something like metrabs. i've used it in my previous projects and it's amazing - take a look at https://github.com/vladmandic/body-pose
https://github.com/vladmandic/body-pose/blob/main/assets/screenshot-dance.jpg
And even importing the relatively "easy-to-install" mediapipe has already given us at least 4 issues and many "fail-to-install" posts. Even today, many mac users still get errors about mediapipe, and we had to change the pip install into a "try pip install". Our future integration of external packages will be extremely careful. Note that CN has made significant efforts to reduce the dependencies of each preprocessor and avoid mmcv/detectron/etc, and we do not even need scipy and scikit-image. I really want to use scikit-image, but I am not doing that.
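For readers wondering what a "try pip install" might look like in practice, here is a minimal sketch of the general pattern, not the extension's actual code. The helper names `try_pip_install` and `try_import` are hypothetical; the idea is only that a failed install or a failed import of an optional package disables one preprocessor instead of crashing the host application.

```python
import importlib
import subprocess
import sys


def try_pip_install(package):
    """Best-effort install of an optional package; returns True on success.

    Hypothetical helper: a failed install is reported, never raised.
    """
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", package],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def try_import(module_name):
    """Return the imported module, or None if it cannot be loaded.

    Catching Exception (not only ImportError) is deliberate: a package
    with native binaries can fail at import time in other ways.
    """
    try:
        return importlib.import_module(module_name)
    except Exception:
        return None


if __name__ == "__main__":
    mp = try_import("mediapipe")
    if mp is None and try_pip_install("mediapipe"):
        importlib.invalidate_caches()
        mp = try_import("mediapipe")
    # Only the preprocessor that needs the package is switched off.
    print("mediapipe preprocessor available:", mp is not None)
```

With this shape, every additional external package costs one guarded import rather than a new class of startup failures, which is roughly the trade-off described above.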
@lllyasviel mediapipe is quite proprietary and its binary engine is not fully cross-platform. if you went with any pure torch model, you'd have probably avoided those issues - but water under the bridge.
The linked issue mentions that you want to use human as a controlnet preprocessor, but there is no existing controlnet model that can accept the pose/face format generated by human. Maybe you are proposing a new controlnet model on the human library's output?
Yes and No! Actually it also could be a good idea to do so
mediapipe as default is decent. human supports multiple backend models (including mediapipe), so you can get either better precision (e.g. movenet is typically better than mediapipe) or things like multi-person pose detection.
but would i consider that a high priority item for controlnet? not really. plus, human is primarily used client-side while controlnet runs server-side (human can run server-side, but for pure server-side there are better solutions).
for real sota, i'd suggest something like metrabs. i've used it in my previous projects and it's amazing - take a look at https://github.com/vladmandic/body-pose
https://github.com/vladmandic/body-pose/blob/main/assets/screenshot-dance.jpg
I believe "Human" could be used as middleware inside CN. What if 'Human' could also support Luxonis/DepthAI to get depth camera images to produce more accurate human pose or object movement prediction? It's also useful to produce real-time SD scenes. @lllyasviel @vladmandic @luxonis
'human' is written by @vladmandic. It's an amazing tool for:
AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracking, Gesture Recognition, Body Segmentation
Combining its cool features with the power of ControlNet would be a game-changer.
I asked Vladimir about it here, and he says he's totally ready to do this in cooperation with your team.