p0mad opened this issue 1 year ago
Hi p0mad, thanks for your attention! On my machine (11 GB 2080 Ti), it is feasible to produce a consistent video conditioned on human pose with about 100 frames (i.e., 4-5 seconds at 24 fps), as shown in https://github.com/YBYBZhang/ControlVideo#long-video-generation.
@YBYBZhang That's great! But did you initialize the pose with some input (a video or an image)?
I have a video of OpenPose + hands + face, and I want to generate a human-like animation (no matter what, just a consistent character/avatar): Sample Video
> human pose with about 100 frames (i.e., 4-5 seconds at 24 fps), which is shown in #long-video-generation.
The Hulk's size grows, and the face/hair change during the generated video! Do you have any idea how to get a fixed-size, consistent character?
Thanks!
Best regards
@p0mad The synthesized Hulk video is initialized with the poses below. Currently, our ControlVideo ensures video consistency with fully cross-frame attention only. In the future, adding temporal attention by fine-tuning on sufficient videos may improve size and character consistency! https://github.com/YBYBZhang/ControlVideo/assets/40799060/21b53efe-2167-4f74-afc2-3bec021acf20
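For readers unfamiliar with the term: in fully cross-frame attention, every frame's queries attend to the keys and values of *all* frames, so appearance information is shared across the whole clip. The following is a minimal NumPy sketch of that idea, not the repository's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_attention(q, k, v):
    # q, k, v: (frames, tokens, dim).
    # Keys/values are flattened over ALL frames, so each frame's
    # queries can attend to every token of every frame.
    f, t, d = k.shape
    k_all = k.reshape(f * t, d)
    v_all = v.reshape(f * t, d)
    scores = q @ k_all.T / np.sqrt(d)        # (frames, tokens, frames*tokens)
    return softmax(scores, axis=-1) @ v_all  # (frames, tokens, dim)

q = k = v = np.random.randn(4, 8, 16)
out = cross_frame_attention(q, k, v)
print(out.shape)  # (4, 8, 16)
```

Plain per-frame self-attention would restrict each frame's keys/values to that frame alone, which is why characters drift; sharing them across frames is what keeps appearance consistent.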
@YBYBZhang Thanks for the detailed information. Would you please also give me some insight/guidance on the hands + face part of the pose? Is there a model I can use? (I see that ControlNet has Full-OpenPose, but when I tested it in the HF space it didn't seem to take effect; the output was bad.) Is there a reason for that?
Also, would you please provide some prompts that output a consistent character for the provided pose (e.g., a boy playing something, with a black background, in animation style), so I can get a consistent character that dances with correctly generated face and hands?
This was my best attempt at generation with this pose!
Thanks!
Best regards
Full-OpenPose ControlNet is trained on Stable Diffusion v1.5, and thus inherits its limitations in producing low-quality hands and faces. I tried producing a video with ControlVideo (ControlNet v1.1, full-openpose) using the simple prompt "A man, animation style." As shown below, the synthesized video looks more consistent than one from vanilla ControlNet. I hope this helps.
https://github.com/YBYBZhang/ControlVideo/assets/40799060/31fc2127-b296-4727-b161-700aade31d0b
@YBYBZhang Thank you so much for your time. Would you please walk me through the steps of generating this video?
Did you install ControlNet 1.1, download the OpenPose-full weights, select openpose-full, put "A man, animation style." in the prompt box, input the pose video (or did you use batch?), and then generate without any other inputs? What about the seed and number of steps?
Are there other ways to improve hand and face accuracy, such as using OpenPifPaf as mentioned in the ControlNet paper (which is on SD 2.1), or an SD 2.1 / SD-XL version of OpenPose-full?
Also, would you please let me know your GPU, memory, and CPU?
Thanks!
Best regards
With a 2080 Ti 11 GB GPU, I used the following script to produce the above video:

```shell
python inference.py \
    --prompt "A man, animation style." \
    --condition "openpose" \
    --video_path "data/pose1.mp4" \
    --output_path "outputs/" \
    --video_length 55 \
    --smoother_steps 19 20 \
    --width 512 \
    --height 512 \
    --frame_rate 2 \
    --version v11 \
    --is_long_video
```

where pose1.mp4 is center-cropped from your pose video.
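A quick sanity check on the clip-length arithmetic (my own calculation, assuming the output video is saved at 20 fps, which is not stated in the repo): 55 generated frames played back at 20 fps give a 2.75 s clip, which matches the roughly 2.7 s output discussed below.

```python
# Relate --video_length (number of generated frames) to playback duration.
# Assumption (not from the repository): the output clip is saved at 20 fps.

def clip_duration_s(num_frames: int, fps: float) -> float:
    """Playback duration in seconds for num_frames at the given fps."""
    return num_frames / fps

print(clip_duration_s(55, 20))  # 2.75
```

So the output duration is determined by `--video_length` and the saved frame rate, independently of the input pose video's own duration.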
I haven't explored using a newer SD or ControlNet to enhance hands and faces, but I believe they could achieve this goal.
@YBYBZhang Thank you so much!
I have five questions:

1. Regarding `--condition "openpose"`:
2. Regarding `--video_length 55 \`: the original pose (Sample Video) is 1.8 s long, while your output video (yours (output)) shows 2.7 s. The result is confusing to me! Did you convert the input pose to 30 fps and center-crop it? Can you please send the cropped version of my pose?
3. Can we generate output at 24 or 30 fps instead of 20? (Is that what the `--smoother_steps` option is for?)
4. Regarding "but I believe that they could achieve this goal": would it also be possible to input an arbitrary image (a desired character) as an initial character for SD + CN? Example image:
5. Is it possible to set the number of steps for each frame in the UNet?

Thanks again!
Best regards
Hi, is it possible to generate a single character from the pose for more than 5 seconds?
I have a pose video (OpenPose + hands + face), and I was wondering whether it is possible to generate an output video with a length of 5 seconds that has a consistent character/avatar that dances, etc., following the controlled (pose) input.
Thanks!
Best regards