dvlab-research / ControlNeXt

Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA
Apache License 2.0

How to prepare training data for ControlNeXt-SVD-v2? #29

Open JWargrave opened 2 months ago

JWargrave commented 2 months ago

Hi, thank you for your great work!

I want to fine-tune ControlNeXt-SVD-v2 on my own dataset, and I have some questions about data preprocessing.


The first is about guide_path in meta_info.json. Judging from preprocess.py, I think the pose_video.mp4 produced by the code below is the guide_path that corresponds to a given train_video.mp4.

from dwpose.dwpose_detector import dwpose_detector as dwprocessor
from dwpose.util import draw_pose
import decord
from tqdm import tqdm
import numpy as np
import cv2

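def write_mp4(list_of_rgb_np_img, fps, output_filename):
    # Write a list of RGB frames of shape (H, W, 3) to an MP4 file.
    height, width, _ = list_of_rgb_np_img[0].shape
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    video_writer = cv2.VideoWriter(output_filename, fourcc, fps, (width, height))
    for frame in list_of_rgb_np_img:
        # OpenCV expects BGR channel order.
        video_writer.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
    video_writer.release()

video_path = 'train_video.mp4'
vr = decord.VideoReader(video_path, ctx=decord.cpu(0))

# Decode every frame into one array of shape (num_frames, H, W, 3).
frames = vr.get_batch(list(range(0, len(vr)))).asnumpy()

height, width = frames.shape[1], frames.shape[2]

# Run DWPose on each frame and render the skeleton.
# The transpose assumes draw_pose returns a (C, H, W) array.
detected_poses = [np.array(draw_pose(dwprocessor(frm), height, width)).transpose((1, 2, 0)) for frm in tqdm(frames, desc="DWPose")]
dwprocessor.release_memory()
# Save the pose video at the average frame rate of the source video.
write_mp4(detected_poses, vr.get_avg_fps(), './pose_video.mp4')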

For example:

https://github.com/user-attachments/assets/dc3a5892-7efa-416a-93a7-bb7530f5b1c3

https://github.com/user-attachments/assets/eca9bb56-9a78-4fe5-b3fb-9cf02b1b8c7e

Is this correct?


The second is about meta_info in meta_info.json (e.g., meta_info_example/meta_info/1.json), which contains boxes, hands_boxes and hands_score for every frame. Could you tell me how these three values are computed?
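
To make the question concrete, here is my guess at how a box and a score might be derived from a set of detected keypoints: take the confident keypoints, scale them to pixels, and use their min/max as the box and their mean confidence as the score. This is only a sketch and may not match what you actually did. The helper name keypoints_to_box, the assumption that keypoints are normalized to [0, 1], and the min_score threshold are all my own assumptions, not something I found in the repository.

```python
import numpy as np

def keypoints_to_box(points, scores, width, height, min_score=0.3):
    """Hypothetical helper: bounding box and mean score of confident keypoints.

    points: (K, 2) array of (x, y) coordinates, assumed normalized to [0, 1].
    scores: (K,) array of per-keypoint confidences.
    Returns ([x0, y0, x1, y1] in pixels, mean score), or (None, 0.0)
    when no keypoint reaches min_score.
    """
    points = np.asarray(points, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)

    # Keep only keypoints the detector is reasonably sure about.
    keep = scores >= min_score
    if not keep.any():
        return None, 0.0

    # Scale normalized coordinates to pixel coordinates.
    xy = points[keep] * np.array([width, height], dtype=np.float64)
    x0, y0 = xy.min(axis=0)
    x1, y1 = xy.max(axis=0)

    # Clamp the box to the frame.
    box = [
        float(np.clip(x0, 0, width)),
        float(np.clip(y0, 0, height)),
        float(np.clip(x1, 0, width)),
        float(np.clip(y1, 0, height)),
    ]
    return box, float(scores[keep].mean())
```

If the real boxes, hands_boxes and hands_score are computed differently (for example with padding, or from a separate detector), I would like to follow the same procedure.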


Thanks a lot.

JWargrave commented 2 months ago

I would also like to know what value of ref_w was passed to draw_pose when you trained ControlNeXt-SVD-v2. Was it the default, 2160?