I want to fine-tune ControlNeXt-SVD-v2 on my own dataset, and I have some questions about data preprocessing.
The first is about guide_path in meta_info.json. According to preprocess.py, I think the pose_video.mp4 produced by the code below is the corresponding guide_path for a given train_video.mp4.
import cv2
import decord
import numpy as np
from tqdm import tqdm

from dwpose.dwpose_detector import dwpose_detector as dwprocessor
from dwpose.util import draw_pose


def write_mp4(list_of_rgb_np_img, fps, output_filename):
    # Write a list of RGB frames of shape (H, W, 3) to an MP4 file.
    height, width, _ = list_of_rgb_np_img[0].shape
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    video_writer = cv2.VideoWriter(output_filename, fourcc, fps, (width, height))
    for frame in list_of_rgb_np_img:
        # OpenCV expects BGR channel order.
        video_writer.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
    video_writer.release()


video_path = 'train_video.mp4'
vr = decord.VideoReader(video_path, ctx=decord.cpu(0))
frames = vr.get_batch(list(range(0, len(vr)))).asnumpy()
height, width = frames.shape[1], frames.shape[2]
# Run DWPose on every frame, draw the pose, and move the first axis last
# so that each drawn pose has shape (H, W, 3) for write_mp4.
detected_poses = [
    np.array(draw_pose(dwprocessor(frm), height, width)).transpose((1, 2, 0))
    for frm in tqdm(frames, desc="DWPose")
]
dwprocessor.release_memory()
write_mp4(detected_poses, vr.get_avg_fps(), './pose_video.mp4')
The second is about meta_info in meta_info.json (i.e., meta_info_example/meta_info/1.json), which contains boxes, hands_boxes and hands_score for every frame. Could you tell me how to calculate these three variables?
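To make the second question concrete, here is a minimal sketch of my current guess, which may well be wrong. It assumes that boxes and hands_boxes are axis-aligned boxes spanning the confident body or hand keypoints, and that hands_score is simply the hand keypoint confidence reported by DWPose. None of this is confirmed by the repository; the helper name keypoints_to_box and the 0.3 threshold are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def keypoints_to_box(
    points: Sequence[Tuple[float, float]],
    scores: Sequence[float],
    min_score: float = 0.3,  # hypothetical threshold, not taken from the repo
) -> Optional[List[float]]:
    """Return [x_min, y_min, x_max, y_max] around confident keypoints, or None."""
    # Keep only keypoints whose confidence reaches the threshold.
    kept = [p for p, s in zip(points, scores) if s >= min_score]
    if not kept:
        return None
    xs = [x for x, _ in kept]
    ys = [y for _, y in kept]
    return [min(xs), min(ys), max(xs), max(ys)]


# Toy hand with three keypoints in normalized coordinates;
# the last keypoint is low-confidence and is ignored.
hand_points = [(0.40, 0.50), (0.45, 0.60), (0.90, 0.10)]
hand_scores = [0.9, 0.8, 0.1]
hand_box = keypoints_to_box(hand_points, hand_scores)
print(hand_box)  # [0.4, 0.5, 0.45, 0.6]
```

If the real meta_info uses a different convention (pixel coordinates, padding around the box, or a detector-produced person box), I would like to know which one.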
For the first question, here are example videos:
https://github.com/user-attachments/assets/dc3a5892-7efa-416a-93a7-bb7530f5b1c3
https://github.com/user-attachments/assets/eca9bb56-9a78-4fe5-b3fb-9cf02b1b8c7e
Is this right?
Thank you for your great work, and thanks a lot!