facebookresearch / VideoPose3D

Efficient 3D human pose estimation in video using 2D keypoint trajectories

Is it possible to run inference for VideoPose3D with Detectron2? #107

Open na018 opened 4 years ago

zhipeng-fan commented 4 years ago

Also interested in that

darkAlert commented 4 years ago

Here is a script that prepares data for VideoPose3D (in .npz format) using Detectron2: https://github.com/darkAlert/VideoPose3d_with_Detectron2/blob/master/detectron_pose_predictor.py

After executing this script, you can go to step 5: https://github.com/facebookresearch/VideoPose3D/blob/master/INFERENCE.md#step-5-rendering-a-custom-video-and-exporting-coordinates
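As a quick sanity check before step 5, you can open the generated .npz and look at its keys and array shapes. A minimal sketch, assuming the output file is named keypoints.npz (the actual name depends on how you run the script):

import numpy as np

# 'keypoints.npz' is a placeholder -- use the file produced by detectron_pose_predictor.py
data = np.load('keypoints.npz', allow_pickle=True)
for key in data.files:
    print(key, data[key].shape)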

wendell-hom commented 4 years ago

Will detectron_pose_predictor.py work for video?

darkAlert commented 4 years ago

@wendell-hom Now, yes! I have just added the read_video() function to the code, so predict_pose() can now work with both images and video: https://github.com/darkAlert/VideoPose3d_with_Detectron2/blob/master/detectron_pose_predictor.py#L160

wendell-hom commented 4 years ago

thanks! I will try this out

immkapoor commented 4 years ago

@darkAlert Hey, I might be wrong here, but is this perhaps overwriting the keypoint values? I compared the generated file with the one from the pre-trained models, and the number of entries is much smaller. Either the length of my video differs greatly from the videos used during training, or the file is getting overwritten. If too few keypoints are generated, rendering will be difficult.

darkAlert commented 4 years ago

@immkapoor You can verify the number of entries in the generated file by multiplying the video duration (in seconds) by the FPS (default is 30). The number of entries has to equal this value, because even if Detectron2 misses the person in a frame, the body joints for that frame will be interpolated.
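For example, a minimal check along these lines (the file names and the key under which the keypoints are stored are placeholders, not the script's actual names):

import cv2
import numpy as np

# Placeholder paths/key -- adjust to your own files
cap = cv2.VideoCapture('input_video.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)                     # e.g. 30
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))   # ~ duration (s) * fps
cap.release()

kpts = np.load('keypoints.npz', allow_pickle=True)['keypoints']
print('frames in video:', n_frames)
print('entries in .npz:', len(kpts))                # should match, since missed frames are interpolated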

immkapoor commented 4 years ago

@darkAlert If I am getting it correctly, then the number of entries should be 1200. My video is 40 seconds long and I haven't changed any values. The video contains a single person only. I have another doubt, sorry for bothering you: shouldn't it have the 17 keypoints detected for all the corresponding frames, i.e. 17x40=680 entries?

darkAlert commented 4 years ago

@immkapoor Not exactly. After detectron_pose_predictor.py is executed, you should get a .npz file containing a numpy array with the shape (n, 17, 2), where n = video_len * fps (40 * 30 = 1200 entries in your case).

I've just added https://github.com/darkAlert/VideoPose3d_with_Detectron2/blob/master/visualization.py to visualize the results of the Detectron2 predictions, so you can check whether any frames were missed.

In addition, you can load your .npz file and examine its shape and content using the function at https://github.com/darkAlert/VideoPose3d_with_Detectron2/blob/master/visualization.py#L8.
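If you prefer to check it inline, here is a rough stand-in (not a copy of visualization.py) that overlays the 2D keypoints of one frame on the video; it assumes the array has shape (n, 17, 2) and uses placeholder file names:

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Placeholder paths/key -- adjust to your own files
kpts = np.load('keypoints.npz', allow_pickle=True)['keypoints']   # (n, 17, 2)
frame_idx = 100

cap = cv2.VideoCapture('input_video.mp4')
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
ok, frame = cap.read()
cap.release()

plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
plt.scatter(kpts[frame_idx, :, 0], kpts[frame_idx, :, 1], s=10, c='r')
plt.title('frame %d' % frame_idx)
plt.show()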

DA-fromindia commented 4 years ago

Here is a script that prepares data for VideoPose3D (in .npz format) using Detectron2: https://github.com/darkAlert/VideoPose3d_with_Detectron2/blob/master/detectron_pose_predictor.py

After executing this script, you can go to step 5: https://github.com/facebookresearch/VideoPose3D/blob/master/INFERENCE.md#step-5-rendering-a-custom-video-and-exporting-coordinates

What should I pass for --viz-subject in --render --viz-subject input_video.mp4? I tried multiple things, and it gives an error every time.

darkAlert commented 4 years ago

@DA-fromindia what error did you get?

nathan60107 commented 4 years ago

Here is a script that prepares data for VideoPose3D (in .npz format) using Detectron2: https://github.com/darkAlert/VideoPose3d_with_Detectron2/blob/master/detectron_pose_predictor.py After executing this script, you can go to step 5: https://github.com/facebookresearch/VideoPose3D/blob/master/INFERENCE.md#step-5-rendering-a-custom-video-and-exporting-coordinates

What should I pass for --viz-subject in --render --viz-subject input_video.mp4? I tried multiple things, and it gives an error every time.

You should pass the name of the video. If the video is human.mp4, just pass human.mp4 for that parameter: infer_video_d2.py will output human.mp4.npz as the 2D keypoints, and prepare_data_2d_custom.py will treat the video file name as the subject name, so there will be a subject "human.mp4" in data_2d_custom_myvideo.npz.
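If you are unsure which subject names ended up in the file, you can list them; a minimal sketch, assuming the standard layout written by prepare_data_2d_custom.py (a 'positions_2d' dict keyed by subject name):

import numpy as np

# 'data_2d_custom_myvideo.npz' is the file produced by prepare_data_2d_custom.py
data = np.load('data_2d_custom_myvideo.npz', allow_pickle=True)
positions_2d = data['positions_2d'].item()
print('subjects:', list(positions_2d.keys()))       # e.g. ['human.mp4'] -- pass this to --viz-subject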

yifeizhangfr commented 3 years ago

I have a pre-trained model. Where should I put it so that it won't be downloaded automatically? The download is very slow. Thanks!

darkAlert commented 3 years ago

I have a pre-trained model. Where should I put it so that it won't be downloaded automatically? The download is very slow. Thanks!

from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

def init_predictor(config_path, weights_path):     # weights_path - the path to your local model file
    cfg = get_cfg()
    cfg.merge_from_file(config_path)                # local config .yaml
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5     # set threshold for this model
    cfg.MODEL.WEIGHTS = weights_path                # use the local weights instead of downloading them
    predictor = DefaultPredictor(cfg)

    return predictor
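A hedged usage sketch (the config and weights paths are placeholders; point weights_path at the .pkl you downloaded manually so Detectron2 does not fetch it again):

import cv2

# Placeholder paths -- substitute your local keypoint R-CNN config .yaml and downloaded .pkl weights
predictor = init_predictor(
    'configs/COCO-Keypoints/keypoint_rcnn_R_101_FPN_3x.yaml',
    '/path/to/model_final.pkl')

frame = cv2.imread('frame_0001.jpg')                # any test image
outputs = predictor(frame)
keypoints = outputs['instances'].pred_keypoints     # (num_people, 17, 3): x, y, score per joint
print(keypoints.shape)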