facebookresearch / VideoPose3D

Efficient 3D human pose estimation in video using 2D keypoint trajectories

bad performance on the same wild video #6

bucktoothsir opened this issue 5 years ago

bucktoothsir commented 5 years ago

Hello,

  1. I downloaded the same skating video, at 1920x1080 resolution, from YouTube.
  2. I predicted 2D COCO joints for this video with the model you provided in https://github.com/facebookresearch/VideoPose3D/issues/2.
  3. I made a dataset file and replaced res_w and res_h in h36m_dataset.py (see the sketch after this list for why those values matter).
  4. Then I got the following result from d-pt-243.bin: [my output]
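For context on step 3: the 2D inputs are mapped to normalized screen coordinates using the dataset resolution before inference, so a wrong res_w/res_h skews every keypoint. A minimal sketch of that normalization, reproduced from memory from the repo's common/camera.py (treat the exact signature as an assumption):

```python
import numpy as np

def normalize_screen_coordinates(X, w, h):
    # X: (..., 2) array of 2D keypoints in pixels; w, h: video resolution.
    # Maps x to [-1, 1] and scales y by the same factor to keep aspect ratio.
    assert X.shape[-1] == 2
    return X / w * 2 - [1, h / w]

# Example: dummy COCO joints for a 1920x1080 video
keypoints = np.random.rand(100, 17, 2) * [1920, 1080]
normalized = normalize_screen_coordinates(keypoints, w=1920, h=1080)
```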

Obviously it is worse than your result: [author's output]

I noticed that your video has a higher resolution and much more accurate 2D joints. Could you please release the original skating video and the in-the-wild test code?

Godatplay commented 5 years ago

In terms of the output resolution, you set that with --viz-size. I chose 10 and it seems close, the default is 5.

I'm not sure how much difference it'll make, but consider changing center as well, since all three values (res_w, res_h, and center) are used to renormalize the camera.
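For reference, a rendering invocation might look like the following; the flag names come from the repo's run.py, but the keypoints suffix, checkpoint, and subject/action names here are placeholders you'd swap for your own:

python run.py -k detectron_pt_coco -arc 3,3,3,3,3 -c checkpoint --evaluate d-pt-243.bin --render --viz-subject S0 --viz-action skating --viz-camera 0 --viz-output output.mp4 --viz-size 10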

How did you build your dataset file?

dariopavllo commented 5 years ago

Did you follow the instructions mentioned in my last post here?

Also, in this comment I mentioned that we used CPN to extract the 2D keypoints for the videos in the wild, which produces slightly better results. Anyway, if you followed the steps correctly, Detectron poses should be very similar.

We took the video from YouTube as well, in 1080p resolution.

wishvivek commented 5 years ago

@bucktoothsir Regarding getting visualizations of in-the-wild videos: in the second step, where you converted the input video to individual frames, how did you preprocess each incoming frame (scale, crop, center, etc.) before getting the output from Detectron?

bucktoothsir commented 5 years ago

@Godatplay

In terms of the output resolution, you set that with --viz-size. I chose 10 and it seems close, the default is 5.

I'm not sure how much difference it'll make, but consider changing center as well, since all three values (res_w, res_h, and center) are used to renormalize the camera.

How did you build your dataset file?

Your advice worked, thanks. Now I get a high-resolution output, but the performance is still bad.

I built a dataset file with the same structure as the original one. Specifically, I built a fake 3D dataset file and a 2D dataset file. The structure is 'S0/skating'; you can rename the subjects and actions, then change the corresponding names in your test scripts.
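To make that concrete, here is a minimal sketch of building such a custom 2D keypoints file, assuming the positions_2d/metadata npz layout used by the repo's data_2d_*.npz files (the metadata fields and file names here are assumptions):

```python
import numpy as np

# Hypothetical input: (num_frames, 17, 2) COCO joints exported from Detectron
keypoints = np.load('detectron_keypoints.npy')

# Same nested structure as the original files: subject -> action -> cameras
positions_2d = {'S0': {'skating': [keypoints.astype('float32')]}}
metadata = {
    'layout_name': 'coco',
    'num_joints': 17,
    'keypoints_symmetry': [[1, 3, 5, 7, 9, 11, 13, 15],
                           [2, 4, 6, 8, 10, 12, 14, 16]],
}

np.savez_compressed('data_2d_h36m_detectron_wild.npz',
                    positions_2d=positions_2d, metadata=metadata)
```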

bucktoothsir commented 5 years ago

@bucktoothsir Regarding getting visualizations of in-the-wild videos: in the second step, where you converted the input video to individual frames, how did you preprocess each incoming frame (scale, crop, center, etc.) before getting the output from Detectron?

I didn't do any preprocessing.

bucktoothsir commented 5 years ago

In terms of the output resolution, you set that with --viz-size. I chose 10 and it seems close, the default is 5.

I'm not sure how much difference it'll make, but consider changing center as well, since all three values (res_w, res_h, and center) are used to renormalize the camera.

How did you build your dataset file?

I also wrote the dataset file myself.

wishvivek commented 5 years ago

@bucktoothsir Thanks for the response. Also, I'm trying to get keypoints on my images with the Detectron model (the R-50-FPN end-to-end keypoint-only Mask R-CNN baseline from this page), using the command:

python Detectron.pytorch/tools/infer_simple.py --dataset coco --cfg Detectron.pytorch/configs/baselines/e2e_keypoint_rcnn_R-50-FPN_1x.yaml --load_detectron Detectron.pytorch/data/pretrained_model/e2e_keypoint_rcnn_R-50-FPN_1x.pkl --image_dir videoframes --output_dir Detectron.pytorch/keypoints

but getting this error:

RuntimeError: The expanded size of the tensor (81) must match the existing size (2) at non-singleton dimension 0

So, it'll be great if you (or anyone else reading this) could provide any hints on how you're obtaining keypoints through this process. Thanks!

bucktoothsir commented 5 years ago

@wishvivek Which version of Python are you using?

wishvivek commented 5 years ago

@dariopavllo I have the 3D predictions from the model for my in-the-wild video, but they're all normalized (i.e., in [-1, 1]). So:

  1. How do I unnormalize these 3D predictions? (My objective is to visualize the 3D reconstruction, just like the results at the top of this page.)
  2. Usually, we use the mean and std of the dataset to normalize and unnormalize our data (e.g., as is done here; see the sketch after this list). To my understanding, this is done w.r.t. the root joint. So, what is the normalization/unnormalization scheme used here?
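For illustration, a minimal sketch of the generic mean/std scheme described in point 2; this reflects the commenter's assumption, not necessarily what VideoPose3D actually does, and the statistics here are hypothetical:

```python
import numpy as np

def normalize(poses, mean, std):
    # poses: (frames, joints, 3); mean/std computed over the training set
    return (poses - mean) / std

def unnormalize(poses_norm, mean, std):
    # Inverse mapping: recover poses in the original units
    return poses_norm * std + mean
```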

Any help will be great, thanks!

lxy5513 commented 5 years ago

How do I get keypoints and bounding boxes?

For 12_2017_baselines/e2e_keypoint_rcnn_R-101-FPN_s1x.yaml, is there an already trained model for getting 2D keypoints and bboxes, like /path/to/e2e_keypoint_rcnn_R-101-FPN_s1x.pkl?

Or do I need to train on Detectron to get the model? Can anyone help me? Thanks a lot.

bucktoothsir commented 5 years ago

How do I get keypoints and bounding boxes?

For 12_2017_baselines/e2e_keypoint_rcnn_R-101-FPN_s1x.yaml, is there an already trained model for getting 2D keypoints and bboxes, like /path/to/e2e_keypoint_rcnn_R-101-FPN_s1x.pkl?

Or do I need to train on Detectron to get the model? Can anyone help me? Thanks a lot.

I used Detectron, as the author advised.

tobiascz commented 5 years ago

Thanks @bucktoothsir for pointing me to this issue!

As I already mentioned in #2, I was also able to run the code on an in-the-wild example with my own fork of this repository. I also have some notes on Detectron in there for people who run into difficulties. My 3D results are also much worse than the results produced by @dariopavllo. I think my 2D poses are not accurate enough; thanks also to @lxy5513, who suggested the same.

So my next step would be to run the Detectron poses through CPN to get better 2D results! If someone has another opinion, please share it; maybe I did something wrong in my code?

My output: [image]

Author's output: [image]

YCyuchen commented 4 years ago

@Godatplay @tobiascz I used the inference code to run my own video, taking Detectron's 2D keypoints as input. The buttocks in my output seem fixed, while I think they should move. Have you met a similar problem? Is there any potential solution I can try to improve the result? My output: [image]

tobiascz commented 4 years ago

Hey @YCyuchen,

The reason is that the 3D skeleton is always visualized relative to the center hip joint (what you called the buttocks). To avoid this, you could use the ankles as the relative center of the visualization. In your test video, you can see that while the person is crouching, the legs actually go up in the reconstruction. A sketch of this re-rooting follows.
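A minimal sketch of that re-rooting idea, assuming hip-relative predictions of shape (frames, joints, 3) and a hypothetical ankle joint index:

```python
import numpy as np

ANKLE = 3  # hypothetical index of an ankle joint in your skeleton layout

def reroot_to_ankle(poses):
    # poses: (frames, joints, 3) predictions relative to the hip.
    # Subtracting the ankle trajectory makes the ankle the fixed point,
    # so the hips move in the visualization instead of staying pinned.
    return poses - poses[:, ANKLE:ANKLE + 1, :]
```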

@dariopavllo already discussed this in https://github.com/facebookresearch/VideoPose3D/issues/51.