facebookresearch / VideoPose3D

Efficient 3D human pose estimation in video using 2D keypoint trajectories

Error on inference #152

Open · daquang opened 4 years ago

daquang commented 4 years ago

I ran the first four steps without any errors, but step 5 gives me the following error:

python run.py -d custom -k myvideos -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_detectron_coco.bin --render --viz-subject output.mp4 --viz-action custom --viz-camera 0 --viz-video input_directory/output.mp4 --viz-output output.mp4 --viz-size 6
Namespace(actions='*', architecture='3,3,3,3,3', batch_size=1024, bone_length_term=True, by_subject=False, causal=False, channels=1024, checkpoint='checkpoint', checkpoint_frequency=10, data_augmentation=True, dataset='custom', dense=False, disable_optimizations=False, downsample=1, dropout=0.25, epochs=60, evaluate='pretrained_h36m_detectron_coco.bin', export_training_curves=False, keypoints='myvideos', learning_rate=0.001, linear_projection=False, lr_decay=0.95, no_eval=False, no_proj=False, render=True, resume='', stride=1, subjects_test='S9,S11', subjects_train='S1,S5,S6,S7,S8', subjects_unlabeled='', subset=1, test_time_augmentation=True, viz_action='custom', viz_bitrate=3000, viz_camera=0, viz_downsample=1, viz_export=None, viz_limit=-1, viz_no_ground_truth=False, viz_output='output.mp4', viz_size=6, viz_skip=0, viz_subject='output.mp4', viz_video='input_directory/output.mp4', warmup=1)
Loading dataset...
Preparing data...
Loading 2D detections...
INFO: Receptive field: 243 frames
INFO: Trainable parameter count: 16952371
Loading checkpoint checkpoint/pretrained_h36m_detectron_coco.bin
This model was trained for 80 epochs
INFO: Testing on 1753 frames
Rendering...
INFO: this action is unlabeled. Ground truth will not be rendered.
ffmpeg version 4.3.1 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7.5.0 (crosstool-NG 1.24.0.131_87df0e6_dirty)
  configuration: --prefix=/home/dquang/anaconda3/envs/VideoPose3D --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1596712246804/_build_env/bin/x86_64-conda-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-gpl --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-libx264 --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input_directory/output.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.45.100
  Duration: 00:00:29.22, start: 0.800000, bitrate: 2851 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 2847 kb/s, 60 fps, 60 tbr, 15360 tbn, 120 tbc (default)
    Metadata:
      handler_name    : ISO Media file produced by Google Inc.
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> rawvideo (native))
Press [q] to stop, [?] for help
Output #0, image2pipe, to 'pipe:':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.45.100
    Stream #0:0(und): Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 2985984 kb/s, 60 fps, 60 tbn, 60 tbc (default)
    Metadata:
      handler_name    : ISO Media file produced by Google Inc.
      encoder         : Lavc58.91.100 rawvideo
...000775kB time=00:00:21.95 bitrate=2985984.0kbits/s speed=2.98x
Traceback (most recent call last):
  File "run.py", line 778, in <module>
    render_animation(input_keypoints, keypoints_metadata, anim_output,
  File "/mnt/c/Users/Daniel Quang/Documents/GitHub/VideoPose3D/common/visualization.py", line 112, in render_animation
    for f in read_video(input_video_path, skip=input_video_skip, limit=limit):
  File "/mnt/c/Users/Daniel Quang/Documents/GitHub/VideoPose3D/common/visualization.py", line 53, in read_video
    yield np.frombuffer(data, dtype='uint8').reshape((h, w, 3))
ValueError: cannot reshape array of size 3178496 into shape (1080,1920,3)
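
For context: a 1920x1080 frame in raw RGB24 is 1920 × 1080 × 3 = 6,220,800 bytes, but only 3,178,496 bytes came out of the pipe, roughly half a frame, so ffmpeg's raw-video stream ended mid-frame. Below is a simplified sketch of the pipe-reading pattern that read_video in common/visualization.py uses (illustrative only; the names and the error handling are not the repo's exact code):

import subprocess
import numpy as np

def read_frames(path, w, h):
    # Decode the video to raw RGB24 frames and stream the bytes over a pipe.
    cmd = ['ffmpeg', '-i', path, '-f', 'image2pipe',
           '-pix_fmt', 'rgb24', '-vcodec', 'rawvideo', '-']
    frame_size = w * h * 3  # 1920 * 1080 * 3 = 6,220,800 bytes per frame
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            data = proc.stdout.read(frame_size)
            if not data:
                break  # clean end of stream
            if len(data) < frame_size:
                # Short read: the stream ended mid-frame (here, after
                # 3,178,496 bytes), so reshape((h, w, 3)) would raise the
                # ValueError shown above.
                raise IOError('truncated frame: got %d of %d bytes'
                              % (len(data), frame_size))
            yield np.frombuffer(data, dtype='uint8').reshape((h, w, 3))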
dariopavllo commented 4 years ago

Hi,

I can't pinpoint the error exactly... The strange thing is that both the Detectron script and the visualization script use the same ffmpeg invocation to read the video, yet you said the first step completed successfully.

It looks like the script read a partial frame from the stream at around 21.95 seconds. Are you using the same video in both steps? You could try re-encoding the video; if the file is corrupted for some reason, that should fix it.
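
For example, a plain H.264 re-encode along these lines usually rewrites a damaged stream cleanly (the codec, pixel format, and output name are just reasonable defaults here, not something the pipeline requires):

ffmpeg -i input_directory/output.mp4 -c:v libx264 -pix_fmt yuv420p reencoded.mp4

Then pass the re-encoded file to both the 2D detection step and --viz-video.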

daquang commented 4 years ago

Hi,

Thanks for your quick response. Perhaps there's something wrong with how my video is encoded. Do you have an example of an .mp4 file that you know will work well with your workflow?

Here's the video I've been using: https://www.dropbox.com/s/58azh2wnqu7q95o/dance.mp4?dl=0

-Daniel

daquang commented 4 years ago

Just to add to this: I'm doing this on Windows, using a combination of WSL 2 and the base Windows 10 system, so the behavior has been very inconsistent. Anyway, I managed to get something working!

https://www.dropbox.com/s/aylzeiv0ctsga0y/output.mp4?dl=0

daquang commented 4 years ago

Is there any way to output the results in a format that can be read into a 3D rendering program (e.g. .bvh files)?

dariopavllo commented 4 years ago

The argument --viz-export allows you to export the predicted 3D joint positions to a NumPy archive. If you want to use them to animate a rigged skeleton, you'll probably need to run inverse kinematics on top of them.
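
If it helps while you set up the IK step, the export is easy to inspect; a minimal sketch, assuming --viz-export writes a plain .npy (np.save) holding one (num_joints, 3) position array per frame, with 'predictions.npy' standing in for whatever path you passed to the flag:

import numpy as np

# Load the 3D joint positions written by --viz-export
# ('predictions.npy' is a hypothetical path; use your own).
pred = np.load('predictions.npy')
print(pred.shape)  # e.g. (num_frames, 17, 3) for the 17-joint Human3.6M skeleton

Converting these raw positions to a rigged format like .bvh means fitting a skeleton with fixed bone lengths and joint rotations, which is the inverse-kinematics step mentioned above.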