DeepMotionEditing / deep-motion-editing

An end-to-end library for editing and rendering motion of 3D characters with deep learning [SIGGRAPH 2020]
BSD 2-Clause "Simplified" License

train_problem #153

Open Hellodan-77 opened 3 years ago

Hellodan-77 commented 3 years ago

I modified data_filename = "bfa.npz" and num_classes = 16 in style_transfer/config.py to train on the bfa dataset, but the following error occurred:

mean and std saved at /home/ddd/deep-motion-editing/style_transfer/probe/../data/bfa_norms/train_style3d.npz
data shape 1/3: (10, 42, 602720)
data shape 2/3: (10, 602720, 42)
data shape 3/3: (6027200, 42)
mean and std saved at /home/ddd/deep-motion-editing/style_transfer/probe/../data/bfa_norms/train_style2d.npz
here!
Traceback (most recent call last):
  File "style_transfer/train.py", line 167, in <module>
    main(args)
  File "style_transfer/train.py", line 53, in main
    iterations = trainer.resume()
  File "/home/ddd/deep-motion-editing/style_transfer/probe/../trainer.py", line 118, in resume
    self.model.dis.load_state_dict(state_dict['dis'])
  File "/home/ddd/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PatchDis:
    size mismatch for cnn_c.2.weight: copying a param with shape torch.Size([8, 144, 6]) from checkpoint, the shape in current model is torch.Size([16, 144, 6]).
    size mismatch for cnn_c.2.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([16]).

Do you know the reason?

HalfSummer11 commented 3 years ago

This is probably because there is already a pretrained model in the experiment folder specified by name (the default value is pretrain) in config.py. Since that model was trained on the xia dataset and the network architecture is dataset-dependent (e.g. #output_channel = #styles), loading it gives a size mismatch. Please set name to some other value to train a new model from scratch.
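
As a minimal sketch, the relevant fields would look something like this (the overall structure of style_transfer/config.py may differ; the new name value is just an example, and the only change needed beyond what you already set):

# style_transfer/config.py -- illustrative values only
name = "bfa_from_scratch"   # any experiment name whose folder does not already hold a checkpoint
data_filename = "bfa.npz"   # as you already set
num_classes = 16            # BFA style count; the existing "pretrain" checkpoint was trained with 8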

Hellodan-77 commented 3 years ago

Thank you very much for your answer! I would also like to ask: how do you judge how many iterations are needed, i.e. when the training results are good enough?

HalfSummer11 commented 3 years ago

I usually visualize the TensorBoard files in {name}/log and monitor the error curves (mainly using the reconstruction loss as a reference) and the style latent-space visualization. I also visualize the output animations in {name}/output using probe/anim_view.py to check their quality.
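
If you prefer to inspect the curve programmatically rather than in the TensorBoard UI, a rough sketch along these lines works; the log path and the scalar tag name ("loss_recon") are assumptions here, so list the available tags first and pick whichever reconstruction-loss tag the trainer actually writes:

# Hypothetical check: read the reconstruction-loss curve from the TensorBoard event files
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("pretrain/log")   # replace with your experiment's {name}/log directory
acc.Reload()
print(acc.Tags()["scalars"])             # list the logged scalar names first
for event in acc.Scalars("loss_recon"):  # assumed tag name; pick one from the list above
    print(event.step, event.value)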

Hellodan-77 commented 3 years ago

Thank you very much! There is another problem. When I ran gen_dataset.sh, the xia_test folder in style_transfer/data changed from the original 8 BVH files to 56. I assume this is test data, but no bfa_test folder is generated. If I want to test on the bfa dataset after training the model, how should I choose the test data? Do I need to set aside a test set separately before training?

HalfSummer11 commented 3 years ago

We don't generate bfa_test. xia_test is not "generated" either; instead, we manually placed some demo BVH inputs there so that people can reproduce our demo results. Since we didn't use the BFA dataset for our main demo, there's no bfa_test.

Hellodan-77 commented 3 years ago

If I want to test on the bfa dataset after training the model, how should I choose the test data? Before training, do I need to set aside some data as a test set? Or can the model only be tested with the xia dataset?

Hellodan-77 commented 3 years ago

I set max_iter = 200 when training on the xia dataset, but the output looks like this. Do you know why?

Resume from iteration 100045
Iteration: 00100046/00000200
[W NNPACK.cpp:80] Could not initialize NNPACK! Reason: Unsupported hardware.
trainfull_00100046_3d saved
trainfull_00100046_2d saved
test_00100046_3d saved
test_00100046_2d saved
Finish Training

HalfSummer11 commented 3 years ago

Regarding the first question, it's totally up to your preference. It makes more sense to test on the bfa dataset itself, because different datasets contain different biases. You can read the code to see how we divide training/test data for the bfa dataset, and devise your own way to extract BVH files for testing, either based on that split or independent of it. Regarding the second question, again:

This is probably because there is already a pretrained model in the experiment folder specified by name (the default value is pretrain) in config.py. Please set name to some other value to train a new model from scratch.
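
As a rough illustration (not the repo's exact code) of why your run prints "Resume from iteration 100045" and then finishes: resume() restores the iteration counter from the checkpoint saved in the {name} folder, so with max_iter = 200 the training loop has nothing left to do.

max_iter = 200          # your config value
iterations = 100045     # what trainer.resume() returns when a checkpoint already exists

while iterations < max_iter:   # 100045 < 200 is False, so the loop body never runs
    iterations += 1            # (stand-in for one training update)

print("Finish Training")       # printed almost immediately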

Hellodan-77 commented 3 years ago

Thanks!

Hellodan-77 commented 3 years ago

The videos you used in the experiment are all 3 s long. Do I also need to use 3 s for other videos, or can the duration differ from yours, for example 4 s?

Hellodan-77 commented 3 years ago

I used my own video for testing, but the style-transfer results look very poor visually: the limbs jitter a lot, and the synthesized motion is very different from the motion in my original video. Do you know what the reason might be?

Hellodan-77 commented 3 years ago

Hello, I am using the 2D human joint keypoints extracted from my video with OpenPose. Each frame's JSON file looks like this:

{"version":1.3,"people":[{"person_id":[-1],"pose_keypoints_2d":[631.27,339.413,0.895507,631.328,353.255,0.924807,617.529,353.115,0.972349,607.824,376.634,0.82523,605.831,398.127,0.8639,646.934,357.052,0.879857,650.862,382.501,0.92183,652.96,400.1,0.882746,621.622,398.191,0.90638,617.551,398.113,0.936588,603.95,427.585,0.920116,603.85,460.825,0.951916,633.311,398.256,0.952829,633.298,431.438,0.948102,635.351,464.864,0.875504,627.403,337.422,0.901981,633.275,337.468,0.953756,621.559,337.458,0.590103,635.319,337.574,0.86235,633.332,476.461,0.754847,639.153,476.5,0.744672,637.196,468.764,0.744926,601.994,464.813,0.810317,598.013,462.875,0.789491,604.011,460.936,0.850999],"face_keypoints_2d":[],"hand_left_keypoints_2d":[653.652,404.396,0.017995,651.48,403.64,0.0212857,656.959,404.963,0.0229864,648.268,404.302,0.0553904,647.04,404.396,0.0227719,656.675,405.057,0.0258647,647.418,404.113,0.0167763,648.268,403.924,0.0414395,646.001,404.68,0.0283852,655.258,404.018,0.0108795,655.353,407.419,0.0128161,648.646,404.018,0.0110045,657.809,406.758,0.0110612,655.353,404.018,0.0107435,655.636,408.364,0.0106011,657.242,407.136,0.00979569,658.281,406.947,0.0075582,656.203,413.087,0.0164367,649.496,403.074,0.00874274,647.323,404.018,0.0120165,646.379,404.585,0.00805321],"hand_right_keypoints_2d":[606.2,398.126,0.0131239,605.734,398.219,0.0231702,605.547,398.219,0.0257579,608.534,397.846,0.0395634,610.774,395.046,0.0278159,605.734,410.819,0.0230481,607.694,397.473,0.0173272,609.374,395.513,0.030163,610.774,395.046,0.0296088,605.267,410.819,0.015653,605.08,410.633,0.0191289,610.494,395.699,0.00950524,610.867,395.139,0.00850412,605.36,411.379,0.0162737,603.587,408.859,0.0135241,603.12,411.846,0.0105632,602.56,415.206,0.00813404,601.347,411.473,0.014097,602.187,412.126,0.010717,602.467,414.646,0.0125703,602.467,414.926,0.0104641],"pose_keypoints_3d":[],"face_keypoints_3d":[],"hand_left_keypoints_3d":[],"hand_right_keypoints_3d":[]}]}

However, my 3 s video yields 98 JSON files, while I believe your 3 s clips yield only 89. Could this difference in the number of frames extracted by OpenPose be the reason why my transfer results are poor? Hope to receive your reply!
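
For reference, here is a small sanity check I could run over the OpenPose output before feeding it to the 2D pipeline; the directory and file-name pattern are just placeholders for wherever OpenPose wrote the JSON files:

# Hypothetical check: count OpenPose frames and inspect the keypoint layout
import json, glob

files = sorted(glob.glob("openpose_output/*_keypoints.json"))  # path pattern is an assumption
print(f"{len(files)} frames extracted")                        # e.g. 98 for a ~3 s clip at ~30 fps
first = json.load(open(files[0]))
kps = first["people"][0]["pose_keypoints_2d"]                  # flat [x, y, confidence] list
print(len(kps) // 3, "joints per frame")                       # OpenPose BODY_25 model -> 25 joints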

HalfSummer11 commented 3 years ago

I think the slight difference in video lengths isn't the main factor here: our network collapses the time axis to obtain a time-invariant style code. Since it was trained on the xia dataset, our pretrained model has limited generalizability, as mentioned in the paper. Motion content (e.g. it is very unlikely to work for dancing motions), motion style (e.g. "crawling" as a style of walking), bone lengths or body proportions (since the 2D style encoder is trained on projections of the CMU skeleton), and camera poses (e.g. if the character walks towards the camera so that its size on screen changes) could all affect the quality of the result. Training on more data could be a way to improve generalizability.
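
To illustrate the "collapse the time axis" point (this is a minimal sketch, not the repo's actual architecture; layer sizes are made up), a style encoder that pools over time maps clips of different lengths to the same fixed-size style code, which is why a 3 s vs 4 s input is not the issue:

# Minimal sketch of a time-invariant style code via temporal pooling
import torch
import torch.nn as nn

class TinyStyleEncoder(nn.Module):
    def __init__(self, in_channels=42, code_dim=64):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, code_dim, kernel_size=5, padding=2)

    def forward(self, x):               # x: (batch, channels, time)
        h = torch.relu(self.conv(x))
        return h.max(dim=-1).values     # pool away the time axis -> (batch, code_dim)

enc = TinyStyleEncoder()
print(enc(torch.randn(1, 42, 90)).shape)   # ~3 s clip -> torch.Size([1, 64])
print(enc(torch.randn(1, 42, 120)).shape)  # ~4 s clip -> same torch.Size([1, 64])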