c-he / NeMF

[NeurIPS 2022] Official implementation of "NeMF: Neural Motion Fields for Kinematic Animation"

Potential issue of nn.GroupNorm(8, out_channels) #6

Closed Grem-Lin closed 1 year ago

Grem-Lin commented 1 year ago

Hi, I am trying to train the NeMF model (generative) on the dog dataset. After setting up the dataset and network architecture, I ran into an issue:

RuntimeError: Expected number of channels in input to be divisible by num_groups, but got input of shape [16, 810, 64] and num_groups=8

It seems to be related to the code below, which uses num_groups = 8 for group normalization: https://github.com/c-he/NeMF/blob/146a1eade5dd7eb77db8380c7f03adf99bfb09a2/src/nemf/residual_blocks.py#L126 My input data has shape [16, 810, 64], and 810 is not divisible by 8. I checked with the AMASS dataset; its input shape is [16, 360, 128], so it is safe...

I am wondering if there is a proper way to fix this?

Thanks

Grem-Lin commented 1 year ago

Hi, some follow-up: I found the dog skeleton has edge_num = 27, which isn't 24 as in the AMASS dataset, and this gives 2 × 15 × 27 = 810, where 15 is the channel base and 27 is edge_num. So I don't see a way to make 810 divisible by 8. Do you have any suggestions for changing the group normalization? I am pretty new to training deep networks. Thanks!

c-he commented 1 year ago

Hi, sorry for my late reply. You can just comment out that line if your channel count is not divisible by the number of groups. It won't affect the performance too much.

Grem-Lin commented 1 year ago

No, no, I really appreciate your quick reply each time ^_^ So you mean we directly remove line 126, and seq will not have group normalization?

c-he commented 1 year ago

Yes, just remove the GroupNorm.
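
For reference, a minimal sketch of one way to do that (illustrative only, assuming the residual block collects its layers in a list such as seq before wrapping them in nn.Sequential):

```python
import torch.nn as nn

def maybe_group_norm(out_channels, num_groups=8):
    # Illustrative helper: only add GroupNorm when the channel count is divisible
    # by num_groups; otherwise skip normalization, as suggested above
    # (e.g. 810 channels for the dog skeleton vs. 360 for AMASS).
    if out_channels % num_groups == 0:
        return [nn.GroupNorm(num_groups, out_channels)]
    return []
```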

Grem-Lin commented 1 year ago

Hi, I have two questions about global translation. I think there are three ways of dealing with global translation in the generative.py decoder, each with a corresponding parameter setup; please correct me if I am wrong. In the generative.yaml file, there are three relevant arguments: https://github.com/c-he/NeMF/blob/146a1eade5dd7eb77db8380c7f03adf99bfb09a2/configs/generative.yaml#L18-L20 https://github.com/c-he/NeMF/blob/146a1eade5dd7eb77db8380c7f03adf99bfb09a2/configs/generative.yaml#L78

  1. Don't deal with it. In this case, we set global_output: 6, output_trans: False, pretrained_gmp: empty, and we can just train the model (encoders and decoder).
  2. Learn the global translation with compute_trajectory(), which needs more global information, as in the code below. So we set global_output: 9+N (where N is the contacts dimension), output_trans: True, pretrained_gmp: empty, and train the model. https://github.com/c-he/NeMF/blob/146a1eade5dd7eb77db8380c7f03adf99bfb09a2/src/nemf/generative.py#L220-L232
  3. Predict the global translation through gmp. In this case we set global_output: 6, output_trans: False, pretrained_gmp: gmp.yaml. This assumes that gmp is already trained. And if we don't want to use the predicted translation, we can just set pretrained_gmp to empty. (These three setups are restated side by side in the sketch below.)
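
To keep them side by side, here is a quick restatement of the three parameter combinations above (illustrative Python dicts, not actual config files; the keys mirror the arguments in configs/generative.yaml, and N stands for the contacts dimension):

```python
# Restating the three setups discussed above (illustrative only).
setups = {
    # 1. Ignore global translation entirely.
    "no_translation":     {"global_output": 6,       "output_trans": False, "pretrained_gmp": ""},
    # 2. Learn the trajectory jointly via compute_trajectory(); N = contacts dimension.
    "learned_trajectory": {"global_output": "9 + N", "output_trans": True,  "pretrained_gmp": ""},
    # 3. Predict translation with a separately trained gmp model.
    "standalone_gmp":     {"global_output": 6,       "output_trans": False, "pretrained_gmp": "gmp.yaml"},
}
```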

My second question is: between setups 2 and 3, can we say that 3 is better than 2?

One more question: what is 'contacts'?

Thank you!

c-he commented 1 year ago

Yes, your understanding is correct! We also verified that setup 3 (using a standalone gmp) produces better results in applications such as motion in-betweening, while setup 2 tends to have more obvious foot-skating artifacts (see Figure 1 in our supplemental material).

Grem-Lin commented 1 year ago

I see! How about applications like motion reconstruction? Is setup 3 better than setup 2? Another question: why do we have latent space optimization in motion reconstruction? Is there anything wrong with only using the encoder-decoder architecture as in training? Thanks!

c-he commented 1 year ago

For motion reconstruction, setups 2 and 3 have similar performance in our experiments. We use latent optimization because it produces sharper results than those obtained directly from the VAE.
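
In case it helps to picture it, here is a minimal sketch of what latent optimization means (illustrative names; the actual reconstruction pipeline in the repo uses the NeMF decoder and additional loss terms):

```python
import torch
import torch.nn.functional as F

def optimize_latent(decoder, target, z_init, steps=500, lr=1e-2):
    # Instead of keeping the encoder's single estimate, treat the latent code as
    # a free variable and optimize it so the decoded motion matches the target.
    z = z_init.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(decoder(z), target)
        loss.backward()
        opt.step()
    return z.detach()
```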

Grem-Lin commented 1 year ago

I see. Thank you very much!