sanjeevmk opened this issue 1 year ago
Hi,

temporal_scale means the number of frames remaining after the convolutions. For example, we trained our model with 128-frame sequences and 4 layers of convolution, so the value of temporal_scale would be 128 / 2**4 = 8. In your case, this value should just be 1.
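The relationship above can be checked with a one-liner (a minimal sketch; the function name is just for illustration and does not exist in the NeMF codebase):

```python
def temporal_scale(num_frames: int, conv_layers: int) -> int:
    # Each strided convolution halves the temporal dimension,
    # so the remaining frame count is num_frames / 2**conv_layers.
    return num_frames // (2 ** conv_layers)

print(temporal_scale(128, 4))  # 8, the value baked into the pretrained model
print(temporal_scale(16, 4))   # 1, for 16-frame sequences
```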
Ah I see, thanks!
Since the provided pretrained model requires a temporal_scale of 8 (for the weights to load successfully), that means input sequences should be 128 frames long.
Yes, you are right. If you really need to train a model for 16-frame sequences, you can process your data following https://github.com/c-he/NeMF/blob/main/src/datasets/amass.py and retrain the model.
Hi, I'm trying to do a sanity test by reconstructing a single 16-frame motion from the DFaust_67 dataset. Below is the code flow leading up to the mismatch error.
The code reaches L132 of generative.py (inside encode_local). At this point, the size of "x" for me is (1, 360, 16): 16 is the clip length, and 360 is 24 joints * 15 parameters per joint. Then self.local_encoder(x) is called and the tensor goes through LocalEncoder's forward method.
It goes through 4 layers, with the output size of each layer being: torch.Size([1, 420, 8]), torch.Size([1, 540, 4]), torch.Size([1, 840, 2]), torch.Size([1, 1680, 1]).
After the view operation, the last layer's output is flattened to a 1x1680 tensor.
When this is passed to self.mu() (L82 of prior.py), I get the following size-mismatch error:
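The shape progression above can be traced with plain arithmetic (a sketch; the channel counts are copied from the printed shapes, not derived from the real layer definitions):

```python
# Temporal length halves at each of the 4 conv layers: 16 -> 8 -> 4 -> 2 -> 1.
length = 16
channels = [420, 540, 840, 1680]  # observed output channels per layer
shapes = []
for c in channels:
    length //= 2
    shapes.append((1, c, length))
print(shapes)  # [(1, 420, 8), (1, 540, 4), (1, 840, 2), (1, 1680, 1)]

# Flattening the final (1, 1680, 1) tensor with .view gives a 1 x 1680 matrix.
flat = shapes[-1][1] * shapes[-1][2]
print(flat)  # 1680
```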
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1680 and 13440x1024)
The Linear layer expects an input of size 13440, which is 1680 * args.temporal_scale. However, the output I get from the last layer has size 1680.
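The numbers line up with the temporal_scale arithmetic: the checkpoint's Linear layer was sized for temporal_scale = 8, while a 16-frame clip yields temporal_scale = 1 (a sketch of the arithmetic only; no real model code is involved):

```python
channels = 1680          # channels after the last conv layer
ckpt_temporal_scale = 8  # 128 frames / 2**4, as the pretrained model was trained
my_temporal_scale = 1    # 16 frames / 2**4

expected_in = channels * ckpt_temporal_scale  # what the checkpoint's Linear expects
actual_in = channels * my_temporal_scale      # what a 16-frame clip produces
print(expected_in, actual_in)  # 13440 1680 -> the (1x1680 and 13440x1024) mismatch
```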
I don't know how to account for args.temporal_scale here.
Can you please let me know what I'm doing wrong and how I can fix it?
Thank you so much!
Best, S