sanjeevmk opened this issue 1 year ago
Hi,

temporal_scale means the number of frames remaining after the convolutions. For example, we trained our model with 128-frame sequences and 4 layers of convolution, so the value of temporal_scale would be 128 / 2**4 = 8. In your case, this value should just be 1.
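The relationship above can be checked with a one-liner (a minimal sketch; the function name is just for illustration and does not exist in the NeMF codebase):

```python
def temporal_scale(num_frames: int, conv_layers: int) -> int:
    # Each strided convolution halves the temporal dimension,
    # so the remaining frame count is num_frames / 2**conv_layers.
    return num_frames // (2 ** conv_layers)

print(temporal_scale(128, 4))  # 8, the value baked into the pretrained model
print(temporal_scale(16, 4))   # 1, for 16-frame sequences
```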
Ah I see, thanks!
Since the provided pretrained model requires a temporal_scale of 8 (for the weights to load successfully), that means input sequences should be 128 frames long.
Yes, you are right. If you really need to train a model for 16-frame sequences, you can process your data following https://github.com/c-he/NeMF/blob/main/src/datasets/amass.py and retrain the model.
Hi, I'm trying to do a sanity test by reconstructing a single 16-frame motion from the DFaust_67 dataset. Below is the code flow leading up to the mismatch error.
The code reaches L132 of generative.py (inside encode_local). At this point, the size of "x" for me is (1, 360, 16): 16 is the clip length, and 360 is 24 joints * 15 parameters per joint. Then self.local_encoder(x) is called and the tensor goes through LocalEncoder's forward method.
It goes through 4 layers, with the output size of each layer being: torch.Size([1, 420, 8]), torch.Size([1, 540, 4]), torch.Size([1, 840, 2]), torch.Size([1, 1680, 1]).
After the view operation, the last layer's output is flattened to a 1x1680 tensor.
When this is passed to self.mu() (L82 of prior.py), I get the following size-mismatch error:
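The shape progression above can be traced with plain arithmetic (a sketch; the channel counts are copied from the printed shapes, not derived from the real layer definitions):

```python
# Temporal length halves at each of the 4 conv layers: 16 -> 8 -> 4 -> 2 -> 1.
length = 16
channels = [420, 540, 840, 1680]  # observed output channels per layer
shapes = []
for c in channels:
    length //= 2
    shapes.append((1, c, length))
print(shapes)  # [(1, 420, 8), (1, 540, 4), (1, 840, 2), (1, 1680, 1)]

# Flattening the final (1, 1680, 1) tensor with .view gives a 1 x 1680 matrix.
flat = shapes[-1][1] * shapes[-1][2]
print(flat)  # 1680
```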
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1680 and 13440x1024)
The Linear layer expects an input of size 13440, which is 1680 * args.temporal_scale. However, the output I get from the last layer has size 1680.
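The numbers line up with the temporal_scale arithmetic: the checkpoint's Linear layer was sized for temporal_scale = 8, while a 16-frame clip yields temporal_scale = 1 (a sketch of the arithmetic only; no real model code is involved):

```python
channels = 1680          # channels after the last conv layer
ckpt_temporal_scale = 8  # 128 frames / 2**4, as the pretrained model was trained
my_temporal_scale = 1    # 16 frames / 2**4

expected_in = channels * ckpt_temporal_scale  # what the checkpoint's Linear expects
actual_in = channels * my_temporal_scale      # what a 16-frame clip produces
print(expected_in, actual_in)  # 13440 1680 -> the (1x1680 and 13440x1024) mismatch
```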
I don't know how to account for args.temporal_scale here.
Can you please let me know what I'm doing wrong and how I can fix it?
Thank you so much!
Best, S