evonneng / learning2listen

Official pytorch implementation for Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion (CVPR 2022)

Runtime error during training the predictor model #5

Closed leyi-123 closed 1 year ago

leyi-123 commented 2 years ago

Hello, when I trained the predictor model using the provided VQ-VAE model, I got the following runtime error:

python -u train_vq_decoder.py --config configs/vq/delta_v6.json

using config configs/vq/delta_v6.json
starting lr 2.0
Let's use 4 GPUs!
changing lr to 4.5e-06
loading checkpoint from... vqgan/models/l2_32_smoothSS_er2er_best.pth
starting lr 0.01
Let's use 4 GPUs!
loading from checkpoint... models/delta_v6_er2er_best.pth
loaded... conan
===> in/out (9922, 64, 56) (9922, 64, 56) (9922, 256, 128)
====> train/test (6945, 64, 56) (2977, 64, 56)
=====> standardization done
epoch 7903 num_epochs 500000
Traceback (most recent call last):
  File "train_vq_decoder.py", line 217, in <module>
    main(args)
  File "train_vq_decoder.py", line 202, in main
    patch_size, seq_len)
  File "train_vq_decoder.py", line 89, in generator_train_step
    g_optimizer.step_and_update_lr()
  File "/vc_data/learning2listen-main/src/utils/optim.py", line 25, in step_and_update_lr
    self._optimizer.step()
  File "/home/.conda/envs/L2L/lib/python3.6/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/.conda/envs/L2L/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/.conda/envs/L2L/lib/python3.6/site-packages/torch/optim/adam.py", line 144, in step
    eps=group['eps'])
  File "/home/.conda/envs/L2L/lib/python3.6/site-packages/torch/optim/_functional.py", line 86, in adam
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: output with shape [200] doesn't match the broadcast shape [4, 200]

I want to know how to solve it. Thanks in advance!

evonneng commented 2 years ago

Hmm, it seems the output of your network is missing a temporal dimension. The output should be 4x200: there should be 4 patches in the sequence, each 200-d to match the number of elements in the codebook. Could you please check whether there is an indexing problem in the output's temporal dimension?
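
For anyone reading the error message itself: "output with shape [200] doesn't match the broadcast shape [4, 200]" is what PyTorch reports when an in-place operation would need to grow the tensor being written to. The sketch below mirrors that rule in plain Python (no torch needed) using the two shapes from the traceback. The helper names `broadcast_shape` and `inplace_ok` are hypothetical, and the labels `exp_avg_shape` and `grad_shape` are only read off the failing line; which tensor in the model they correspond to is not established in this thread.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError."""
    result = []
    # Compare from the trailing dimension; missing leading dims count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # A size-1 dimension stretches to match the other one.
        result.append(y if x == 1 else x)
    return tuple(reversed(result))


def inplace_ok(out_shape, other_shape):
    """An in-place op works only if broadcasting leaves the output shape unchanged."""
    try:
        return broadcast_shape(out_shape, other_shape) == tuple(out_shape)
    except ValueError:
        return False


# Shapes taken from the error message (labels are an assumption, see above).
exp_avg_shape = (200,)   # the tensor being written to in-place
grad_shape = (4, 200)    # the tensor being added to it

print(broadcast_shape(exp_avg_shape, grad_shape))  # (4, 200)
print(inplace_ok(exp_avg_shape, grad_shape))       # False: [200] cannot hold [4, 200]
print(inplace_ok(grad_shape, exp_avg_shape))       # True: [4, 200] can absorb [200]
```

So the two shapes are broadcast-compatible in general, but the write target is the smaller one, which is why the step fails. Printing the shapes of the tensors involved just before the optimizer step is a cheap way to find where the 4 (or the missing dimension) comes from.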