facebookresearch / VisualVoice

Audio-Visual Speech Separation with Cross-Modal Consistency

Error when running inference on the test demo video #14

Closed JusperLee closed 2 years ago

JusperLee commented 2 years ago

When I run inference on the test demo video, I get the error below. I traced it to the line "audioVisual_feature = torch.cat((visual_feat, audio_conv8feature), dim=1)":

RuntimeError: Given groups=1, weight of size [512, 1152, 3, 3], expected input[1, 1792, 2, 64] to have 1152 channels, but got 1792 channels instead
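
For anyone hitting the same message, here is a minimal sketch of the shape arithmetic behind it, in plain Python with no PyTorch dependency. It relies on two facts: `torch.cat` along `dim=1` sums the channel counts of its inputs, and a convolution with `groups=1` expects as many input channels as the second dimension of its weight. The helper names `cat_shape` and `channel_mismatch` are hypothetical, and the 1280/512 split between `visual_feat` and `audio_conv8feature` is only an assumed example that adds up to the reported 1792; the real per-tensor sizes are not given in the report.

```python
def cat_shape(shapes, dim=1):
    """Shape that concatenating tensors of the given shapes along `dim` would have."""
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same number of dimensions")
        for axis, (a, b) in enumerate(zip(first, shape)):
            # Every axis except the concatenation axis has to match exactly.
            if axis != dim and a != b:
                raise ValueError(f"size mismatch on axis {axis}: {a} vs {b}")
    out = list(first)
    out[dim] = sum(shape[dim] for shape in shapes)
    return tuple(out)


def channel_mismatch(input_shape, weight_shape, groups=1):
    """Return (expected, got, surplus) input channels for a conv weight."""
    expected = weight_shape[1] * groups
    got = input_shape[1]
    return expected, got, got - expected


# Assumed (hypothetical) shapes for the two features being concatenated.
visual_feat_shape = (1, 1280, 2, 64)
audio_conv8feature_shape = (1, 512, 2, 64)

fused = cat_shape([visual_feat_shape, audio_conv8feature_shape], dim=1)
expected, got, surplus = channel_mismatch(fused, (512, 1152, 3, 3))
print(f"fused shape: {fused}")
print(f"conv expects {expected} channels, got {got} (surplus {surplus})")
```

Printing `visual_feat.shape` and `audio_conv8feature.shape` just before the `torch.cat` call should show which of the two tensors carries the extra 640 channels.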