DinoMan / speech-driven-animation


Figure 5 #40

Closed lzkzls closed 4 years ago

lzkzls commented 4 years ago

Hello, in Figure 5, how does conv3d change the video frame sequence of (3 x 5 x 96 x 64) into (64, 48, 32)?

DinoMan commented 4 years ago

If you use a 3D convolution with the kernel described in the paper and no padding in the temporal dimension, the temporal dimension collapses to 1 and you get 1 x num_channels x 48 x 32. With num_channels = 64 and the singleton dimension dropped, that is the same as (64, 48, 32).
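For anyone checking the arithmetic, here is a minimal sketch of the shape calculation in plain Python. The kernel, stride and padding values below are assumptions chosen for illustration so that the numbers work out; they are not taken from the paper or the repository, so substitute the real ones. The formula is the standard per-axis convolution output size with dilation 1, which is also what `torch.nn.Conv3d` follows.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# Input clip: 5 frames of 96 x 64 pixels. The 3 colour channels are
# consumed by the filters, so they do not appear in the output shape.
frames, height, width = 5, 96, 64

# ASSUMED hyperparameters (hypothetical, for illustration only):
kernel = (5, 4, 4)    # spans all 5 frames in time
stride = (1, 2, 2)    # halves the spatial resolution
padding = (0, 1, 1)   # no padding in the temporal dimension
num_channels = 64     # number of output filters

out_t = conv_out_size(frames, kernel[0], stride[0], padding[0])
out_h = conv_out_size(height, kernel[1], stride[1], padding[1])
out_w = conv_out_size(width, kernel[2], stride[2], padding[2])

# Shape with the temporal axis still present, then with it dropped.
full_shape = (num_channels, out_t, out_h, out_w)
squeezed_shape = (num_channels, out_h, out_w)

print(full_shape)      # (64, 1, 48, 32)
print(squeezed_shape)  # (64, 48, 32)
```

The key point is that a temporal kernel as long as the clip, with no temporal padding, leaves exactly one output position in time, so that axis can be squeezed away.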