Doubiiu / CodeTalker

[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
MIT License

Is it necessary to comment out the self.padding_mode != 'zeros' check? #29

Closed: youngstu closed this issue 10 months ago

youngstu commented 1 year ago

Is it necessary to comment out the self.padding_mode != 'zeros' check? The code runs without any error when I leave conv.py unmodified. Will skipping this change affect model accuracy? Thanks.

IMPORTANT: Please make sure to modify the site-packages/torch/nn/modules/conv.py file by commenting out the self.padding_mode != 'zeros' line to allow for replicated padding for ConvTranspose1d as shown https://github.com/NVIDIA/tacotron2/issues/182.

youngstu commented 1 year ago

It looks like ConvTranspose1d is not used with the linked config, so the ConvTranspose1d padding mode should not affect model accuracy.

https://github.com/Doubiiu/CodeTalker/blob/e687bbe64bb3553d5653041a9d80a62eae593ebf/config/BIWI/stage2.yaml#L25

    if args.quant_factor == 0:
        # quant_factor == 0: plain Conv1d, which accepts padding_mode='replicate'
        self.expander.append(nn.Sequential(
            nn.Conv1d(size, dim, 5, stride=1, padding=2,
                      padding_mode='replicate'),
            nn.LeakyReLU(self.args.neg, True),
            nn.InstanceNorm1d(dim, affine=args.INaffine)
        ))
    else:
        # quant_factor != 0: ConvTranspose1d with stride 2; unmodified PyTorch
        # rejects padding_mode='replicate' for transposed convolutions
        self.expander.append(nn.Sequential(
            nn.ConvTranspose1d(size, dim, 5, stride=2, padding=2,
                               output_padding=1,
                               padding_mode='replicate'),
            nn.LeakyReLU(self.args.neg, True),
            nn.InstanceNorm1d(dim, affine=args.INaffine)
        ))
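
To make the difference between the two branches concrete, here is a small sketch of the output-length arithmetic for each layer. It uses the length formulas from the PyTorch Conv1d and ConvTranspose1d documentation with the kernel size, stride, and padding from the snippet above. The helper names and the frame count of 30 are made up for illustration and are not part of CodeTalker.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of Conv1d, following the formula in the PyTorch docs."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose1d_out_len(length, kernel_size, stride=1, padding=0,
                             output_padding=0, dilation=1):
    """Output length of ConvTranspose1d, following the formula in the PyTorch docs."""
    return ((length - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


frames = 30  # hypothetical sequence length, chosen only for this example

# quant_factor == 0 branch: kernel 5, stride 1, padding 2 keeps the length.
same = conv1d_out_len(frames, 5, stride=1, padding=2)

# else branch: kernel 5, stride 2, padding 2, output_padding 1 doubles the length.
doubled = conv_transpose1d_out_len(frames, 5, stride=2, padding=2,
                                   output_padding=1)

print(same, doubled)  # 30 60
```

So the Conv1d branch leaves the temporal resolution unchanged, while the ConvTranspose1d branch upsamples by a factor of 2 per layer. Only the second branch runs into the padding_mode check.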

aurelianocyp commented 10 months ago

Have you solved the problem? I have the same question: there are many occurrences of self.padding_mode != 'zeros', and I don't know which one to comment out.
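
One way to narrow this down is to list every occurrence together with the class it sits in. The script below is only a sketch: the function names are hypothetical, it uses just the standard library, and it assumes the check is written as padding_mode != 'zeros' (with or without self.), which may differ between PyTorch versions. It does not decide which line to edit. Since the README note is about ConvTranspose1d, the occurrences inside transposed-convolution classes are the likely candidates.

```python
import importlib.util
import re
from pathlib import Path

# Matches both `self.padding_mode != 'zeros'` and `padding_mode != "zeros"`.
CHECK = re.compile(r"padding_mode\s*!=\s*['\"]zeros['\"]")
CLASS = re.compile(r"^class\s+(\w+)")


def find_padding_checks(source):
    """Return (line_number, enclosing_class, line_text) for each check found."""
    hits = []
    current_class = None
    for number, line in enumerate(source.splitlines(), start=1):
        match = CLASS.match(line)
        if match:
            current_class = match.group(1)  # remember the latest top-level class
        if CHECK.search(line):
            hits.append((number, current_class, line.strip()))
    return hits


def locate_conv_py():
    """Best-effort path to torch/nn/modules/conv.py without importing torch."""
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        return None
    path = Path(list(spec.submodule_search_locations)[0]) / "nn" / "modules" / "conv.py"
    return path if path.exists() else None


if __name__ == "__main__":
    conv_py = locate_conv_py()
    if conv_py is None:
        print("torch (or its conv.py) was not found in this environment")
    else:
        for number, cls, text in find_padding_checks(conv_py.read_text(encoding="utf-8")):
            print(f"{conv_py}:{number} [{cls}] {text}")
```

Running it prints one line per occurrence with the file path, line number, and enclosing class, which makes it easier to see which checks belong to the transposed convolutions.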

Doubiiu commented 10 months ago

Yes. You do not need to comment it out if you use the proposed model without modifying the architecture. Commenting it out is only required when you modify P from the main paper, which denotes the compression ratio of the features along the temporal axis.