NZqian / SVCC2023-t23-ASLP


I had difficulty implementing a module called PBTC (Parallel Bank of Transposed Convolutions) #1

Closed yygg678 closed 1 year ago

yygg678 commented 1 year ago

I was trying to reproduce a voice conversion model using PyTorch and had difficulty implementing a module called PBTC (Parallel Bank of Transposed Convolutions). As described in the paper, it consists of several parallel branches, each a transposed convolution followed by a linear projection.

The input to this module is a sequence of f0 (fundamental frequency) embeddings with shape [B, T, L]. We want to pass it through the PBTC module and get a new sequence with shape [B, T, F].

As shown in the figure, the input first passes through a transposed convolution, which gives a sequence with shape [B, t', F]. It then passes through a linear projection, which gives a sequence with shape [B, T, F].

But the question is how to define such a linear projection. As in the figure, t' is computed from t, the dilation dil, and the kernel size k, where t should be the length of the input sequence, T. But we don't know t exactly, since the length of the input sequence varies across data points. So should I fix the input length, truncating or padding the raw input sequence to fit it? But this doesn't seem to make sense...
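To make the length relationship concrete (these numbers are arbitrary, just plugged into the formula from the figure): with t = 100, k = 50, and dil = 3, we get t' = (t - 1) + dil * (k - 1) + 1 = 99 + 3 * 49 + 1 = 247. So each dilation produces a different t', and each branch needs its own linear layer sized to that t'.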

My implementation of the PBTC module is as follows:

```python
import torch
import torch.nn as nn


class PBTC(nn.Module):
    """
    Parallel Bank of Transposed Convolutions
    Reference:
    https://www.isca-speech.org/archive/pdfs/interspeech_2020/webber20_interspeech.pdf
    https://arxiv.org/pdf/2303.12197.pdf
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 input_length,
                 output_length,
                 kernel_size=50,
                 num_branches=10):
        super(PBTC, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.input_length = input_length
        self.output_length = output_length
        self.kernel_size = kernel_size
        self.num_branches = num_branches

        self.branches = nn.ModuleList()

        # odd dilations: 1, 3, 5, ..., 2 * num_branches - 1
        for dilation in range(1, 2 * num_branches, 2):
            # t' = (t - 1) + dil * (k - 1) + 1
            input_length_prime = (input_length - 1) + dilation * (self.kernel_size - 1) + 1
            self.branches.append(
                nn.Sequential(
                    # stretch the time axis from input_length to t'
                    nn.ConvTranspose1d(in_channels, out_channels,
                                       kernel_size=kernel_size, stride=1, dilation=dilation),
                    # project the stretched time axis t' back to output_length
                    nn.Linear(in_features=input_length_prime, out_features=output_length),
                    nn.ReLU()
                )
            )

    def forward(self, seq):
        # seq: [B, T, L]
        seq = seq.permute(0, 2, 1)  # [B, L, T]
        # sum the branch outputs; each branch yields [B, F, output_length]
        encoded_seq = sum(branch(seq) for branch in self.branches)
        encoded_seq = encoded_seq.permute(0, 2, 1)  # [B, output_length, F]
        return encoded_seq
```
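For reference, here is a quick shape check with dummy data against the class above (the sizes B = 4, T = 100, L = 80, F = 192 are arbitrary, chosen just for illustration):

```python
pbtc = PBTC(in_channels=80, out_channels=192, input_length=100, output_length=100)
x = torch.randn(4, 100, 80)  # [B, T, L]
y = pbtc(x)
print(y.shape)               # torch.Size([4, 100, 192]) -> [B, T, F]
```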

How did you implement the PBTC module?

NZqian commented 1 year ago

Thanks for your attention. The linear projection is done on the time axis, that is, the F0 features compressed by the convolution layers are stretched back to their original length. Our implementation follows "SELF-SUPERVISED REPRESENTATIONS FOR SINGING VOICE CONVERSION". If you have any other questions about our paper, you can contact me by email or WeChat.
email: ningziqian@mail.nwpu.edu.cn
wechat: __NZQian
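For anyone reading later, a minimal sketch of what "linear projection on the time axis" means here, assuming a single branch output of shape [B, F, t'] and a target length T (the names and sizes below are mine, not from the paper):

```python
import torch
import torch.nn as nn

B, n_filters, t_prime, T = 4, 192, 247, 100  # hypothetical sizes
branch_out = torch.randn(B, n_filters, t_prime)

# nn.Linear acts on the last dimension, so keeping time last
# maps the stretched length t' back to the target length T
time_proj = nn.Linear(t_prime, T)
restored = time_proj(branch_out)  # [B, F, T]
print(restored.shape)             # torch.Size([4, 192, 100])
```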