deepsound-project / samplernn-pytorch

PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
MIT License

Convolution dim shuffling #24

Closed: williamFalcon closed this issue 6 years ago

williamFalcon commented 6 years ago

In the FrameLevel forward you guys do:

    def forward(self, prev_samples, upper_tier_conditioning, hidden):
        (batch_size, _, _) = prev_samples.size()

        # (batch, seq_len, dim) -> (batch, dim, seq_len)
        input = prev_samples.permute(0, 2, 1)

        # (batch, dim, seq_len)
        # use conv1d instead of FC for speed
        input = self.input_expand(input)

        # (batch, dim, seq_len) -> (batch, seq_len, dim)
        input = input.permute(0, 2, 1)

        # add conditioning tier from previous frame 
        if upper_tier_conditioning is not None:
            input += upper_tier_conditioning

        # reset hidden state for TBPTT
        reset = hidden is None
        if hidden is None:
            (n_rnn, _) = self.h0.size()
            hidden = self.h0.unsqueeze(1) \
                            .expand(n_rnn, batch_size, self.dim) \
                            .contiguous()

        # run the frame-level RNN over the expanded, conditioned inputs
        (output, hidden) = self.rnn(input, hidden)

        # permute again so this can upsample for next context
        output = output.permute(0, 2, 1)
        output = self.upsampling(output)
        output = output.permute(0, 2, 1)
        return (output, hidden)

1. Are the comments I added correct?

2. I'd like to use a Linear layer instead of the Conv1d first, just for understanding purposes. However, the dimensions don't line up when I do it that way. Any thoughts on how to reframe this in terms of a Linear layer?

3. I assume the transposes are there so that the convolutions work out? Is that standard when using Conv1d instead of a Linear layer?

williamFalcon commented 6 years ago

@koz4k

williamFalcon commented 6 years ago

basically something like (pseudocode):

    import torch
    import torch.nn as nn

    # batch = 7, steps = 12, dim = 100
    x = torch.randn(7, 12, 100)

    linear = nn.Linear(100, 200)                 # fc maps from 100 -> 200 in dim
    conv1d = nn.Conv1d(100, 200, kernel_size=1)

    # Linear way: flatten batch and time, apply the fc, reshape back
    out_linear = x.view(-1, 100)
    out_linear = linear(out_linear)
    out_linear = out_linear.view(7, 12, 200)

    # Conv way: move dim onto the channel axis, convolve, move it back
    conv_out = x.permute(0, 2, 1)                # (batch, dim, seq_len)
    conv_out = conv1d(conv_out)
    conv_out = conv_out.permute(0, 2, 1)         # (batch, seq_len, dim)
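A runnable version of that check, as a sketch (assuming kernel_size=1 and copying the Linear weights into the Conv1d so the two paths are directly comparable):

    import torch
    import torch.nn as nn

    batch, steps, dim, out_dim = 7, 12, 100, 200
    x = torch.randn(batch, steps, dim)

    linear = nn.Linear(dim, out_dim)
    conv1d = nn.Conv1d(dim, out_dim, kernel_size=1)

    # tie the weights so both layers compute the same affine map
    with torch.no_grad():
        conv1d.weight.copy_(linear.weight.unsqueeze(-1))  # (200, 100) -> (200, 100, 1)
        conv1d.bias.copy_(linear.bias)

    out_linear = linear(x.reshape(-1, dim)).view(batch, steps, out_dim)
    out_conv = conv1d(x.permute(0, 2, 1)).permute(0, 2, 1)

    print(torch.allclose(out_linear, out_conv, atol=1e-6))  # expected: True

With kernel_size=1 the Conv1d is exactly a position-wise Linear, so the two outputs should match up to floating point.
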
koz4k commented 6 years ago

Yes, should work this way. permute is there so that it works with Conv1d, which expects input of shape (batch, channels, seq_len).
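
For concreteness, a minimal sketch of the Linear-based alternative asked about above (input_expand_linear is a hypothetical stand-in, not a layer from this repo): nn.Linear acts on the last dimension, so it consumes the (batch, seq_len, dim) layout directly, with no permutes:

    import torch
    import torch.nn as nn

    # hypothetical replacement for self.input_expand (a Conv1d with kernel_size=1)
    input_expand_linear = nn.Linear(100, 200)

    x = torch.randn(7, 12, 100)    # (batch, seq_len, dim), as the RNN produces it
    out = input_expand_linear(x)   # applied over the last dim; no permute needed
    print(out.shape)               # torch.Size([7, 12, 200])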

williamFalcon commented 6 years ago

@koz4k ok awesome... thanks!