dhgrs / chainer-VQ-VAE

A Chainer implementation of VQ-VAE.
82 stars, 19 forks

BatchNorm and ResidualBlocks in Encoder #5

Closed: pfriesch closed this issue 5 years ago

pfriesch commented 6 years ago

Have you experimented with adding BatchNorm, Dropout or Residual Blocks to the Encoder?

Even though they did not mention it in the audio part of the paper, they added residual blocks to the encoder for the other experiments. Have you tried adding them?

Also, the demo page shows some straight blocks between the ConvBlocks.

dhgrs commented 6 years ago

I have not.

The audio part of the paper says: "the encoder has 6 strided convolutions with stride 2 and window-size 4". So the encoder in the authors' implementation may not have any residual blocks. Still, I think it's worth trying batchnorm, dropout and residual blocks! Can you try it?
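The quoted description fixes how much the encoder downsamples in time. A minimal pure-Python sketch of that arithmetic, using the standard 1-D convolution output-length formula and assuming padding 1 (the padding is not given in the quote and is an assumption here), shows that each layer halves an even-length input, so six layers downsample by 2^6 = 64. The helper names `conv_out_len` and `encoder_out_len` are hypothetical, not part of this repo.

```python
def conv_out_len(length: int, ksize: int = 4, stride: int = 2, pad: int = 1) -> int:
    """Output length of one 1-D convolution: floor((L + 2p - k) / s) + 1."""
    return (length + 2 * pad - ksize) // stride + 1


def encoder_out_len(length: int, n_layers: int = 6, ksize: int = 4,
                    stride: int = 2, pad: int = 1) -> int:
    """Length after a stack of identical strided convolutions.

    Defaults follow the paper's "6 strided convolutions with stride 2 and
    window-size 4"; pad=1 is an assumption, not taken from the paper.
    """
    for _ in range(n_layers):
        length = conv_out_len(length, ksize, stride, pad)
    return length


if __name__ == "__main__":
    for n in (16384, 64000):
        print(n, "->", encoder_out_len(n))  # each is divided by 64
```

With a different padding choice the lengths shift by a few frames per layer, so it is worth checking against the actual encoder in models.py.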

FYI: In the "Reconstructions" section of the demo page, there is no speaker condition, but the model needs one. I think the demo page doesn't reflect the implementation exactly.

pfriesch commented 6 years ago

The voice conversion is the last entry. They call it "Voice Style-Transfer".

dhgrs commented 6 years ago

Sorry for the confusion. What I meant is that the demo page doesn't describe the implementation accurately; "Reconstructions" is just one example.

I think residual blocks were not used for the authors' results, but they may still be effective. I don't know whether BN and dropout were used or not, but they may be effective too.

I don't have a machine available to try it right now, so could you try it if you can? You can change the encoder's architecture in models.py:

https://github.com/dhgrs/chainer-VQ-VAE/blob/58628304d60f778be0620360c663b7be00bf1181/models.py#L9-L27
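For anyone trying the suggestion, here is a framework-agnostic NumPy sketch of one 1-D residual block that could sit between the encoder's strided convolutions. It is an illustration under assumptions, not the authors' design: the layer order (ReLU, kernel-3 conv, ReLU, kernel-1 conv, then add the input) is modeled on the residual blocks the paper describes for its non-audio experiments, and the function names are hypothetical. Because the convolutions use "same" padding and no stride, the block keeps the shape of its input, so it can be inserted without changing the downsampling factor. The same structure could be expressed with Chainer convolution links in models.py.

```python
import numpy as np


def conv1d_same(x, w, b):
    """1-D convolution with zero 'same' padding (cross-correlation, as in DL libs).

    x: (C_in, T), w: (C_out, C_in, K) with odd K, b: (C_out,) -> (C_out, T)
    """
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))  # zero-pad the time axis only
    t = x.shape[1]
    y = np.empty((c_out, t))
    for i in range(t):
        window = xp[:, i:i + k]  # (C_in, K) slice centred on step i
        y[:, i] = np.tensordot(w, window, axes=([1, 2], [0, 1])) + b
    return y


def residual_block(x, w1, b1, w2, b2):
    """Hypothetical block: x + conv1x1(relu(conv3(relu(x)))).

    w1: (H, C, 3) expands to H hidden channels; w2: (C, H, 1) projects back to C.
    """
    h = np.maximum(x, 0.0)
    h = conv1d_same(h, w1, b1)
    h = np.maximum(h, 0.0)
    h = conv1d_same(h, w2, b2)
    return x + h  # skip connection; requires matching (C, T) shapes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 32))  # 8 channels, 32 time steps
    w1, b1 = 0.1 * rng.standard_normal((16, 8, 3)), np.zeros(16)
    w2, b2 = 0.1 * rng.standard_normal((8, 16, 1)), np.zeros(8)
    print(residual_block(x, w1, b1, w2, b2).shape)
```

Whether BatchNorm or dropout should go inside such a block is exactly the open question in this thread; the sketch leaves them out.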