BarclayII / audiogan

GAN for (raw) audio generation

the dimension problem #1

Open yutian-wang opened 7 years ago

yutian-wang commented 7 years ago

I am working on the same task as you, and your code has been a great inspiration. The code is nicely written, but when I ran it with amplitudes=16000, I got an error at line 200 in mainft.py, where xgen.shape[1]=8000. I checked the code and suspect this is caused by the Conv2DTranspose layers and the config parameters, but I am not sure how those parameters end up producing 8000. Can you explain the internal mechanics? Also, if I want to change --amplitudes to 16000 or another value such as 48000, how should I modify the parameters? Thanks.

BarclayII commented 7 years ago

I'm currently refactoring the code to make it less dependent on Keras components and to fix those inconsistencies, including the one you mentioned. As a hot-fix, though, you need to change the noise dimension as well. Because I used 'same' padding for the Conv2DTranspose layers, the output size of each Conv2DTranspose layer is just X times its input size, where X is that layer's stride. Say you are using the default configuration of the discriminator (5 layers with strides 5, 2, 2, 2, 2): the overall upsampling factor is 5 × 2 × 2 × 2 × 2 = 80, so to generate 16000 amplitudes you need to specify a noise dimension of 16000 / 80 = 200.
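
The arithmetic can be sketched in a few lines of plain Python. This is only an illustration of the stride rule described in this comment, not code from the repository: the helper names `upsampling_factor` and `noise_dim_for` are hypothetical, and it assumes every transposed-convolution layer uses 'same' padding so that each layer multiplies the length by exactly its stride.

```python
from math import prod  # Python 3.8+


def upsampling_factor(strides):
    """Total length multiplier of stacked 'same'-padded transposed convolutions."""
    return prod(strides)


def noise_dim_for(amplitudes, strides):
    """Noise length needed so the stack outputs exactly `amplitudes` samples.

    Hypothetical helper: assumes output length = noise length * product of strides.
    """
    factor = upsampling_factor(strides)
    if amplitudes % factor != 0:
        raise ValueError(
            f"{amplitudes} is not divisible by the upsampling factor {factor}; "
            "change the strides or pick a different number of amplitudes"
        )
    return amplitudes // factor


strides = [5, 2, 2, 2, 2]              # the stride configuration quoted above
print(upsampling_factor(strides))      # 80
print(noise_dim_for(16000, strides))   # 200
print(noise_dim_for(48000, strides))   # 600
```

By the same rule, an output length of 8000 corresponds to a noise dimension of 100 under these strides. If the target length is not a multiple of 80, the strides themselves would have to change.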