chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks
MIT License

Generating spectrograms using GANs #18

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

I would like to ask whether attenuated high frequencies in the results are a common problem when generating spectrograms with WaveGAN, and with GANs in general.

andimarafioti commented 5 years ago

It shouldn't be a problem if you're generating magnitude spectrograms (as in the paper accompanying this repository). If you're working with phase, this problem has previously been reported for convolutional networks (see [1]). In [2] they generate phases using GANs and don't discuss this, but Figure 3 might give you some insight into it.

[1] https://www.researchgate.net/publication/328600925_A_context_encoder_for_audio_inpainting
[2] https://openreview.net/pdf?id=H1xQVn09FX

ghost commented 5 years ago

I am only interested in reconstructing the magnitude. The generated data are really good and look visually close to the real ones, but I seem to be losing quality mainly in the high frequencies (the generated spectrograms have less red colour there). So I am trying to figure out whether this is a common issue when generating spectrograms.
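One way to turn the visual impression ("less red colour") into a number is to compare average energy per frequency band between real and generated magnitude spectrograms. The following is a minimal sketch, not code from this repository: the function names, the array layout `(batch, n_freq, n_frames)`, and the choice of four equal-width bands are all assumptions made for illustration.

```python
import numpy as np

def band_energy_db(mag, n_bands=4):
    """Mean energy in dB for n_bands equal-width frequency bands.

    mag: magnitude spectrogram(s) shaped (..., n_freq, n_frames);
         this layout is an assumption, adapt it to your data.
    """
    power = np.asarray(mag, dtype=np.float64) ** 2
    freq_axis = power.ndim - 2
    # Average over every axis except frequency (batch and time).
    other_axes = tuple(i for i in range(power.ndim) if i != freq_axis)
    per_bin = power.mean(axis=other_axes)
    bands = np.array_split(per_bin, n_bands)
    # Small epsilon avoids log(0) for silent bands.
    return np.array([10.0 * np.log10(b.mean() + 1e-12) for b in bands])

def band_gap_db(real_mag, fake_mag, n_bands=4):
    """Generated minus real energy per band, low to high frequency.

    Negative values in the last entries mean the generated
    spectrograms carry less high-frequency energy than the real ones.
    """
    return band_energy_db(fake_mag, n_bands) - band_energy_db(real_mag, n_bands)
```

Calling `band_gap_db(real_batch, generated_batch)` on two batches with the same STFT settings gives one number per band; a gap that grows more negative toward the last band would confirm the high-frequency loss.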


chrisdonahue commented 5 years ago

Not entirely sure. I speculate that the problem is two-fold and has to do with both the GAN generation and the Griffin-Lim reconstruction. It's not clear what proportion of the high-frequency issues can be attributed to each component.
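One possible way to estimate the Griffin-Lim share on its own is to run the inversion on a real magnitude spectrogram, where the GAN plays no part, and measure the error per frequency band. The sketch below is a NumPy-only illustration under assumed settings (`N_FFT`, `HOP`, Hann window, random initial phase); it is not the reconstruction code used in this repository, and all function names are hypothetical.

```python
import numpy as np

N_FFT, HOP = 256, 64  # assumed analysis settings, not taken from the repo

def stft(x, n_fft=N_FFT, hop=HOP):
    """Hann-windowed STFT, returned as (n_freq, n_frames)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(Z, n_fft=N_FFT, hop=HOP):
    """Least-squares overlap-add inverse of stft()."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(Z.T, n=n_fft, axis=1) * win
    n_frames = frames.shape[0]
    x = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(x)
    for i in range(n_frames):
        x[i * hop:i * hop + n_fft] += frames[i]
        norm[i * hop:i * hop + n_fft] += win ** 2
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=50, seed=0):
    """Estimate a signal whose STFT magnitude approximates mag."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random start
    for _ in range(n_iter):
        x = istft(mag * phase)
        phase = np.exp(1j * np.angle(stft(x)))  # keep phase, discard magnitude
    return istft(mag * phase)

def band_error_db(mag, mag_hat, n_bands=4):
    """Relative magnitude error in dB per frequency band, low to high."""
    err = np.array_split((mag_hat - mag) ** 2, n_bands, axis=0)
    ref = np.array_split(mag ** 2, n_bands, axis=0)
    return np.array([10.0 * np.log10(e.sum() / (r.sum() + 1e-12) + 1e-12)
                     for e, r in zip(err, ref)])
```

For a real recording `x`, compute `mag = np.abs(stft(x))`, then `band_error_db(mag, np.abs(stft(griffin_lim(mag))))`. If the top bands show a clearly larger error than the bottom ones, part of the high-frequency loss comes from the inversion; any further gap seen only in generated samples would then point at the GAN.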