f90 / AdversarialAudioSeparation

Code accompanying the paper "Semi-supervised adversarial audio source separation applied to singing voice extraction"
https://arxiv.org/abs/1711.00048
MIT License

Fully adversarial training #1

Open faroit opened 5 years ago

faroit commented 5 years ago

Before I try it myself, I wanted to ask whether you tried training the network without fine-tuning, i.e. starting from scratch with fully adversarial training. Is that too hard to train? Did you try any other conditional GAN flavors?

faroit commented 5 years ago

Ping @f90

f90 commented 5 years ago

Sorry for the late reply. I didn't realise you have to watch your OWN repositories to be notified of changes by mail!

Yes, I tried training from scratch, but then you don't get enough stability, at least when you use the normal GAN or other variants that are not already heavily stabilised. Training also tends to take longer to converge.

In the end I found the Wasserstein GAN to be the most stable, so I figure that with a bit of tweaking it should be possible to train it in a fully adversarial way, without needing pretraining or other things (like the people behind SVSGAN). Still, it can be tricky, be aware! But hey, maybe you get something good working. It's possible, since we didn't actually focus on that setting in our paper. We wanted to explore how we can use the GAN for semi-supervised/unsupervised training with unpaired data. The MSE is a natural solution for the paired data and also stabilises training a lot, so we didn't have to worry so much about GAN stability anymore.

Hope that answers your questions?

Good luck :) Daniel
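
For readers following the thread, the combination f90 describes (MSE on paired data plus an adversarial term) might look roughly like the sketch below. It is not the repository's code. It assumes a Wasserstein-style critic that outputs unbounded scores, and the function names and the weighting `alpha` are hypothetical, chosen only for illustration.

```python
import numpy as np

def separator_loss(pred_paired, target_paired, critic_scores_fake, alpha=0.01):
    """Supervised MSE on paired data plus a Wasserstein-style adversarial term.

    alpha is a hypothetical weighting, not a value taken from the paper.
    """
    # Supervised part: mean squared error between estimates and ground truth.
    mse = np.mean((pred_paired - target_paired) ** 2)
    # Adversarial part: the separator tries to raise the critic's score on
    # its outputs, so it minimises the negative mean score.
    adv = -np.mean(critic_scores_fake)
    return mse + alpha * adv

def critic_loss(critic_scores_real, critic_scores_fake):
    """Wasserstein critic objective: score real sources high, estimates low."""
    return np.mean(critic_scores_fake) - np.mean(critic_scores_real)
```

In this sketch, a fully adversarial setup as asked about in the issue would correspond to dropping the `mse` term, which is where the stability concerns raised above come in.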