chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks
MIT License

Dtest and Dtrain #87

Open spagliarini opened 4 years ago

spagliarini commented 4 years ago

Hi, I am working on the evaluation of the generator model. I have a (possibly naive) doubt: the paper mentions that the indicators Dself and Dtrain are computed to test the ability of the model to generate sounds that do not belong to the training dataset. They are basically a Euclidean distance within a dataset (either the training data or the generated data) and across datasets (training vs. generated). My question is: between which representations of the sound (sample vector, decoder output, ...) is the Euclidean distance computed?

Thanks, Silvia

chrisdonahue commented 4 years ago

Yes, these metrics are computed in a Euclidean space of log-Mel spectrograms. So we take the audio, extract spectrograms, and compute distances in that space. You can see the code here: https://github.com/chrisdonahue/wavegan/tree/master/eval/similarity . feats.py produces the features, and sim.py computes the metrics.
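To make the pipeline concrete, here is a minimal sketch of the two steps (feature extraction, then distance computation). It is not the repo's actual feats.py / sim.py code: for brevity it uses a plain log-magnitude spectrogram instead of log-Mel features, the FFT/hop sizes are arbitrary, and the random audio and the `mean_nn_distance` helper are illustrative stand-ins.

```python
import numpy as np

def log_spectrogram(audio, n_fft=256, hop=128):
    # Frame the signal, window each frame, and take the log-magnitude FFT.
    # (The actual eval code uses log-Mel spectrograms; this is a simplified proxy.)
    frames = [audio[i:i + n_fft] for i in range(0, len(audio) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames) * np.hanning(n_fft), axis=1))
    return np.log(spec + 1e-6).flatten()

def mean_nn_distance(gen_feats, ref_feats):
    # For each generated feature vector, Euclidean distance to its nearest
    # neighbor in the reference set, averaged over the generated set.
    dists = np.linalg.norm(gen_feats[:, None, :] - ref_feats[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

rng = np.random.default_rng(0)
gen = np.stack([log_spectrogram(rng.standard_normal(1024)) for _ in range(8)])
ref = np.stack([log_spectrogram(rng.standard_normal(1024)) for _ in range(8)])
print(mean_nn_distance(gen, ref))
```

Comparing generated samples against the training set gives a Dtrain-style number, and comparing the generated set against itself gives a Dself-style number.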

spagliarini commented 4 years ago

Thank you for your reply.

I tried to apply this measure using 1k training samples and 1k generated samples. For both Dself and Dtrain I obtain high values, e.g. a mean distance of ~1000. I was expecting lower values, since in the paper these quantities are of order 1.
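One plausible (unconfirmed) cause of such a magnitude gap: the Euclidean norm over a flattened spectrogram grows with the square root of the feature dimension, so a distance summed over tens of thousands of spectrogram bins can easily reach ~1000 even when the per-bin (RMS) difference is of order 1. A quick numerical check of that scaling, with synthetic vectors standing in for real features:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 100_000                      # e.g. a flattened log-Mel spectrogram
a = rng.standard_normal(dim)       # synthetic stand-in for one feature vector
b = rng.standard_normal(dim)       # synthetic stand-in for another

raw = np.linalg.norm(a - b)                # grows like sqrt(dim)
rms = np.sqrt(np.mean((a - b) ** 2))       # dimension-independent, stays ~sqrt(2)
print(raw, rms)
```

So it is worth checking whether the reference implementation normalizes by the feature dimension (or averages over time frames) before comparing your numbers against the paper's.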