chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks
MIT License

Inception Scores with Animal Vocalizations #83

Open alexanderbarnhill opened 4 years ago

alexanderbarnhill commented 4 years ago

I'm working on a GAN for generating animal vocalizations and I'm curious about how to calculate the inception score (IS). If I understand the IS correctly, I just need a binary classifier that tells me whether a sample contains the vocalization or just noise. This differs from the SC09 setup, where the classifier tells me whether the sample is a 0, 1, etc. So I would generate several thousand samples with my GAN, which ostensibly contain the correct vocalization, and several thousand samples of noise, and this should at least give me the first part of the IS. Is all of that correct?

Also, when attempting to run the train_wavegan.py script with the incept flag I'm running into the following error:

failed to make cuFFT batched plan: 5 Initialize Params: rank: 1 elemn_count: 1042 input_embed 1024 input_stride: 1 input_distance: 1024 output_embed: 513 output_stride 1 output_distance: 513 batch_count: 12800 failed to initialize batched cufft plan with customized allocator: Failed to make cuFFT batched plan.

Any insight into this? Or any other ideas how I can best calculate IS?

spagliarini commented 4 years ago

Hi, as far as I understand, you first need a way to classify your generated samples: the equivalent of the classifier they use, but trained on your real dataset and able to classify into your own vocabulary. Then you can compute the inception score.
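For reference, the second step can be sketched in a few lines once the classifier exists. This is a minimal NumPy sketch, not the code from this repository: the function name `inception_score` and the `eps` parameter are made up for illustration, and `probs` is assumed to be the classifier's softmax output for each generated sample. It uses the standard definition, the exponential of the mean KL divergence between each per-sample prediction p(y|x) and the marginal p(y).

```python
import numpy as np


def inception_score(probs, eps=1e-12):
    """Inception score from classifier outputs (illustrative helper).

    probs: array of shape (N, K), one row of class probabilities per
           generated sample, each row summing to 1.
    eps:   small constant to avoid log(0); an assumption of this sketch.
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Marginal class distribution p(y) over all generated samples.
    marginal = probs.mean(axis=0, keepdims=True)
    # KL(p(y|x) || p(y)) for each sample.
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    # Exponential of the mean KL divergence.
    return float(np.exp(kl.mean()))


# Confident predictions spread evenly over two classes: score close to 2.
balanced = np.array([[1.0, 0.0], [0.0, 1.0]] * 50)
# Every sample confidently assigned to the same class: score close to 1.
one_class = np.array([[1.0, 0.0]] * 100)

score_balanced = inception_score(balanced)
score_one_class = inception_score(one_class)
```

The score is bounded above by the number of classes K, so a two-class classifier (vocalization vs. noise) can never give more than 2. If every generated sample is confidently labelled "vocalization", the marginal equals each per-sample prediction and the score collapses to 1, which is why a classifier over several call types is likely to be more informative than a binary one.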

alexanderbarnhill commented 4 years ago

Yeah, I've come to the same conclusion. Unfortunately that's not really feasible for me, so I will have to try something else. Thanks for the input though!