chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks
MIT License

How to get latent space vector value for specific sound generated? #79

Status: Closed (closed by moih 4 years ago)

moih commented 4 years ago

Hi,

I've been using WaveGAN extensively and was wondering whether there is a way to also obtain the latent space vector for a specific sample I've generated, so that I can use it again in the future?

For instance, I can already generate 10 samples from my pretrained checkpoints and would like to create an interpolation from Sound A to Sound B. If I want to interpolate from Sound A to a new sound, Sound Z, which I generated, let's say, days later, how can I save the z vectors of these sounds for later use? For example, if I'd like to generate them again or interpolate to another sound?

Thanks!

spagliarini commented 4 years ago

Hi,

How do you generate Sound A and Sound B? Do you follow the example given in the README? If so, I think what you want to save are the latent vectors _z that you defined there and that gave you Sound A and Sound B, respectively; let's call them _z_A and _z_B.

Is this what you were looking for?

moih commented 4 years ago

Yes, everything you mention is in the examples. I would like to go one step further and be able to recall the _z_A, _z_B, _z_C, ..., _z_N that I used for generation at that specific moment. My question is more about exploring the latent space and then having a way to access all the past examples the generator synthesized. I would then like to recall those latent space points and interpolate between them and newly generated samples; for example, to create an endless loop between generated outputs (z_A > z_B > z_A).


spagliarini commented 4 years ago

I hope I have understood the question correctly. If you have already saved the vectors (in .pkl format if you used the example code), you just need to open them as you would any .pkl file and reshape them to the right dimensions (which depend on the latent dimension you used for training).
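A minimal sketch of that loading step, for illustration only. It assumes the pickle holds something array-like (a list or NumPy array of latent vectors); the function name `load_latents`, the file name in the usage note, and the default latent dimension of 100 are assumptions, so adjust them to match your own training setup.

```python
import pickle

import numpy as np


def load_latents(path, latent_dim=100):
    """Load pickled latent vectors as an array of shape (n, latent_dim).

    latent_dim is an assumption here; it must match the value used
    when the model was trained.
    """
    with open(path, "rb") as f:
        zs = pickle.load(f)
    zs = np.asarray(zs, dtype=np.float32)
    # A single vector or a flattened batch becomes one row per latent vector.
    return zs.reshape(-1, latent_dim)
```

For example, `zs = load_latents("z.pkl")` would give an array whose rows can be fed to the generator in place of freshly sampled vectors. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.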

I am interested in exploring the latent space too, and I am up for discussion. For example, about this endless loop: are you interested in understanding which order is best to follow? Or do you just want to see how to evolve one sound into another without any rule?

I mean, if you have sound A, sound B and sound C, how do you decide whether to do

sound A > sound B > sound C

or

sound A > sound C > sound B?
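Whichever order is chosen, the loop itself can be sketched as plain linear interpolation between consecutive latent vectors. This is only an illustration, not code from the repository: the function name `latent_loop` and the `steps_per_segment` parameter are hypothetical, and the order is simply the row order of the input, so the question of which order is best stays open.

```python
import numpy as np


def latent_loop(latents, steps_per_segment=8):
    """Linearly interpolate through latent vectors in row order and back to the start.

    latents: array of shape (n, latent_dim).
    Returns an array of shape (n * steps_per_segment, latent_dim).
    """
    latents = np.asarray(latents, dtype=np.float32)
    # Append the first vector so the last segment returns to the start.
    closed = np.concatenate([latents, latents[:1]], axis=0)
    # Blend weights in [0, 1); each segment's endpoint starts the next segment.
    t = np.linspace(0.0, 1.0, steps_per_segment, endpoint=False)[:, None]
    segments = [(1.0 - t) * a + t * b for a, b in zip(closed[:-1], closed[1:])]
    return np.concatenate(segments, axis=0)
```

With two saved vectors, `latent_loop(np.stack([z_A, z_B]))` gives the z_A > z_B > z_A path; passing three vectors in a different row order gives the A > C > B variant. Each row of the result can then be passed to the generator the same way the README example passes `_z`.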

moih commented 4 years ago

Hi @spagliarini ,

Thanks for continuing the discussion.

I have a basic understanding of GANs and am still learning a lot. From what I understand, each individual audio sample generated by WaveGAN has its own unique latent space vector; please correct me if I'm wrong.

I will describe a real-world example where it would be great to feed a previously generated sample (and its latent space vector) back into a new session of generating audio:

Let's say I am generating some sounds for a music composition that I want to create using only WaveGAN. The way I am doing it now is:

1- I generate a bunch of random sounds from a checkpoint (as in the generator demo code).
2- I choose 2 of those generated sounds to create an interpolation.
3- I generate that interpolation and then save the audio.
4- I close my Google Colab session, losing all the randomly generated single-shot examples (not the interpolations).

Here is where my question arises: let's imagine I would like to come back and generate new random examples, but I would also like to use the samples I generated in my previous session, to interpolate between the old sounds and the newly generated ones. In the current generator code, there is no way to upload a latent space vector, or to generate one based on the similarity of an uploaded .wav file to a latent space vector in WaveGAN.

This is the solution I would like to come up with: a way to save and recall a bank of latent space vectors, linked to the sounds generated from them, for each checkpoint model used.

I am up for finding a solution collaboratively, since it seems this would make waveGAN more useful in real-world situations, at least for electronic music composers, sound designers, game audio professionals, which is my field.

Hope this explains what I'm talking about :)

spagliarini commented 4 years ago

> 4- I close my Google Colab session, losing all the randomly generated single-shot examples (not the interpolations)

You just need to save the random vectors so that you don't lose them. To see how, check the preview function in train_wavegan.py, which has an example of saving the latent vectors. The same function also shows how to load them once they have been saved.

> In the current generator code, there is no way to upload a latent space vector or generate one based on the similitude of an uploaded .wav file to a latent space vector in waveGAN.

Yep, you need to add the code to save the latent vectors. I don't use Colab (I run my code elsewhere), but using the generation code example and the main training code you should be able to write a third script that saves the vectors. However, it is not possible to go back and use the .wav to find the corresponding latent vector: the GAN does not learn this link.
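A rough sketch of what the saving part of such a third script could look like, using only NumPy. Everything here is hypothetical rather than part of the WaveGAN code: the helper names, the `.npz` file layout, and the default latent dimension of 100 are assumptions. The sampling follows the uniform [-1, 1] draw used in the README example.

```python
import os

import numpy as np


def sample_latents(n, latent_dim=100, seed=None):
    """Draw n latent vectors uniformly from [-1, 1], as in the README example."""
    rng = np.random.RandomState(seed)
    return (rng.rand(n, latent_dim) * 2.0 - 1.0).astype(np.float32)


def save_bank(path, bank):
    """Write a dict of {sound_name: latent_vector} to a single .npz file."""
    # Names are passed as keyword arguments, so avoid the name "file".
    np.savez(path, **bank)


def load_bank(path):
    """Read the .npz file back into a dict of {sound_name: latent_vector}."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


def update_bank(path, new_entries):
    """Merge new named latent vectors into the bank on disk and return it."""
    bank = load_bank(path) if os.path.exists(path) else {}
    bank.update(new_entries)
    save_bank(path, bank)
    return bank
```

The idea is to store each vector before (or right after) generating its sound, under the same name as the saved .wav, for example `update_bank("bank.npz", {"sound_A": zs[0]})`, with the path ending in `.npz`. In Colab the file would need to live somewhere that outlives the session. A vector only reproduces its sound with the same generator weights, so it makes sense to keep one bank per checkpoint.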

moih commented 4 years ago

I see, I will try your approach @spagliarini , thanks so much!