auspicious3000 / SpeechSplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck
http://arxiv.org/abs/2004.11284
MIT License

make_metadata.py logic puzzle #34

Closed leijue222 closed 3 years ago

leijue222 commented 3 years ago

https://github.com/auspicious3000/SpeechSplit/blob/10ed8b9e25cce6c9a077e27ca175ba696b7df597/make_metadata.py#L17-L25 Hi, I don't understand the logic of this code block. Shouldn't every speaker's id be different? Why are there only two? If I want to train on 20 VCTK speakers, does this block have to be modified? Could you explain a bit of it to me?

leijue222 commented 3 years ago
  1. How do I generate a file like demo.pkl for inference? The train.pkl I generate with make_metadata.py doesn't work for inference; the two files are stored in different formats.
  2. How do I generate the *-P.ckpt model?
CYT823 commented 3 years ago

@leijue222

Hi, I don't understand the logic of this code block. Shouldn't every speaker's id be different? Why are there only two? If I want to train on 20 VCTK speakers, does this block have to be modified? Could you explain a bit of it to me?

I guess that is only for the demo. If you only have 20 speakers, then you can just imitate it with something like:

if speaker == 'p226':
    spkid[0] = 1.0
elif speaker == 'p227':
    spkid[1] = 1.0
elif speaker == 'p228':
    spkid[2] = 1.0
# ...and so on for the remaining speakers
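
Instead of hardcoding a long elif chain, you could also build the index mapping from the sorted speaker list. A minimal sketch, assuming 20 training speakers (the speaker names and vector dimension below are illustrative, not taken from the repo):

import numpy as np

# Hypothetical list of the 20 VCTK speakers used for training; sorting keeps
# the index assignment deterministic across runs.
speakers = sorted(['p225', 'p226', 'p227', 'p228'])  # ...extend to all 20
spk2idx = {spk: i for i, spk in enumerate(speakers)}

def make_onehot(speaker, dim=len(speakers)):
    # Build a one-hot speaker id vector, mirroring the hardcoded demo logic.
    spkid = np.zeros((dim,), dtype=np.float32)
    spkid[spk2idx[speaker]] = 1.0
    return spkid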

The author uses a one-hot embedding as the speaker id. If you want to switch to zero-shot learning, you may need to replace the one-hot vector with a speaker embedding vector. I still have some problems, though, lol. I'm still working on how to make my own demo.pkl, which I believe is used for validation.
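
For the zero-shot variant, the change would roughly look like the sketch below, assuming you bring your own pretrained speaker encoder (the encoder interface here is a placeholder, not part of SpeechSplit):

import numpy as np

def get_speaker_embedding(wav, speaker_encoder):
    # speaker_encoder is any pretrained d-vector/x-vector model you supply;
    # its call signature here is an assumption.
    emb = speaker_encoder(wav)  # e.g. a fixed-size vector like (256,)
    return np.asarray(emb, dtype=np.float32)

# In make_metadata.py, store this embedding where the one-hot vector went:
# spkid = get_speaker_embedding(utterance_wav, speaker_encoder)
# utterances.append(spkid)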

leijue222 commented 3 years ago

@CYT823 Yes, I wanted to do zero-shot conversion from unseen speakers (little data, no guarantee of quality) to seen speakers (big dataset, high quality) before. I don't know how effective this project is; the demo provided by the author isn't enough to evaluate that.

CYT823 commented 3 years ago

Ha, I don't know either, but the results on the demo web page sound great. I'm still trying to make it zero-shot. I hope the results can be as great as the demo page's when I use a speaker embedding instead of one-hot. (crossing fingers)

leijue222 commented 3 years ago

Please leave a message here if you get great results. I've been assigned to other tasks for now. Good luck!

skol101 commented 2 years ago

@CYT823 did you manage to create the pkl file for inference?

CYT823 commented 2 years ago

Hi @skol101, I am sorry, but I'm no longer working on this project.

Besides, I don't think you really need a pkl file in inference mode. The pkl file, which is for validation, is used in training mode. During inference, you just need to feed two voices in as input and get the result.
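
For anyone who does want to build a demo-style file anyway: from my reading of the demo notebook, each metadata entry bundles a speaker name, the speaker id vector, and a tuple of precomputed features. Treat the layout below as an assumption (including the 82-dim one-hot, the chosen indices, and the tuple ordering) and verify it against the shipped assets/demo.pkl:

import pickle
import numpy as np

# Random arrays stand in for real precomputed mel-spectrograms (T, 80)
# and F0 contours (T,) of a source and a target utterance.
mel_src = np.random.rand(128, 80).astype(np.float32)
f0_src = np.random.rand(128).astype(np.float32)
mel_tgt = np.random.rand(160, 80).astype(np.float32)
f0_tgt = np.random.rand(160).astype(np.float32)

spkid_src = np.zeros((82,), dtype=np.float32); spkid_src[1] = 1.0
spkid_tgt = np.zeros((82,), dtype=np.float32); spkid_tgt[7] = 1.0

# Guessed entry layout: [name, id vector, (mel, f0, length, utterance id)]
metadata = [
    ['p226', spkid_src, (mel_src, f0_src, len(mel_src), 'p226_001')],
    ['p231', spkid_tgt, (mel_tgt, f0_tgt, len(mel_tgt), 'p231_001')],
]
with open('my_demo.pkl', 'wb') as f:
    pickle.dump(metadata, f)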