auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
https://arxiv.org/abs/1905.05879
MIT License
990 stars 205 forks source link

How to train in ONE-HOT pattern? #68

Open Lukelluke opened 3 years ago

Lukelluke commented 3 years ago

Hi, @auspicious3000 ,

I have searched all the files and issues, but couldn't find any description of how to train in the one-hot pattern, which you suggest we use when we don't need the one-shot (zero-shot) capability.

Has anyone here successfully trained AutoVC in one-hot mode, i.e., without the embeddings from the pretrained speaker encoder?

Hope to get a useful reply from you all!

All the best, Luke Huang

ruclion commented 3 years ago

Hi, I haven't trained the one-hot version, but I have some ideas to share~ The only difference between the one-hot and speaker-encoder versions is whether the speaker's embedding can be trained by the AutoVC training process itself. Training in the one-hot pattern might look like this:

  1. Get the total number of training speakers, maybe 40.
  2. Set up a lookup embedding table, like in multi-speaker Tacotron 2.
  3. Each time you fetch sentences to train on, the input is the mels for the content encoder; instead of a precomputed speaker embedding, feed the speaker id, look it up in the embedding table to get a trainable embedding vector, and concatenate this vector with the content vector.
  4. During backpropagation, each speaker's embedding vector changes a little.
  5. Throughout training, the same speaker keeps the same embedding vector, like a word embedding.
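The steps above can be sketched in PyTorch. This is not the repo's actual code; the sizes (`num_speakers`, `dim_emb`, `dim_content`) are illustrative assumptions, and the point is only that `nn.Embedding` gives one trainable vector per speaker id:

```python
import torch
import torch.nn as nn

num_speakers = 40   # step 1: total number of training speakers (assumed)
dim_emb = 256       # speaker-embedding size (assumed)
dim_content = 64    # content-code size (assumed)
batch, t = 2, 128   # batch size and number of mel frames (assumed)

# step 2: one trainable vector per speaker, like multi-speaker Tacotron 2
spk_table = nn.Embedding(num_speakers, dim_emb)

# step 3: look up each speaker's vector by id and concatenate it with the
# content codes along the feature dimension
content = torch.randn(batch, t, dim_content)      # stand-in for content-encoder output
spk_id = torch.tensor([3, 17])                    # integer speaker ids

spk_emb = spk_table(spk_id)                       # (batch, dim_emb)
spk_emb = spk_emb.unsqueeze(1).expand(-1, t, -1)  # broadcast over time frames
decoder_in = torch.cat([content, spk_emb], dim=-1)

# steps 4-5: spk_table.weight is an ordinary trainable parameter, so the
# reconstruction loss updates it a little each step, and the same id always
# indexes the same row, exactly like a word embedding.
```

The table's rows are then optimized jointly with the rest of AutoVC by the usual autoencoder loss; no separate speaker encoder is needed.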

In fact, the "one-hot pattern" the author has in mind may just be the simplest way to train the model in the multi-speaker setting. It can even be better than the speaker-encoder version, because its embeddings are updated by gradients, while the pretrained speaker encoder's embeddings are fixed.