torphix closed this issue 2 years ago
Actually, it is the best linear combination of the K nearest samples, found by solving a least-squares optimization. What you describe is just the special case of K=1.
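For readers following along, here is a minimal NumPy sketch of that idea, not the repository's actual code. It assumes Euclidean distance for the neighbor search and a sum-to-one constraint on the weights (as in locally linear embedding), which this thread does not confirm. The function name `project_to_database` and the parameters `k` and `reg` are hypothetical.

```python
import numpy as np


def project_to_database(query, database, k=10, reg=1e-3):
    """Reconstruct `query` (D,) from its k nearest rows of `database` (N, D).

    Assumption: weights are constrained to sum to 1 (LLE-style), so that
    k=1 reduces to returning the single nearest sample.
    """
    # 1. Find the k nearest database samples by Euclidean distance.
    dists = np.linalg.norm(database - query, axis=1)
    idx = np.argsort(dists)[:k]
    neighbors = database[idx]                      # (k, D)

    # 2. Least-squares weights: minimize ||query - w @ neighbors||^2
    #    subject to sum(w) == 1, via the local Gram matrix.
    diff = neighbors - query                       # (k, D)
    gram = diff @ diff.T                           # (k, k)
    # Small ridge term keeps the system well-conditioned.
    gram += (reg * np.trace(gram) + 1e-12) * np.eye(len(idx))
    weights = np.linalg.solve(gram, np.ones(len(idx)))
    weights /= weights.sum()

    # 3. The projected feature is the weighted combination of the neighbors.
    return weights @ neighbors, idx, weights
```

With `k=1` the single weight is normalized to 1, so the output is exactly the nearest database sample, which matches the "closest point" picture from the question. Larger `k` lets the output fall between stored samples.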
Thank you. So the APC_feat_database is several thousand examples of the target speaker talking, embedded into feature space using the APC network?
Yes, you're right.
Hi, thank you for the amazing library and open-source code.
It is helping me learn a lot. One question I had concerns the target speech representation database. Is it simply the embeddings of several utterances from the target speaker, with the input speech then essentially mapped to the closest point within those embeddings?
E.g.: extract embeddings from 50 Obama utterances -> input an arbitrary speech sample -> map the embedding of the arbitrary speech sample to the closest Obama representation.
Thank you