Using encoder as speaker embedding extractor

auspicious3000 / SpeechSplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck

http://arxiv.org/abs/2004.11284

MIT License

636 stars 92 forks

Using encoder as speaker embedding extractor #79

Closed ehsienmu closed 10 months ago

ehsienmu commented 10 months ago

Is it possible to obtain a speaker embedding that contains no timbre by using only the encoder? For example, first encode the utterances with the encoder, then use a classification model to turn the encoder output into a speaker representation.

auspicious3000 commented 10 months ago

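Sorry, I have read this more than ten times and still do not understand the question. If it contains no timbre, how can it still be a speaker embedding? And how would a classification model turn it into a speaker embedding?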

ehsienmu commented 10 months ago

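Hello, thank you for your reply. My earlier description may not have been clear enough. I want to explore whether pitch and rhythm alone can be used to determine which speaker an audio clip belongs to, which is how I came across your paper. So my question is whether I can first use the encoders you designed to obtain pitch and rhythm embeddings, and then use those embeddings for a speaker embedding or speaker verification task. I am not sure whether this is feasible.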

auspicious3000 commented 10 months ago

It is not impossible, but my pitch and rhythm embeddings are not guaranteed to be completely free of timbre information.
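
The pipeline discussed in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions, not SpeechSplit's actual API: the per-frame pitch and rhythm codes are assumed to have been extracted already and are represented here by toy lists, and the helper names (`mean_pool`, `utterance_embedding`, `same_speaker`) and the `0.8` threshold are hypothetical. The standard library is used only to keep the sketch dependency-free.

```python
import math


def mean_pool(frames):
    """Average a sequence of per-frame code vectors (T x D) into one D-dim vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def utterance_embedding(pitch_codes, rhythm_codes):
    """Concatenate time-pooled pitch and rhythm codes into one utterance-level vector.

    Pooling each stream separately lets the two code sequences have
    different lengths.
    """
    return mean_pool(pitch_codes) + mean_pool(rhythm_codes)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.8):
    """Toy verification decision; the threshold is an arbitrary assumption."""
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy stand-ins for codes that would come from the pitch and rhythm encoders.
pitch_a = [[1.0, 0.0], [1.0, 0.2]]
rhythm_a = [[0.5, 0.5], [0.7, 0.3]]
pitch_b = [[0.9, 0.1], [1.1, 0.1]]
rhythm_b = [[0.6, 0.4]]

emb_a = utterance_embedding(pitch_a, rhythm_a)
emb_b = utterance_embedding(pitch_b, rhythm_b)
print(same_speaker(emb_a, emb_b))
```

In practice the pooled vectors would more likely feed a trained speaker classifier or verification model than a fixed cosine threshold. As noted above, any accuracy obtained this way cannot be attributed to pitch and rhythm alone, because the codes may still carry some timbre information.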

ehsienmu commented 10 months ago

Thank you.