Open shanhaidexiamo opened 8 months ago
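+1, same question! I'm also running training. Could we discuss how enroll_x_lens is computed when the language embedding is added in forward? @shanhaidexiamo, I just used x.shape[-1], but I'm not sure whether that is correct 😓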
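For readers following the enroll_x_lens question, here is a minimal PyTorch sketch of one way to add a language embedding to the text (phoneme) embeddings. It assumes enroll_x_lens is the number of text-prompt tokens at the start of the sequence, so it splits the sequence into a prompt span and a target span. The sizes, the LANG_ID table and the function name are hypothetical, and this is not taken from either repository.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes and language table, chosen only for illustration.
VOCAB_SIZE, NUM_LANGS, D_MODEL = 100, 3, 8
LANG_ID = {"en": 0, "zh": 1, "ja": 2}

text_embedding = nn.Embedding(VOCAB_SIZE, D_MODEL)
language_embedding = nn.Embedding(NUM_LANGS, D_MODEL)


def embed_text_with_language(text_tokens, enroll_x_lens, prompt_lang, text_lang):
    """Embed phoneme IDs and add a per-span language embedding.

    text_tokens: LongTensor of shape (batch, seq), prompt tokens first.
    enroll_x_lens: assumed to be the number of prompt tokens.
    """
    x = text_embedding(text_tokens)  # (batch, seq, d_model)
    prompt_vec = language_embedding(torch.tensor([LANG_ID[prompt_lang]]))  # (1, d_model)
    text_vec = language_embedding(torch.tensor([LANG_ID[text_lang]]))  # (1, d_model)
    # Build the per-position language offset out of place, so the addition
    # stays friendly to autograd during training.
    lang = torch.cat(
        [
            prompt_vec.expand(enroll_x_lens, -1),
            text_vec.expand(x.shape[1] - enroll_x_lens, -1),
        ],
        dim=0,
    )  # (seq, d_model)
    return x + lang.unsqueeze(0)  # broadcast over the batch dimension


tokens = torch.randint(0, VOCAB_SIZE, (2, 7))
out = embed_text_with_language(tokens, enroll_x_lens=3, prompt_lang="en", text_lang="zh")
print(out.shape)
```

Under this reading, passing the full text length (x.shape[-1] for a (batch, seq) token tensor) gives every position the prompt language's embedding. That matches the split version only when the prompt and the target text share one language, which would be the case if each training utterance is monolingual.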
Hi,
Have you solved this problem? I'm also confused about the way to add the language embedding.
Kind of solved. I added the language embedding in training, just like in inference. If you are not getting the correct training stats, it is probably because of the TextTokenizer: Plachtaa uses PhonemeBpeTokenizer, while lifeiteng uses a different one. In the dataset preparation stage, remember to add the language ID at both ends of the text prompt.
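The "language ID at both ends of the text prompt" step can be sketched roughly as below. The marker strings, the LANG_TOKEN table and the helper name are assumptions for illustration. The real markers have to match whatever the tokenizer's vocabulary expects.

```python
# Hypothetical mapping from language code to marker token; the actual
# markers depend on the tokenizer's vocabulary.
LANG_TOKEN = {"en": "[EN]", "zh": "[ZH]", "ja": "[JA]"}


def wrap_with_language_id(text: str, lang: str) -> str:
    """Return the text with its language marker at both ends."""
    try:
        token = LANG_TOKEN[lang]
    except KeyError:
        raise ValueError(f"unsupported language: {lang!r}") from None
    return f"{token}{text.strip()}{token}"


print(wrap_with_language_id("hello world", "en"))  # [EN]hello world[EN]
```

Applying the same wrapper in dataset preparation and at inference keeps the tokenizer input consistent between the two.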
Thank you so much for sharing this!
Why are language embeddings being added to the phoneme embeddings instead of to the acoustic embeddings? As the paper says "Concretely, we embed language IDs into dense vectors and add them to the embeddings of acoustic tokens." in the last line of section 3.3. @Plachtaa
Adding the language embedding to the acoustic tokens doesn't make sense at all. I tend to believe this is a typo in the paper.
You mean the authors wrote it wrong? Can you explain to me why it doesn't make sense at all? I'm new to "codes/codecs" and embeddings, just learning from reading papers and watching videos. That's why I wanted to see how you implemented it, but when I didn't find what the paper says I got confused.
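Hi, I have a question I'd like to ask. For the forward stage, the paper describes adding the language embedding to the acoustic tokens, but at inference the language embedding can only be added to the text tokens. How did you handle this? I see you haven't released forward. My approach was to add it to the text in forward as well, but the inference results are very poor.
I hope you can clear this up. Thank you very much.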