Closed: AndrewLang closed this issue 1 year ago.
@jiajzhan could you help to check?
@AndrewLang Can you please provide an example of your lexicon and the SSML that references it? Thanks.
@BrianMouncer here is my SSML content; the lexicon reference is in it.
<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="zh-CN" >
<voice name="zh-CN-YunzeNeural">
<lexicon uri="https://matrixreader.blob.core.windows.net/public/lexicon.xml" />
任我行冷笑道, 剑指小腹,这个小姑娘。姊姊还好吗?
任我行大声道:你们这些人,都是我的手下败将,还不束手就擒!
重重的击了一拳。
藏灵上人也真了得,受了内伤。
老人家有点不舒服,是什么病?请的是哪位大夫?
小姑娘,你是哪里人?你叫什么名字?你家里人知道你在这儿吗?
</voice>
</speak>
I am investigating. Thanks
@jiajzhan any updates?
Update the alphabet to 'sapi', since you are using pinyin in zh-CN. Besides, we have a custom lexicon validation tool at https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/CustomLexiconValidation
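For reference, a minimal zh-CN lexicon along those lines might look like the sketch below. This is an assumption based on the W3C Pronunciation Lexicon (PLS) format and the pinyin pairs quoted later in this thread; the alphabet value 'sapi' is the one suggested in the comment above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="sapi" xml:lang="zh-CN">
  <lexeme>
    <grapheme>大夫</grapheme>
    <phoneme>dai 4 fu 1</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>姊姊</grapheme>
    <phoneme>jie 3 jie 3</phoneme>
  </lexeme>
</lexicon>
```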
@jiajzhan even after I set it to sapi, it doesn't work as expected. The pronunciation is the same as the audio generated without the lexicon.
Try again; it should work now. When the lexicon content changes, it takes at least 15 minutes for the service to pick up the latest content.
@jiajzhan I waited overnight and tried it again; it doesn't look like the lexicon was picked up.
@jiajzhan any suggestions to make it work?
"任我行" has a different pronunciation now when using your latest lexicon; I tried it with YunzeNeural. So which words are still not working as you expect?
@jiajzhan, with my lexicon, "大夫" should be "dai 4 fu 1", "姊姊" should be "jie 3 jie 3", "藏灵" should be "zhang 4 ling 2", "了得" should be "liao 3 de 2"...
The problem is that the lexicon is NOT being applied.
I did a test in Speech Studio with the same content and the same lexicon, and the pronunciation is correct.
"大夫" is not working with the custom lexicon; we are investigating. The others work well on my machine using the Yunze voice.
"姊姊" should be "jie 3 jie 3", "藏灵" should be "zhang 4 ling 2", "了得" should be "liao 3 de 2". These work OK on my machine, so are you still using the SSML you shared above?
Please set the pronunciation for '哪位大夫' to 'na 3 wei 4 dai 4 fu 1'; then it will work.
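In lexicon terms, that suggestion amounts to adding a lexeme for the whole phrase rather than only the single word. A sketch of such an entry, assuming the same PLS format as the lexicon linked in this thread:

```xml
<lexeme>
  <grapheme>哪位大夫</grapheme>
  <phoneme>na 3 wei 4 dai 4 fu 1</phoneme>
</lexeme>
```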
@jiajzhan interesting, did you use the lexicon file URL https://matrixreader.blob.core.windows.net/public/lexicon.xml ?
Are there other settings I need to set? No matter what I try, it doesn't work on my side.
Can you share your code?
BTW, my service region is eastus; could that be a problem?
Region should not be the problem, I will share my code soon.
public static async Task CustomLexiconRequest()
{
    // subscriptionKey and subscriptionRegion are assumed to be defined elsewhere.
    var speechConfig = SpeechConfig.FromSubscription(subscriptionKey, subscriptionRegion);
    speechConfig.SpeechSynthesisVoiceName = "Microsoft Server Speech Text to Speech Voice (zh-CN, YunyeNeural)";
    string fileName = "SpeechSynthesisOutputCustomLexicon.wav";
    var fileOutput = AudioConfig.FromWavFileOutput(fileName);
    Console.OutputEncoding = Encoding.UTF8;
    using (var speechSynthesizer = new SpeechSynthesizer(speechConfig, fileOutput))
    {
        string text = "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='zh-CN'><voice xml:lang='zh-CN' xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice (zh-CN, YunyeNeural)'><lexicon uri='https://matrixreader.blob.core.windows.net/public/lexicon.xml'/>" +
            "任我行冷笑道, 剑指小腹,这个小姑娘。 姊姊还好吗?</voice></speak>";
        var speechSynthesisResult = await speechSynthesizer.SpeakSsmlAsync(text);
        OutputSpeechSynthesisResult(speechSynthesisResult, text);
    }
    Console.WriteLine("Press any key to exit...");
    Console.ReadKey();
}
Can you try this code?
I see your code uses a different voice name, YunYe, instead of YunzeNeural. So it seems like YunzeNeural doesn't support custom lexicons?
Oh, that's a mistake. Have you tried YunyeNeural for your case? Does it work?
Checked; it doesn't work as expected. So far, I only see the custom lexicon working in Speech Studio. There is no useful information for debugging it.
Could you share your code/project as a zip with me?
My project is relatively complex; there are many services, so it wouldn't help with diagnosis. My test code is pretty straightforward; here is the main code.
class Program
{
    public static async Task Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        var subscriptionKey = "";
        var serviceRegion = "eastus";
        // Other voices tried: zh-CN-henan-YundengNeural, zh-CN-liaoning-XiaobeiNeural,
        // zh-CN-YunyeNeural, zh-CN-YunyangNeural, zh-CN-YunxiNeural, zh-CN-YunjianNeural,
        // zh-CN-YunhaoNeural, zh-CN-YunfengNeural, zh-HK-DannyNeural
        var voiceName = "zh-CN-YunzeNeural";
        var language = "zh-CN"; // also tried: zh-CN-henan, zh-CN-liaoning

        var config = SpeechConfig.FromSubscription(subscriptionKey, serviceRegion);
        config.SpeechSynthesisVoiceName = voiceName;
        config.SpeechRecognitionLanguage = language;
        config.SpeechSynthesisLanguage = language;
        config.EnableAudioLogging();
        //config.EnableDictation();
        //config.SetProfanity(ProfanityOption.Removed);

        using var synthesizer = new SpeechSynthesizer(config, null);
        var ssml = LoadTestSsml();
        Console.WriteLine(ssml);
        using var result = await synthesizer.SpeakSsmlAsync(ssml);
        if (result.Reason == ResultReason.SynthesizingAudioCompleted)
        {
            using var audioStream = AudioDataStream.FromResult(result);
            await audioStream.SaveToWaveFileAsync("output_with_lexicon.wav");
            Console.WriteLine("Audio was written to file");
        }
        else
        {
            var detailed = SpeechSynthesisCancellationDetails.FromResult(result);
            Console.WriteLine($"{result.Reason} by the service, {detailed.ErrorDetails}");
        }
    }

    private static string LoadTestSsml()
    {
        var file = "ssml.xml";
        if (File.Exists(file))
        {
            return File.ReadAllText(file);
        }
        return string.Empty;
    }
}
@jiajzhan I created another test example with the Node.js SDK. With this, the lexicon was applied, but there is another problem: it only generates about 11 seconds of audio, and I get error code 1007.
Since the JS SDK works, it should be a problem with the C# SDK. Is the SDK open source? If so, I can help check.
@AndrewLang I could not reproduce your issue using your code; on my machine, the custom lexicon works well. My SDK version is 1.26.0.
@jiajzhan thanks for the info. I did more testing and found that if any word seems unsupported or not well recognized, the whole lexicon file is not applied; at least it looks that way. Also, the error message is pretty confusing and doesn't help with diagnosis. Any further insights would be appreciated.
Hi @AndrewLang, I was on vacation the past few days. Regarding "if any word seems unsupported or not well recognized, the whole lexicon file is not applied": that's correct. If one word is given a wrong pronunciation, the whole lexicon won't work.
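Since one malformed entry silently disables the whole file, a quick local sanity check can help narrow down which lexeme is at fault before uploading. Below is a rough, hypothetical Python sketch (not the official validation tool linked earlier in this thread). It assumes the space-separated "syllable tone" pinyin layout used in these comments (e.g. "dai 4 fu 1") and only catches formatting slips, not syllables the service itself rejects.

```python
import re
import xml.etree.ElementTree as ET

# PLS default namespace used by custom lexicon files.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}
SYLLABLE = re.compile(r"^[a-zü]+$")  # lowercase pinyin letters
TONE = re.compile(r"^[1-5]$")        # tones 1-4 plus 5 for the neutral tone

def check_lexicon(xml_text):
    """Return (grapheme, phoneme, reason) tuples for suspicious lexemes."""
    problems = []
    root = ET.fromstring(xml_text)
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        grapheme = lexeme.findtext("pls:grapheme", default="", namespaces=PLS_NS)
        phoneme = lexeme.findtext("pls:phoneme", default="", namespaces=PLS_NS)
        tokens = phoneme.split()
        # Expect alternating syllable/tone pairs, e.g. "na 3 wei 4 dai 4 fu 1".
        if not tokens or len(tokens) % 2 != 0:
            problems.append((grapheme, phoneme, "expected syllable/tone pairs"))
            continue
        for syllable, tone in zip(tokens[::2], tokens[1::2]):
            if not SYLLABLE.match(syllable) or not TONE.match(tone):
                problems.append((grapheme, phoneme, f"bad pair: {syllable} {tone}"))
                break
    return problems
```

Running this over the lexicon before uploading at least rules out the "one bad entry breaks everything" failure mode described above.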
So are you testing with this lexicon: https://matrixreader.blob.core.windows.net/public/lexicon.xml ? I tested this lexicon before and it works.
yes, that's the lexicon I used.
I tried removing some of the words and testing them one by one. From a human perspective, those words are correct; how does the service determine whether an entry is valid? For example, 大夫 in some contexts should be read as "dai 4 fu 1", but it's never picked up.
The lexicon is working well on my machine. Is this still an issue for you? If so, could we set up a quick meeting to discuss it?
@jiajzhan thanks for your help. I think it works now. The lexicon is tricky, and there is not much documentation for it, especially for Chinese.
From the latest comment I understand this issue can be closed now.
Describe the bug: In Chinese, a word can have different pronunciations depending on context, so I created a custom lexicon to correct the pronunciation. The lexicon file is stored in Azure Storage and is publicly accessible, and its URL is embedded correctly in the SSML content. When generating audio with the SpeakSsmlAsync method, there is no change to the audio.
To Reproduce Steps to reproduce the behavior:
Create a SpeechSynthesizer instance with the following configuration: VoiceName = "zh-CN-YunzeNeural", Language = "zh-CN"
Call SpeakSsmlAsync
using var result = await synthesizer.SpeakSsmlAsync(ssml);
with the following SSML content. Save the audio content to a file.
Listen to the audio: the word "任我行" has a pronunciation different from the one specified in the lexicon.
I also tested my lexicon file in Speech Studio, and the pronunciation is correct there. So it could be something wrong in the SDK.
Expected behavior: The lexicon is applied by the SDK and the pronunciation is correct.
Version of the Cognitive Services Speech SDK Version 1.27.0