Azure-Samples / Cognitive-Speech-STT-ServiceLibrary

Service SDK - C# Samples, documentation for service to service speech to text

SpeechClient RecognizeAsync with AudioStream throws an exception #9

Closed LGinC closed 6 years ago

LGinC commented 6 years ago

Audio format could not be parsed

I use the Kinect v2 audio stream.

wolfma61 commented 6 years ago

Can you give us some more detail, perhaps the code showing how you acquire the Kinect audio stream and how you call the service?

LGinC commented 6 years ago

This is my class. I pass it a stream from the Kinect v2 audio stream.

` class BingSpeechClient : IDisposable
{
    private SpeechClient client;
    private SpeechInput input;
    private readonly CancellationTokenSource cts = new CancellationTokenSource();

    private const string subscriptionKeyFileName = "Subscription.txt";

    private string subscriptionKey;
    private string LuisAppId = "";
    private string LuisSubscriptionID = "";
    private string AuthenticationUri = "";
    private string DefaultLocale = "en-US";

    public event Action<string> ResultRecved;
    public BingSpeechClient(Stream stream)
    {
        this.GetSubscriptionKey();
        DeviceMetadata deviceMetadata = new DeviceMetadata(DeviceType.Far, DeviceFamily.Xbox, NetworkType.Ethernet, OsName.Windows, "1703", "Gigabyte", "B250");
        ApplicationMetadata applicationMetadata = new ApplicationMetadata("SpeechTest", "1.0.0");
        RequestMetadata requestMetadata = new RequestMetadata(Guid.NewGuid(), deviceMetadata, applicationMetadata, "SampleAppService");
        input = new SpeechInput(stream, requestMetadata);
        Preferences preferences = new Preferences(DefaultLocale, new Uri(@"wss://speech.platform.bing.com/api/service/recognition/continuous"), new CognitiveServicesAuthorizationProvider(subscriptionKey));
        client = new SpeechClient(preferences);
        client.SubscribeToRecognitionResult(OnRecognizeResult);
    }

    private Task OnRecognizeResult(RecognitionResult arg)
    {
        if (arg.Phrases != null)
        {
            foreach (var item in arg.Phrases)
            {
                Console.WriteLine("{0} (Confidence:{1})", item.DisplayText, item.Confidence);
            }
        }
        return Task.FromResult(true);
    }

    public async Task StartRecognizeAsync()
    {
        try
        {
            await client.RecognizeAsync(input, cts.Token).ConfigureAwait(false);
        }
        catch (Exception e)
        {

            throw;
        }

    }

    private void GetSubscriptionKey()
    {
        //SubscriptionKey
        //LuisAppId
        //LuisSubscriptionID
        //AuthenticationUri
        string[] Keys;
        try
        {
            Keys = File.ReadAllLines(subscriptionKeyFileName);
            if (Keys.Length < 4)
            {
                Console.WriteLine("Please input the subscriptionKey to file");
                throw new Exception("Please input the subscriptionKey to file");
            }

            foreach (var item in Keys)
            {
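                // Split on the first ':' only, so values that contain ':' (such as a URI) stay intact.
                string[] t = item.Split(new[] { ':' }, 2);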
                switch (t[0])
                {
                    case "SubscriptionKey":
                        this.subscriptionKey = t[1];
                        break;
                    case "LuisAppId":
                        this.LuisAppId = t[1];
                        break;
                    case "LuisSubscriptionID":
                        this.LuisSubscriptionID = t[1];
                        break;
                    case "AuthenticationUri":
                        this.AuthenticationUri = t[1];
                        break;
                    default:
                        break;
                }
            }
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
            throw;
        }

    }

    public void Dispose()
    {
        client.Dispose();
        cts.Dispose();
    }
}`

This is how I grab the Kinect v2 audio stream:

    KinectSensor sensor = KinectSensor.GetDefault();
    AudioSource audioSource = sensor.AudioSource;
    sensor.Open();
    IReadOnlyList<AudioBeam> audioBeamList = audioSource.AudioBeams;
    System.IO.Stream stream = audioBeamList[0].OpenInputStream();
    KinectAudioStream audioStream = new KinectAudioStream(stream);
    BingSpeechClient client = new BingSpeechClient(audioStream);

When I call client.RecognizeAsync(input, cts.Token).ConfigureAwait(false); it throws an exception with the message "Audio format could not be parsed".

The Kinect v2 audio stream is 16 kHz, 32-bit, and I use a KinectAudioStream class to convert it to 16-bit.

LGinC commented 6 years ago

I want to implement automatic speech recognition (ASR), like the Speech Basics sample in the Kinect SDK v2. I want to just speak, rather than having to click a start-recording button before speaking. By the way, the Speech Basics sample uses the Microsoft Speech Recognizer, but I want to use Bing Speech or another speech recognizer.

wolfma61 commented 6 years ago

See issue https://github.com/Azure-Samples/Cognitive-Speech-STT-Windows/issues/37

I'm not sure, but could it be related?

priyaravi20 commented 6 years ago

@LGinC - Has this been resolved for you?

zhouwangzw commented 6 years ago

I am closing this issue due to lack of response. Please open a new one if you still have issues.