ashishbajaj99 / mic

A simple stream wrapper for arecord (Linux (including Raspbian)) and sox (Mac, Windows). Returns a Passthrough stream object so that stream control like pause(), resume(), pipe(), etc. are all available.
MIT License

Mic and Azure Speech to Text #35

Open lucasctd opened 5 years ago

lucasctd commented 5 years ago

I am trying to recognize the user's voice continuously, but I always get wrong results. Has anybody done something like this?

I will add some parts of my code so you can understand.

Here is how I create an instance of pushStream (MS Speech SDK)

    this.pushStream = AudioInputStream.createPushStream(AudioStreamFormat.getWaveFormatPCM(16, 16000, 1));

Here is the method I call to recognize the user voice

    recognizeAsync() {
        this.audioConfig = AudioConfig.fromStreamInput(this.pushStream);
        this.recognizer = new SpeechRecognizer(this.speechConfig, this.audioConfig);
        this.subject = new Observable(subs => {
            this.subscription = subs;
            this.recognizer.startContinuousRecognitionAsync();
            this.recognizer.recognizing = (rec, {result}) => {
                subs.next(result);
            };
            this.recognizer.recognized = (rec, {result}) => {
                subs.next(result);
            };
        });
        return this.subject;
    }

And here is where I use the mic package to get the user voice data

    speech = new Speech(language, subscriptionKey, region);
    speech.recognizeAsync().subscribe(result => {
        console.log('result', result);
    });
    var micInstance = mic({
        rate: '16000',
        channels: '1',
        debug: false,
        exitOnSilence: 6,
        fileType: 'wav' // have also tried with raw type
    });
    const micInputStream = micInstance.getAudioStream();

    micInputStream.on('data', function(data) {
        speech.pushStream.write(data);
        //console.log("Received Input Stream: ", data);
    });

rhurey commented 5 years ago

The root cause here looks to be something with the stdio redirection resulting in twice the expected data being available.

I tried to manually call sox to see how it was producing audio. Experiment results:

    sox.exe -c 1 -b 16 -e signed-integer -r 16000 -t waveaudio default -p > redirect.wav

Ran for 10s. redirect.wav is 655,408 bytes.

Had sox write the file directly:

    sox.exe -c 1 -b 16 -e signed-integer -r 16000 -t waveaudio default redirect2.wav

Ran for 10s. This output a 327,724-byte redirect2.wav.

That tells me the doubling of the data is happening as a result of the stdio redirect. It's not clear why that's happening, but the possibility that the doubling is platform specific causes fragility concerns. Plus who knows what extra data is winding up in the audio.
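
As a rough sanity check on those numbers, the size of uncompressed PCM is sample rate × bytes per sample × channels × seconds. The sketch below is only an estimate: the helper name `pcmBytes` is made up for illustration, and it ignores file headers and the fact that the runs were probably not exactly 10 seconds long.

```javascript
// Expected size of uncompressed PCM audio, ignoring any container header.
function pcmBytes(sampleRate, bitsPerSample, channels, seconds) {
    return sampleRate * (bitsPerSample / 8) * channels * seconds;
}

const expected16 = pcmBytes(16000, 16, 1, 10); // 16-bit mono estimate
const expected32 = pcmBytes(16000, 32, 1, 10); // 32-bit mono estimate

// File sizes reported in the experiment above.
const redirected = 655408;
const direct = 327724;
const ratio = (redirected / direct).toFixed(2);

console.log(expected16, expected32); // 320000 640000
console.log(ratio);                  // 2.00
```

The directly written file sits close to the 16-bit estimate and the redirected one close to the 32-bit estimate, so the redirected output is roughly twice as large per second of audio.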

UCABJDP commented 4 years ago

#40 may be the root cause here: piping audio out of sox forces the format to 32-bit audio, which may give the appearance of it generating double the data when set to 16-bit.
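
If the piped data really is 32-bit signed PCM, one possible workaround is to down-convert each chunk before writing it to the push stream. The following is a minimal sketch under that assumption, not a tested fix: `pcm32To16` is a hypothetical helper, it assumes little-endian samples and a buffer containing only whole samples, and it does not handle any header that sox's piped format may prepend or `data` chunks that split a sample across boundaries.

```javascript
// Convert signed 32-bit little-endian PCM samples to signed 16-bit
// by keeping the top 16 bits of each sample.
// Assumption: `input` holds only whole samples (no header bytes).
function pcm32To16(input) {
    const samples = Math.floor(input.length / 4);
    const output = Buffer.alloc(samples * 2);
    for (let i = 0; i < samples; i++) {
        const s32 = input.readInt32LE(i * 4);
        // Arithmetic right shift preserves the sign of the sample.
        output.writeInt16LE(s32 >> 16, i * 2);
    }
    return output;
}

// Two sample values: near positive full scale and negative full scale.
const demo = Buffer.alloc(8);
demo.writeInt32LE(0x7fff0000, 0);
demo.writeInt32LE(-0x80000000, 4);

const out = pcm32To16(demo);
console.log(out.length, out.readInt16LE(0), out.readInt16LE(2)); // 4 32767 -32768
```

In the code from the original post, this would be applied inside the `data` handler before `speech.pushStream.write(...)`. It may be simpler to have sox emit raw 16-bit output directly rather than converting afterwards, if the way sox is invoked can be changed.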