alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

ASR for multiple languages in angular demo #108

Open kavehcs opened 3 years ago

kavehcs commented 3 years ago

Hello,

Thank you for sharing your knowledge.

I want to use multiple languages in the Angular demo.

To do this, I changed the code in app.component.ts as follows:

import { Component, ElementRef, ViewChild } from '@angular/core';
import { DictateService } from "./dictate-service";

@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.css'],
  providers: [DictateService]
})

export class AppComponent {

  @ViewChild('results') private results: ElementRef;

  buttonText = 'شروع ضبط'; // Persian for "Start recording"
  buttonText_russian = 'شروع ضبط'; // Persian for "Start recording"
  textDataBase = '';
  textData = '';

  persian_server = "ws://localhost:2700";

  russian_server = "ws://localhost:2701";

  constructor(private dictateService: DictateService) {

  }

  switchSpeechRecognition() {
    if ( !this.dictateService.isInitialized() || this.persian_server === null ) {
      this.persian_server = "ws://localhost:2700";
      this.russian_server = null ;

      this.dictateService.cancel();

      this.dictateService.init({
        server: this.persian_server,
        onResults: (hyp) => {
          console.log(hyp);

          this.textDataBase = this.textDataBase + hyp + '\n';
          this.textData = this.textDataBase;
          this.results.nativeElement.scrollTop = this.results.nativeElement.scrollHeight;
        },
        onPartialResults: (hyp) => {
          console.log(hyp);

          this.textData = this.textDataBase + hyp;
        },
        onError: (code, data) => {
          console.log(code, data);
        },
        onEvent: (code, data) => {
          console.log(code, data);
        }
      });

      this.buttonText_russian = 'شروع ضبط';
      document.getElementById("listening_gif").style.display = "none";

      this.buttonText = 'پایان ضبط';
      document.getElementById("listening_gif").style.display = "inline";
    } else if (this.dictateService.isRunning()) {
      this.dictateService.resume();
      this.buttonText = 'پایان ضبط';
      document.getElementById("listening_gif").style.display = "inline";
    } else {
      this.dictateService.pause();
      this.buttonText = 'شروع ضبط';
      document.getElementById("listening_gif").style.display = "none";
    }
  }

  cleararea() {
    // Cast so TypeScript knows the element has a `value` property.
    const cl = document.getElementById("textarea") as HTMLTextAreaElement;
    console.log(cl);
    cl.value = "";
    this.textDataBase = '';
  }

  switchSpeechRecognition_russian() {
    if ( !this.dictateService.isInitialized() || this.russian_server === null ) {
      this.persian_server = null;
      this.russian_server = "ws://localhost:2701" ;

      this.dictateService.cancel();

      this.buttonText = 'شروع ضبط';
      document.getElementById("listening_gif").style.display = "none";

      this.dictateService.init({
        server: this.russian_server,
        onResults: (hyp) => {
          console.log(hyp);

          this.textDataBase = this.textDataBase + hyp + '\n';
          this.textData = this.textDataBase;
          this.results.nativeElement.scrollTop = this.results.nativeElement.scrollHeight;
        },
        onPartialResults: (hyp) => {
          console.log(hyp);

          this.textData = this.textDataBase + hyp;
        },
        onError: (code, data) => {
          console.log(code, data);
        },
        onEvent: (code, data) => {
          console.log(code, data);
        }
      });

      this.buttonText_russian = 'پایان ضبط';
      document.getElementById("listening_gif").style.display = "inline";
    } else if (this.dictateService.isRunning()) {
      this.dictateService.resume();
      this.buttonText_russian = 'پایان ضبط';
      document.getElementById("listening_gif").style.display = "inline";
    } else {
      this.dictateService.pause();
      this.buttonText_russian = 'شروع ضبط';
      document.getElementById("listening_gif").style.display = "none";
    }
  }

}

The correct results are displayed for the first time, but when I select another language, the results are no longer correct.

I think the error is in the following code, but I do not know how to fix it:

node.onaudioprocess = (e) => {
  if (this.paused) return;

  // console.log("00000000000000000000000000000000000000000000000");
  // console.log(e.inputBuffer.getChannelData(0));
  // console.log("00000000000000000000000000000000000000000000000");

  this.worker.postMessage({
    command: 'record',
    buffer: [
      e.inputBuffer.getChannelData(0)
    ]
  });
};

Please help.

nshmyrev commented 3 years ago

Please format the post properly first.

kavehcs commented 3 years ago

Please format the post properly first.

You are right, I have changed it.

nshmyrev commented 3 years ago

The correct results are displayed for the first time, but when I select another language, the results are no longer correct.

What does "no longer correct" mean? What is the expected result? What is the result you see exactly?

Do you see any errors in the JavaScript console? Do you see any errors in the server logs?

kavehcs commented 3 years ago

The correct results are displayed for the first time, but when I select another language, the results are no longer correct.

What does "no longer correct" mean? What is the expected result? What is the result you see exactly?

Do you see any errors in the JavaScript console? Do you see any errors in the server logs?

When I first load the page and select a language, speech is recognized correctly, but after I switch between the buttons it is no longer recognized correctly.

Each time I switch between the buttons, the following event is displayed in cmd:

When I choose a language for the first time:

5 Send: blob: audio/raw, 22578 (+- e)

When I choose a language for the second time:

5 Send: blob: audio/raw, 45158 (+- e)

When I choose a language for the third time:

5 Send: blob: audio/raw, 67738
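
A quick way to read these three sizes is to compare each one with the first. The sketch below only uses the numbers from the log lines above; the variable names are made up for illustration. It suggests that each new start sends almost exactly one more "first-size" worth of audio per blob.

```typescript
// Blob sizes copied from the log lines above (first, second, third start).
const blobSizes: number[] = [22578, 45158, 67738];

// Ratio of each blob to the first one, rounded to the nearest integer.
const ratios: number[] = blobSizes.map((size) => Math.round(size / blobSizes[0]));

// Largest distance between the real ratio and that rounded integer.
const maxDeviation: number = Math.max(
  ...blobSizes.map((size, i) => Math.abs(size / blobSizes[0] - ratios[i]))
);

console.log(ratios, maxDeviation);
```

The ratios come out as whole multiples of the first blob with a deviation well below one percent, which is what one would expect if the same audio were being captured once more after every switch.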


NadaAzzzam commented 3 years ago

Can you tell me how you fixed it? I have the same error.

NadaAzzzam commented 3 years ago

Hello, I have the same problem. When I start with any language (for example, English), the result comes back with values, but if I switch to any other language the result is an empty string. So if I want the application to work again I have to refresh it :( and that is very bad behavior. Any suggestions to fix this problem, please?

nshmyrev commented 3 years ago

so if I want the application to work again I have to refresh it :( and that is very bad behavior... any suggestions to fix this problem, please?

Given that it is a problem with your code, it is hard to advise without looking at the source you have.

rmmal commented 3 years ago

No, the problem is not in the code. After a lot of investigation, here is the summary. The first time we run it, it sends 1024-sample arrays over a 500 ms interval, which results in a buffer size of ~22k (one record request every ~25 ms). I send the ~22k buffer to the server and it responds fine.

When I stop it and start it again, or switch to another language, this is what I get: it sends 1024-sample arrays over a 500 ms interval, which results in a buffer size of ~44k (one record request every ~10-12 ms).

So the first time the recorder makes around 21 record requests of 1024 samples each in 500 ms, while the second time it makes around 44 requests in the same 500 ms.

The same happens on the third or fourth start.

I don't know what causes the problem yet, but it appears to be a recorder issue.
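
As a rough sanity check of these counts, the arithmetic can be sketched as below. The 44100 Hz sample rate is an assumption (a common browser default); it is not stated anywhere in this thread, and the variable names are only illustrative.

```typescript
// Assumption: 44100 Hz capture rate; the thread does not state the real value.
const sampleRate = 44100;
const chunkSamples = 1024; // array size reported above
const intervalMs = 500;    // send interval reported above

// Time covered by one 1024-sample chunk, in milliseconds.
const chunkMs = (chunkSamples / sampleRate) * 1000;

// Whole chunks that fit in one interval if 1, 2 or 3 identical
// capture callbacks are active at the same time.
const chunksPerInterval = [1, 2, 3].map((callbacks) =>
  Math.floor((intervalMs / chunkMs) * callbacks)
);

console.log(chunkMs.toFixed(1), chunksPerInterval);
```

Under that assumption a single capture callback gives one chunk roughly every 23 ms and about 21 chunks per 500 ms, close to the first run; the roughly doubled count on the second run is what two active callbacks would produce.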

rmmal commented 3 years ago

So any advice please ?!

nshmyrev commented 3 years ago

No the problem is not in the code,

Again, the problem is in your code; our code doesn't have this feature at all.

it sends 1024-sample arrays over a 500 ms interval, which results in a buffer size of ~44k (one record request every ~10-12 ms)

Most likely you subscribe to the audio stream twice (for example, you call startUserMedia several times), so you get each audio chunk many times. You shouldn't do that when you switch languages.
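
A minimal sketch of that idea, not the demo's actual code: FakeAudioSource and SingleSubscriptionRecorder are hypothetical classes that stand in for the microphone stream and the dictate service. The point is only that switching languages changes the target server while the audio subscription is created once.

```typescript
// Hypothetical stand-in for the microphone: every subscriber gets every chunk.
class FakeAudioSource {
  private listeners: Array<(chunk: Float32Array) => void> = [];

  subscribe(listener: (chunk: Float32Array) => void): void {
    this.listeners.push(listener);
  }

  emit(chunk: Float32Array): void {
    this.listeners.forEach((listener) => listener(chunk));
  }
}

// Hypothetical recorder: subscribes to the source at most once.
class SingleSubscriptionRecorder {
  private source: FakeAudioSource;
  private subscribed = false;
  private server = "";
  readonly sent: Array<{ server: string; samples: number }> = [];

  constructor(source: FakeAudioSource) {
    this.source = source;
  }

  // Called on every language switch; only the first call subscribes.
  start(server: string): void {
    this.server = server;
    if (this.subscribed) return; // already listening, just retarget
    this.subscribed = true;
    this.source.subscribe((chunk) => {
      this.sent.push({ server: this.server, samples: chunk.length });
    });
  }
}

const source = new FakeAudioSource();
const recorder = new SingleSubscriptionRecorder(source);

recorder.start("ws://localhost:2700"); // first language
source.emit(new Float32Array(1024));
recorder.start("ws://localhost:2701"); // switch language, no new subscription
source.emit(new Float32Array(1024));

console.log(recorder.sent);
```

With the guard in place each emitted chunk is recorded once, and the second chunk goes to the second server. Without the `subscribed` check, the second `start` call would add another listener and every later chunk would be recorded twice.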

NadaAzzzam commented 3 years ago

It works very well when I call startUserMedia once. Thank you very much.