alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Got bad results using the vosk_recognizer_accept_waveform_f interface #1417

Open liuweie opened 1 year ago

liuweie commented 1 year ago

Hi, I am calling your compiled library from C++ on Windows. When I use the `vosk_recognizer_accept_waveform` interface (which accepts `const char*` data), the recognition result is perfect, but when I use `vosk_recognizer_accept_waveform_f` (which accepts `const float*` data), the result is not very accurate. So I am wondering if there is a problem with how I use `vosk_recognizer_accept_waveform_f`?

Here is my code:

```cpp
int vosk(std::string wavFile, std::string modelPath) {
  std::ifstream wavin(wavFile, std::ios::binary);
  char buf[48000];
  int final, nread;

  VoskModel* model = vosk_model_new(modelPath.data());
  VoskRecognizer* recognizer = vosk_recognizer_new(model, 16000);

  //wavin.seekg(44, std::ios::beg);
  while (!wavin.eof())
  {
      wavin.read(buf, sizeof(buf));
      nread = wavin.gcount();
      int flen = nread / 2;
      float floatBuf[48000];
      for (int i = 0; i < nread/2; i++)
      {
          floatBuf[i] = static_cast<float>(reinterpret_cast<const int16_t*>(buf)[i]);
      }

      final = vosk_recognizer_accept_waveform_f(recognizer, floatBuf, nread/sizeof(float));
      std::cout << "final is " << final << std::endl;
      if (final)
      {
          std::cout << coutCH(vosk_recognizer_result(recognizer)) << std::endl;
      }
      else
      {
          std::cout << coutCH(vosk_recognizer_partial_result(recognizer)) << std::endl;
      }
  }

  final = vosk_recognizer_accept_waveform(recognizer, buf, nread);
  vosk_recognizer_free(recognizer);
  vosk_model_free(model);
  return 0;
}
```

nshmyrev commented 1 year ago
> `final = vosk_recognizer_accept_waveform_f(recognizer, floatBuf, nread/sizeof(float));`

`nread/sizeof(float)` is wrong. You still have `nread/2` samples even after converting them to float.
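A minimal sketch of the conversion with the corrected length, assuming mono 16-bit little-endian PCM read on a little-endian host. The helper name `pcm16_bytes_to_float` is hypothetical; the point is that the length handed to `vosk_recognizer_accept_waveform_f` is the number of samples (`nread / 2`), not `nread / sizeof(float)`, and that the samples keep their int16 scale.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: converts raw 16-bit little-endian PCM bytes (as read
// from the WAV file) into floats WITHOUT normalizing, and returns the number
// of samples produced. A trailing odd byte, if any, is ignored.
std::size_t pcm16_bytes_to_float(const char* bytes, std::size_t nbytes,
                                 std::vector<float>& out) {
    const std::size_t nsamples = nbytes / sizeof(std::int16_t);  // 2 bytes per sample
    out.resize(nsamples);
    for (std::size_t i = 0; i < nsamples; ++i) {
        std::int16_t s;
        // memcpy sidesteps the alignment/strict-aliasing concerns of reinterpret_cast
        std::memcpy(&s, bytes + i * sizeof(std::int16_t), sizeof(s));
        out[i] = static_cast<float>(s);  // stays in [-32768, 32767]
    }
    return nsamples;
}

// Usage inside the read loop (sketch, assuming the length parameter is the
// sample count as described above):
//   std::vector<float> floatBuf;
//   std::size_t n = pcm16_bytes_to_float(buf, nread, floatBuf);
//   final = vosk_recognizer_accept_waveform_f(recognizer, floatBuf.data(),
//                                             static_cast<int>(n));
```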

lostromb commented 1 year ago

I also saw this in the C# bindings when calling `VoskRecognizer.AcceptWaveform(floatBuffer, numSamplesPerChannel)` with `floatBuffer` containing float32 samples in the range [-1.0, 1.0]. Does the function expect the samples to be scaled differently (e.g. [-32767.0, 32767.0])?

nshmyrev commented 1 year ago

> Is the function expecting the samples to be scaled differently (e.g. [-32767.0, 32767.0])?

Yes, they have to be in the 16-bit integer range (roughly ±32768).
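A minimal sketch of that rescaling, written in C++ to match the rest of the thread and assuming the input samples are normalized to [-1.0, 1.0]. The function name is hypothetical; multiplying by 32768 maps normalized audio onto the 16-bit range described above. In the C# bindings the same per-sample multiply would be applied to `floatBuffer` before calling `AcceptWaveform`.

```cpp
#include <vector>

// Hypothetical helper: rescales normalized float samples in [-1.0, 1.0]
// (the usual float32 audio convention) to the 16-bit integer range that the
// float accept-waveform functions expect.
void scale_normalized_to_int16_range(std::vector<float>& samples) {
    constexpr float kScale = 32768.0f;  // 2^15, full scale of 16-bit PCM
    for (float& s : samples) {
        s *= kScale;
    }
}
```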