alphacep / vosk-android-demo

Offline speech recognition for Android with Vosk library.
Apache License 2.0

Have some questions related to the input file format #87

Closed meenakshi17media closed 4 years ago

meenakshi17media commented 4 years ago

My questions are:

  1. Is it mandatory to use a .wav file as input?
  2. Can we use .mp3 or .flv audio files?
  3. Is there a duration limit defined for a single input audio file?
  4. Are there any pre-processing steps for the input audio file?

I passed a random 1-minute .mp3 audio file as input but got wrong results.

nshmyrev commented 4 years ago

Is it mandatory to use a .wav file as input?

Yes, and the WAV file must be in a specific format.

Can we use .mp3 or .flv audio files?

You have to convert them first, for example with ffmpeg.
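As a minimal sketch of that conversion step, the snippet below builds an ffmpeg command line from Python. The target format (16 kHz, mono, 16-bit signed PCM) is the one the maintainer gives later in this thread. The function names `build_ffmpeg_command` and `convert` are hypothetical, and the sketch assumes an ffmpeg binary is installed and on the PATH.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Return an ffmpeg argument list that converts src to mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file (.mp3, .flv, ...)
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # one channel (mono)
        "-c:a", "pcm_s16le",      # signed 16-bit little-endian PCM
        dst,
    ]


def convert(src, dst):
    # Assumes ffmpeg is on PATH; raises CalledProcessError if ffmpeg fails.
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Calling `convert("input.mp3", "output.wav")` (placeholder file names) would then produce a WAV file in the format the recognizer expects, provided ffmpeg can decode the input.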

Is there a duration limit defined for a single input audio file?

No

Are there any pre-processing steps for the input audio file?

No

meenakshi17media commented 4 years ago

OK, if I directly convert an .mp3 to a .wav file using online tools, it will not work, right?

nshmyrev commented 4 years ago

OK, if I directly convert an .mp3 to a .wav file using online tools, it will not work, right?

Yes

meenakshi17media commented 4 years ago

I am not able to use this model on Android:

vosk-model-small-en-us-0.3 36M TBD Lightweight wideband model for Android and RPi

It gives this error: Result %s elapsed %d milliseconds

The input is a .wav file that I converted with ffmpeg.

nshmyrev commented 4 years ago

Make sure the audio file has the proper format: 16 kHz, 16-bit, mono, signed PCM.
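One way to check a file against that format before passing it to the recognizer is to read its header with Python's standard `wave` module. This is a sketch, not part of Vosk: the function name `wav_format_problems` is hypothetical. The `wave` module only opens PCM WAV files and raises `wave.Error` for formats it does not support.

```python
import wave


def wav_format_problems(path, expected_rate=16000):
    """Return a list of mismatches; an empty list means the header looks right."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()   # bytes per sample; 2 means 16-bit
        rate = wav.getframerate()    # samples per second
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if width != 2:
        problems.append(f"expected 16-bit samples, found {8 * width}-bit")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

An empty result only confirms the header fields. It says nothing about whether the audio content is intelligible speech.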

meenakshi17media commented 4 years ago

The small model for Android doesn't include a model.conf file. Your documentation says:

conf/model.conf - provides default decoding beams and silence phones. You have to create this file yourself; it is not present in the Kaldi model.

Can you help me with this, so I can build the model.conf file for the Android speech-to-text feature? Also, is the STT solution you provided based on a predefined grammar or keyword list, or will it work without sending a grammar?

Can you help me with this? I am an Android person and don't know much about Kaldi or its models.

Thanks in advance.