alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0
7.37k stars 1.04k forks

Exceptions: java.lang.OutOfMemoryError: Java heap space #1479

Closed 5exceptions-rakeshdiwan closed 6 months ago

5exceptions-rakeshdiwan commented 6 months ago

Unable to integrate the "vosk-model-en-us-0.22" language model as an asset because of its large size.

I'm working on a requirement for an Android application and trying to integrate the large model into the application package. I ran into accuracy issues with the small model, so I switched to the large one. Details of what I've tried so far are below:

Please help me find a better way to integrate Vosk into Android or Flutter so that recognition is more accurate at runtime.

image

nshmyrev commented 6 months ago

Big models are created for big servers, not for mobile phones.

To get suggestions on accuracy improvement you need to share sample audio data.

5exceptions-rakeshdiwan commented 6 months ago

Can I share the audio and other details here, or should we use a private email thread for that? Please advise.

nshmyrev commented 6 months ago

You can share here.

5exceptions-rakeshdiwan commented 6 months ago

@nshmyrev To keep the work confidential, I can't attach the audio file directly in this thread. Instead, I'm sharing a Drive link to a folder with all the required files. Please request access there, or give me an email address to grant access to.

Folder contains:

  1. Sample audio file.
  2. Idea and issue brief video.

Links

  1. Drive Folder: https://drive.google.com/drive/folders/11jXpjNbqCU0tvk7kUrX47fsqHXkLvP9m?usp=share_link
  2. List of words and phrases we want to target: https://docs.google.com/spreadsheets/d/1aSjcTkF58qBJiLMCteIh_n_nCj5lz200ayIDY8ZlKZg/edit?usp=sharing

Thanks in advance.

nshmyrev commented 6 months ago

It doesn't allow me to download, says I need to request access.

5exceptions-rakeshdiwan commented 6 months ago

> It doesn't allow me to download, says I need to request access.

The link is open now. Can you please try again?

nshmyrev commented 6 months ago

Well, you need a better microphone. The current one is awful and cuts the audio off at 3 kHz. It has nothing to do with model size.
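One rough way to check a recording for this kind of cutoff is to compare the signal energy below and above 3 kHz. The sketch below is only an illustration, not something from the Vosk project: the helper names `goertzel_power` and `high_band_ratio`, the 250 Hz probe spacing, and the 3000 Hz threshold are all assumptions, and the input is assumed to be a list of mono PCM samples.

```python
import math


def goertzel_power(samples, sample_rate, freq_hz):
    """Power of `samples` at one frequency, using the Goertzel algorithm."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate)      # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def high_band_ratio(samples, sample_rate, cutoff_hz=3000, step_hz=250):
    """Fraction of probed energy that lies at or above `cutoff_hz`.

    Probes every `step_hz` up to the Nyquist frequency. A value close to 0
    for normal speech suggests the recording is band-limited below the cutoff.
    """
    low, high = 0.0, 0.0
    freq = step_hz
    while freq < sample_rate / 2:
        power = goertzel_power(samples, sample_rate, freq)
        if freq < cutoff_hz:
            low += power
        else:
            high += power
        freq += step_hz
    total = low + high
    return high / total if total > 0 else 0.0
```

Wideband speech normally carries some energy above 3 kHz (sibilants such as "s" and "sh"), so a ratio that stays near zero across a whole utterance hints at a band-limited microphone or codec. Pass in a short chunk of samples (a second or so), since the pure-Python loop is slow on long recordings.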

5exceptions-rakeshdiwan commented 6 months ago

> It has nothing about model size.

Could you share a reference audio file?

5exceptions-rakeshdiwan commented 6 months ago

I've managed to record the phrases again. Please check them and let me know whether the cuts and pause times are correct: https://drive.google.com/drive/folders/1xtU0eSp8uf0wEfXHwRyIOw5pvxiCu0H8?usp=sharing

nshmyrev commented 6 months ago

Now the audio quality is much better.

5exceptions-rakeshdiwan commented 6 months ago

Great, but I'm still facing the accuracy issue.

5exceptions-rakeshdiwan commented 6 months ago

@nshmyrev Can you please advise us on how to improve the accuracy?

gvoll commented 6 months ago

Happy New Year @nshmyrev! I'm the 5exceptions client :) We'd greatly appreciate your guidance on how to improve the accuracy of these commands to support the voice activation-based app we're working on. Please let us know if there's a different preferred way to provide these audio files or anything else that will help move this to resolution. Ideally, this will be an approach that we can repeat when adding commands going forward.
Thank you!

nshmyrev commented 6 months ago

@gvoll you can bias the model toward specific commands; see:

https://alphacephei.com/vosk/lm

https://github.com/alphacep/vosk-api/blob/master/python/example/colab/vosk-adaptation.ipynb
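As a minimal sketch of what biasing toward specific commands can look like at runtime, the Python bindings let you pass a JSON list of phrases as a third argument to `KaldiRecognizer`. The helper names `build_grammar` and `make_recognizer` and the example phrases are hypothetical, not taken from this thread. As far as I know, a runtime phrase list only works with models that have a dynamic graph (the small models), not with the big static-graph ones, so treat this as an assumption to verify against the links above.

```python
import json


def build_grammar(phrases, allow_unknown=True):
    """Build the JSON phrase list that KaldiRecognizer accepts as a grammar.

    Phrases are lowercased, whitespace-normalized, and de-duplicated.
    "[unk]" lets the recognizer report out-of-list speech as unknown
    instead of forcing it onto one of the commands.
    """
    cleaned = []
    for phrase in phrases:
        phrase = " ".join(phrase.lower().split())
        if phrase and phrase not in cleaned:
            cleaned.append(phrase)
    if allow_unknown:
        cleaned.append("[unk]")
    return json.dumps(cleaned)


def make_recognizer(model_path, phrases, sample_rate=16000):
    """Create a recognizer restricted to `phrases` (requires the vosk package)."""
    # Imported lazily so build_grammar can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_path)
    return KaldiRecognizer(model, sample_rate, build_grammar(phrases))


if __name__ == "__main__":
    # Hypothetical command list; replace with the real target phrases.
    print(build_grammar(["start recording", "stop recording"]))
```

Every word in the list has to be in the model's vocabulary. For commands with out-of-vocabulary words, or for larger phrase sets, the language model adaptation described in the links above is the more general route.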