cmusphinx / sphinx4

Pure Java speech recognition library
cmusphinx.sourceforge.net

Allow reusing acoustic model objects in multiple Recognizer objects #64

Closed chetan-prime closed 7 years ago

chetan-prime commented 7 years ago

I am using multiple threads, each with a single StreamSpeechRecognizer object. All threads use the same Configuration, and therefore the same acoustic model. Using 5 threads duplicates resources on a massive scale, as each thread's 70k acoustic model objects use up 4+ GB of RAM. So with 5 threads I'm at 20 GB.

I wanted to confirm whether it's thread-safe to reuse the same acoustic model objects across multiple StreamSpeechRecognizer objects running on different threads at the same time. I guess achieving this isn't too difficult if we use a HashMap to cache models by file path. However, before I do this, can the developers confirm whether acoustic model objects like HMMPool are safe to share among multiple threads? I plan to use only one thread to load the first object and then reuse that object in the other threads. This will work only if sphinx4 itself uses the acoustic model objects in a thread-safe manner.

This would be a great feature to have, as it could reduce RAM usage by up to 4 GB for each thread that reuses an acoustic model that's already loaded.
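
The caching idea described above can be sketched in isolation. This is only a minimal sketch under stated assumptions: `ModelCache` and its `loader` function are hypothetical names, not part of the sphinx4 API, and the model type is left generic. It uses a `ConcurrentHashMap` rather than a plain `HashMap` so that each path is loaded at most once even when several threads ask for it at the same time.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of sphinx4): caches one loaded model per path.
 * M stands in for whatever model type the caller wants to share.
 */
final class ModelCache<M> {
    private final ConcurrentMap<String, M> cache = new ConcurrentHashMap<>();
    private final Function<String, M> loader;

    /** loader is an assumed, caller-supplied function that reads a model from a path. */
    ModelCache(Function<String, M> loader) {
        this.loader = loader;
    }

    /**
     * Returns the cached model for the path, loading it on first use.
     * computeIfAbsent runs the loader at most once per key, so concurrent
     * callers asking for the same path receive the same instance.
     */
    M get(String path) {
        return cache.computeIfAbsent(path, loader);
    }

    /** Number of distinct paths loaded so far. */
    int size() {
        return cache.size();
    }
}
```

A cache like this only guarantees that loading happens once per path. It does not make the cached objects themselves safe for concurrent use; that depends on whether the recognizer only reads them during decoding, which is exactly the question raised here.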

nshmyrev commented 7 years ago

If you care about speed and RAM, it is better to look at http://github.com/kaldi-asr/kaldi. It is much more accurate and can run at 0.5xRT with 2 GB of memory.

chetan-prime commented 7 years ago

Thanks for confirming this isn't a priority. Kaldi doesn't look like an option for me, as it doesn't support as many languages as Sphinx. I guess I'll use pocketsphinx, as that's C++. RAM isn't a problem, as my server has 32 GB. But a lot of time is wasted loading the same acoustic models for each thread, time I could use instead to load acoustic models for a different language. I'm trying to modify the context loader to cache the acoustic models using a static HashMap, and I'll see how that goes.

nshmyrev commented 7 years ago

Kaldi should be your focus, since it is much more accurate and is a more modern technology. Languages supported by pocketsphinx could easily be added to Kaldi too.

A static HashMap does not make sense: the data is specific to each decoder, so you can't improve much there. It is an issue with the algorithm.

chetan-prime commented 7 years ago

Many thanks, I have had very good results after following your advice on this issue.