cmusphinx / pocketsphinx

A small speech recognizer

Provide example of live recognition with microphone on various platforms #316

Open: smbika007 opened this issue 2 years ago

smbika007 commented 2 years ago

Hi.

I need to use pocketsphinx strictly with microphone input. I was able to do so in the 5prealpha version found on SourceForge with the pocketsphinx_continuous program. It has since been "retired" (Blade Runner style, it seems), and I have not yet found a real replacement in the code here on GitHub.

Does this pocketsphinx (5.0.0) support the use of a microphone?

Thanks, Sean

dhdaines commented 2 years ago

No, the PocketSphinx 5.0.0 command-line tool and C API do not support microphone input. See the rationale here: https://cmusphinx.github.io/2022/08/pocketsphinx-continuous/

The PocketSphinx Python API does support microphone input. See the documentation here: https://pocketsphinx.readthedocs.io/en/latest/
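Because the C API leaves audio capture to the application, one workaround that keeps capture outside the library is to let an external recorder write raw audio to a pipe and have the program read it in fixed-size frames. The sketch below assumes 16-bit signed, mono, native-endian PCM at the model's sample rate (16 kHz is assumed here), produced by something like `sox -d -t raw -r 16000 -c 1 -b 16 -e signed-integer -`. That command and the helper names `read_frame()` and `stdin_binary_mode()` are illustrative assumptions, not part of PocketSphinx.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

/*
 * Hypothetical helper: on Windows the C runtime opens stdin in text mode,
 * which would corrupt binary audio, so switch it to binary mode.
 * Does nothing on other platforms.
 */
void stdin_binary_mode(void)
{
#ifdef _WIN32
    _setmode(_fileno(stdin), _O_BINARY);
#endif
}

/*
 * Hypothetical helper: read up to `frame_size` samples of raw 16-bit mono
 * PCM from `fp` (for example stdin fed by an external recorder).
 * fread() blocks until the whole frame has arrived or the stream ends, so
 * only the last frame can be short. Returns the number of samples read,
 * which is 0 at end of stream.
 */
size_t read_frame(FILE *fp, int16_t *frame, size_t frame_size)
{
    assert(fp != NULL && frame != NULL && frame_size > 0);
    return fread(frame, sizeof(frame[0]), frame_size, fp);
}
```

In a real program each frame would then be handed to the decoder, for example with `ps_process_raw(decoder, frame, n, 0, 0)` between `ps_start_utt()` and `ps_end_utt()`. A frame of 480 samples corresponds to 30 ms at 16 kHz. Because `fread()` blocks, the loop only asks for audio when it is ready to process it.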

smbika007 commented 2 years ago

David, thanks for the reply. I may have to stick with 5prealpha then. I appreciate the rationale page, but I have to say that all I needed to do to use it the way I wanted was essentially to copy the code out of pocketsphinx_continuous and graft it into my program. It worked as near to perfectly as one could expect and was the ideal choice for my company's application needs. I am not allowed to use Python for this because our MO is not to use scripting languages in our active environment: they are generally slower, and we need lightning-fast turnaround. Our use case was strictly microphone input against a very small and specific grammar, which limited ambiguity in a verbal-command setting. I've found that pocketsphinx was not very good at general dictation, even with a large vocabulary.
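Why a small, specific grammar limits ambiguity can be shown with a toy model. PocketSphinx compiles JSGF grammars into finite-state grammars (FSGs), and the search then only considers word sequences that follow paths through that FSG. The sketch below is a hypothetical, hand-built acceptor for a two-slot command grammar, roughly `(open | close) (door | window)`; the struct, the arc table, and `fsg_accepts()` are illustrative only and are not PocketSphinx's own FSG data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One word-labelled transition of a toy finite-state grammar. */
struct fsg_arc {
    int from;
    const char *word;
    int to;
};

/* Hypothetical grammar (open | close) (door | window):
   state 0 = start, state 1 = after the verb, state 2 = final. */
static const struct fsg_arc command_arcs[] = {
    {0, "open", 1}, {0, "close", 1},
    {1, "door", 2}, {1, "window", 2},
};
static const size_t n_command_arcs =
    sizeof(command_arcs) / sizeof(command_arcs[0]);
enum { START_STATE = 0, FINAL_STATE = 2 };

/* Returns 1 if the word sequence is accepted by the toy grammar, else 0. */
int fsg_accepts(const char *const *words, size_t n_words)
{
    int state = START_STATE;
    size_t i, a;

    assert(words != NULL || n_words == 0);
    for (i = 0; i < n_words; i++) {
        int next = -1;
        for (a = 0; a < n_command_arcs; a++) {
            if (command_arcs[a].from == state
                && strcmp(command_arcs[a].word, words[i]) == 0) {
                next = command_arcs[a].to;
                break;
            }
        }
        if (next < 0)
            return 0; /* no arc for this word: not in the grammar */
        state = next;
    }
    /* Accept only if the whole sequence ends in the final state. */
    return state == FINAL_STATE;
}
```

With only four word arcs, every accepted utterance is one of four commands, which is the ambiguity-limiting effect described above. A real FSG can also be nondeterministic and carry transition probabilities and null transitions, which this toy omits.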

Ah, well.

Thanks, Sean

dhdaines commented 2 years ago

Hi Sean,

Thanks for the detailed reply! The issue is mainly that I very much do not want PocketSphinx to be in the business of interfacing with the microphone, because this creates a lot of maintainability and portability issues. I'm actually a bit surprised that the pocketsphinx_continuous code worked so well for you :)

Because I think there are at least a few people in your specific situation, I will provide an example of using PortAudio streams to do live recognition. I'm not enthusiastic about the idea of actually adding PortAudio as a dependency, and I think its API is rather unpleasant, but it seems like the least-hassle solution to the removal of pocketsphinx_continuous.

And yes, PocketSphinx is not to be used for general dictation, it is about 30 years out of date on that front. In fact, I am not convinced it should be used for anything, but I felt it needed to be cleaned up and the build system fixed, so...

dhdaines commented 2 years ago

(link to PortAudio documentation: http://files.portaudio.com/docs/v19-doxydocs/tutorial_start.html)

Also I have reopened this issue and changed its name!

smbika007 commented 2 years ago

My thanks, again! I will consider PortAudio as a possible mitigation to this. FTR, though, I've found the pocketsphinx_continuous code worked exceedingly well on all of the Windows 10 platforms and on Ubuntu in a VM which used the Windows box's native audio features. Could be I just got lucky ;-)

dhdaines commented 2 years ago

Hmm! Perhaps I can just pull out the old audio code and put it in the example then... mainly the issue is not wanting it to be in the library itself.

dhdaines commented 2 years ago

For PortAudio, it's specifically the "Blocking I/O" calls that are needed; the callback-based API is totally unsuitable for doing ASR:

http://portaudio.com/docs/v19-doxydocs/blocking_read_write.html
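One reason the blocking calls fit better: a recognizer wants to pull audio at its own pace. In PortAudio's blocking mode (opening a stream with a NULL callback and then calling `Pa_ReadStream()`), the decoding loop asks for the next frame when it is ready; with the callback API, audio arrives on PortAudio's own thread, where the callback should not do heavy work, so it would have to be buffered and handed over to the decoder. The sketch below shows only the pull-loop structure. The audio source is abstracted behind a hypothetical `blocking_read_fn` (a real implementation might wrap `Pa_ReadStream()` and map its `PaError` result to this convention), and an in-memory source lets it run without a sound card. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical blocking source: fill `buf` with exactly `n` samples,
 * waiting until they are available. Return 0 on success, nonzero when
 * no further full frame can be delivered (end of stream or error).
 */
typedef int (*blocking_read_fn)(void *src, short *buf, size_t n);

/* Hypothetical consumer standing in for the recognizer. */
typedef void (*consume_fn)(void *dst, const short *buf, size_t n);

/*
 * Pull loop: the consumer's own thread requests each frame when it is
 * ready, so no hand-off buffer between threads is needed.
 * Returns the number of frames consumed.
 */
size_t pull_loop(blocking_read_fn read_fn, void *src,
                 consume_fn consume, void *dst,
                 short *frame, size_t frame_size)
{
    size_t n_frames = 0;

    assert(frame != NULL && frame_size > 0);
    while (read_fn(src, frame, frame_size) == 0) {
        consume(dst, frame, frame_size);
        n_frames++;
    }
    return n_frames;
}

/* In-memory source so the loop can be exercised without a sound card. */
struct mem_source {
    const short *data;
    size_t len;
    size_t pos;
};

int mem_read(void *src, short *buf, size_t n)
{
    struct mem_source *m = src;

    if (m->len - m->pos < n)
        return 1; /* fewer than n samples left: report end of stream */
    memcpy(buf, m->data + m->pos, n * sizeof(buf[0]));
    m->pos += n;
    return 0;
}

/* Toy consumer: sums the samples it is given. */
void sum_consume(void *dst, const short *buf, size_t n)
{
    long *total = dst;
    size_t i;

    for (i = 0; i < n; i++)
        *total += buf[i];
}
```

With the callback API the same program would instead need a thread-safe queue between the callback and the decoder, which is the extra machinery the blocking calls avoid.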

smbika007 commented 2 years ago

> Hmm! Perhaps I can just pull out the old audio code and put it in the example then... mainly the issue is not wanting it to be in the library itself.

Putting it in the examples is fine by me. The use cases for Sphinx should include it for the purposes of verbal commanding, which it seems to do quite well. The Java versions of Sphinx all have it, and indeed my first experience with it in our domain was the Java version. It worked fine too, but the reason we moved to the C version was that the grammar compiler they used was too strict: when I introduced a grammar that included a LOT of variants, the compiler choked. I switched to the simpler version, which is a single Perl script, and that was all I needed to add anything I wanted, freestyle.

It can easily be caveated as legacy code which some oddballs like me found useful...LOL

Don't write Sphinx off as outdated just yet. I've found that if it still works to one's satisfaction and can be maintained easily, it's still a useful member of society ;-)

dhdaines commented 2 years ago

Good to know! The grammar support could stand to be improved - there's a bit of a performance regression in 5.0.0 because some optimizations that were being done when compiling JSGF to FSG resulted in incorrect grammars. I just created an issue for this https://github.com/cmusphinx/pocketsphinx/issues/317

And of course PocketSphinx is actually quite useful for alignment as well.

dhdaines commented 2 years ago

Working on this here: https://github.com/cmusphinx/pocketsphinx/pull/319

The PortAudio example seems to work well though I haven't yet tried it on Windows - the CMake code to detect it almost certainly won't work there, I'll check that soon.

dhdaines commented 2 years ago

The Win32 example (https://github.com/cmusphinx/pocketsphinx/blob/live_examples/examples/live_win32.c) ought to work at least as well as the 5prealpha code, which is to say, maybe not all that well at all. The microphone on my Windows laptop seems very noisy, so the endpointer gives a lot of false positives for the first 30 seconds or so.
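False positives like these are what one would expect from any endpointer that separates speech from silence by comparing frame energy against a threshold: if background noise is loud enough, noise frames cross the threshold too. The following is a toy energy-based detector, assumed purely for illustration; it is not PocketSphinx's endpointer, whose algorithm is not described in this thread. It tracks a running noise-floor estimate and flags a frame as speech when its mean-square energy exceeds that floor by a fixed `ratio` (4.0 is about 6 dB); `struct toy_vad`, `ratio`, and `alpha` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy energy-threshold speech detector: hypothetical, illustration only. */
struct toy_vad {
    double noise_floor; /* running estimate of background energy */
    double ratio;       /* speech if frame energy > ratio * noise_floor */
    double alpha;       /* noise-floor adaptation rate, 0 < alpha <= 1 */
};

/* Mean-square energy of one frame of 16-bit samples. */
double frame_energy(const short *frame, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(frame != NULL && n > 0);
    for (i = 0; i < n; i++)
        sum += (double)frame[i] * (double)frame[i];
    return sum / (double)n;
}

/*
 * Classify one frame (1 = speech, 0 = non-speech) against the current
 * noise floor, then move the floor a fraction `alpha` toward this frame's
 * energy. If the floor starts far below the real background level, the
 * first frames of plain noise are misclassified as speech until the
 * estimate catches up.
 */
int toy_vad_process(struct toy_vad *vad, const short *frame, size_t n)
{
    double energy = frame_energy(frame, n);
    int is_speech = energy > vad->ratio * vad->noise_floor;

    vad->noise_floor += vad->alpha * (energy - vad->noise_floor);
    return is_speech;
}
```

In a detector like this, a noise floor that starts too low and adapts slowly produces an early burst of false positives on a noisy microphone; whether that is what happens in PocketSphinx is not established here. A practical detector would typically also adapt more slowly during speech, so that long utterances do not drag the floor upward.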