cmusphinx / pocketsphinx

A small speech recognizer

The README needs a hello_world example #275

Open petercwallis opened 2 years ago

petercwallis commented 2 years ago

The instructions at https://cmusphinx.github.io/wiki/tutorialpocketsphinx are now past their use-by date. The README.md is fine on Linux, but for those of us who know what a lib file is, could we have a hello_ps.c please? Using C again reminds me why we all switched to Java back in the dark ages...

dhdaines commented 2 years ago

Hi! Thanks for pointing this out, as pocketsphinx_continuous.exe is, quite clearly, gone. And it was never useful for building applications in the first place.

I will fix this documentation as soon as possible; for now it will just be removed to avoid further confusion :)

dhdaines commented 2 years ago

In any case the preferred way to use the library will be through Python. Java is just as bad as C in my opinion ;-)

petercwallis commented 2 years ago

The Python way is certainly the way things are going, but having the C version of an API means us Java people can write the wrapper fairly easily using JNI and the lib.so and get the speed advantages of C. The useful bits for key phrase spotting, from my perspective, are a recogniser.run() method and then a "call-back" mechanism (listeners in Java) that is registered with the object(?) doing the running. Another call-back function for silence (non-speech) would be good, and, ultimately, a Soundex (https://en.wikipedia.org/wiki/Soundex) tokenizer for out-of-vocabulary words? I tried to convince the Kaldi people of this but to no effect. Confusing "dog" with "god" is forgivable and merely needs clarification (or pragmatic disambiguation); confusing "dog" with "bus" is not sensible to us humans.
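
For reference, classic American Soundex is small enough to sketch in a few lines. This only illustrates the encoding itself; it is not something PocketSphinx or Kaldi provides, and the function name `soundex` is made up for this example.

```python
# American Soundex: keep the first letter, encode the following consonants
# as digits, collapse repeated codes, and pad or truncate to four characters.
_CODES = {}
for _digit, _letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(word: str) -> str:
    """Return the four-character Soundex code for `word` (hypothetical helper)."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    # The first letter's own code takes part in the "no repeated code" rule.
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it counts again
        if len(out) == 4:
            break
    return "".join(out).ljust(4, "0")


for w in ("dog", "god", "bus"):
    print(w, soundex(w))
```

With this encoding "dog", "god" and "bus" come out as D200, G300 and B200, so plain Soundex keyed on the first letter probably does not capture the dog/god intuition by itself; a matcher along these lines would likely need a looser phonetic distance.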

dhdaines commented 2 years ago

Ah, good to know. The C API certainly won't go away... my plan is to integrate the WebRTC VAD code since it's the standard and its licence is compatible. The problem is that pocketsphinx_continuous existed as example code which never really worked well, but worked enough that people tried to build things with it, and then instead of doing live ASR correctly, it was decided to just keep hacking on the existing toy code.

Coqui has a lot of good examples of, in my opinion, the right way to do streaming ASR: https://github.com/coqui-ai/STT-examples.

For Java is it preferable to use SWIG or just JNI directly? I removed the SWIG code because with SWIG it was too difficult to make a good Python API, and other languages like Ruby weren't actually using it. Originally the SWIG wrapper was just there to support Java on Android. I certainly won't support anything Java as I'm already spending too much of my time on PocketSphinx which I consider to be obsolete in general...

Another long-standing problem is that the API isn't really designed correctly for callbacks. This is one of the reasons why I removed the audio code, as it was based around the thoroughly obsolete assumption that one gets audio by opening /dev/audio and doing blocking read() calls on it.

petercwallis commented 2 years ago

Took a quick look; Coqui looks good to me (the Java bit is deprecated). Java tried (and failed) to standardize how to connect to a mic. Today we seem to do it per OS, and just have our application check for the OS and grab the right executable (CMake on this Linux box). The key, I think, is to have the C API be the same on each OS. The API may not want to be the same for each application programming language (my son is using PureData as I type. Ouch.), but can I suggest that for Python, C#, Java, etc. the API exposes the C API and then extends it in a Python/C#/Java kind of way, the key being to maintain and expose the C API. My favourite example of this is the pigpio package http://abyz.me.uk/rpi/pigpio/ for the Pis. Joan looks after the C interface; others port the C to their preferred language. Documenting the C interface is key, however... if I could figure out how doxygen is meant to work :-/

WebRTC looks heavyweight to me: "firebase", "ICE candidates" and "rooms" all on the opening page of the intro. At the C level, what about registering a listener and then writing byte arrays to pocketsphinx_continuous? Leave it to us to figure out how to get a microphone to produce raw bytes and write them. A good feature of packages I have used is when they say "implement X. Instructions for this are available at Y. You can test your X by using this code... When that is working, connect your X to our Z by doing this ..."
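
The "register a listener, then write byte arrays" shape suggested above might look roughly like the sketch below. Everything in it is hypothetical: the class `PushEndpointer`, its methods and the event names are invented for illustration, and a plain RMS energy threshold stands in for a real voice activity detector (it is not the WebRTC VAD). No decoding happens here; a comment marks where a decoder would be fed.

```python
import array
import math
from typing import Callable, List


class PushEndpointer:
    """Hypothetical push-style front end: callers write raw PCM, listeners get events."""

    def __init__(self, threshold: float = 500.0, frame_samples: int = 160) -> None:
        self.threshold = threshold            # RMS level treated as speech (assumed value)
        self.frame_bytes = frame_samples * 2  # 16-bit mono samples
        self._pending = b""
        self._in_speech = False
        self._listeners: List[Callable[[str], None]] = []

    def add_listener(self, listener: Callable[[str], None]) -> None:
        """Register a callback that receives "speech" or "silence"."""
        self._listeners.append(listener)

    def write(self, data: bytes) -> None:
        """Accept any number of bytes; process them one whole frame at a time."""
        self._pending += data
        while len(self._pending) >= self.frame_bytes:
            frame = self._pending[:self.frame_bytes]
            self._pending = self._pending[self.frame_bytes:]
            self._process(frame)

    def _process(self, frame: bytes) -> None:
        samples = array.array("h", frame)  # native-endian signed 16-bit
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        speaking = rms >= self.threshold
        # A real implementation would hand `frame` to the decoder here
        # (for example via ps_process_raw in the C API).
        if speaking != self._in_speech:
            self._in_speech = speaking
            for listener in self._listeners:
                listener("speech" if speaking else "silence")


if __name__ == "__main__":
    ep = PushEndpointer()
    ep.add_listener(lambda event: print("event:", event))
    ep.write(array.array("h", [1000] * 160).tobytes())  # loud frame
    ep.write(array.array("h", [0] * 160).tobytes())     # quiet frame
```

The point of the design is that the caller owns the microphone: the recognizer never opens a device or blocks on a read, it only reacts to whatever bytes are pushed into it.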

JNI is my preferred way of working - it is plodding, not exciting, and strict, but it never breaks. You have to do things in the right order (write the java first), and get the .so file in the right place, and the classpath needs to have the wrapper. Done. Both of those can be hard but when it doesn't work it usually boils down to one of those classic issues.

dhdaines commented 2 years ago

Actually, now that I think of it, the preferred option for the microphone on Unix and possibly also Windows is just to popen() sox, as it is nearly always there, usually works, and can do various other things too.
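
A minimal sketch of the popen() idea in Python, assuming sox is installed and on the PATH and that 16 kHz, 16-bit, mono raw PCM is the wanted format. The helper names (`sox_capture_command`, `read_chunks`, `capture`) are invented for this example; the sox options used (`-q`, `-d`, `-r`, `-c`, `-b`, `-e`, `-t`) are standard ones. As noted later in this thread, this approach may not work for microphone input on Windows.

```python
import subprocess
from typing import BinaryIO, Iterator, List


def sox_capture_command(rate: int = 16000) -> List[str]:
    """Build a sox command that records from the default device to stdout."""
    return [
        "sox", "-q",           # quiet: no progress display on stderr
        "-d",                  # input: default audio device
        "-r", str(rate),       # output sample rate
        "-c", "1",             # mono
        "-b", "16",            # 16 bits per sample
        "-e", "signed-integer",
        "-t", "raw", "-",      # headerless PCM written to stdout
    ]


def read_chunks(stream: BinaryIO, chunk_bytes: int = 4096) -> Iterator[bytes]:
    """Yield fixed-size chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk


def capture(rate: int = 16000) -> Iterator[bytes]:
    """Run sox as a child process and yield raw audio chunks from its stdout."""
    proc = subprocess.Popen(sox_capture_command(rate), stdout=subprocess.PIPE)
    try:
        yield from read_chunks(proc.stdout)
    finally:
        proc.terminate()
        proc.wait()
```

Because the reader only needs a byte stream, sox can be swapped for any executable that writes the same raw format to stdout.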

jsalsman commented 2 years ago

David, that is a great idea, as people can modify sox as needed, or replace it with a similarly behaving executable.

If I understood you to say you plan to work on streaming with WebRTC, this has been the best option for years: https://www.npmjs.com/package/audio-recorder-polyfill

Best regards, Jim

smbika007 commented 2 years ago

Vis-à-vis instructions, an example of how to run pocketsphinx.exe in "live" mode would be useful (presumably from a microphone, though I have no idea why the word microphone doesn't seem to appear anywhere in the code or documentation), including the command line parameters necessary to specify the lm and hmm...

The huge number of command line switches is rather daunting too. The bare minimum (language model and ancillary files) would be helpful.

Thanks

dhdaines commented 2 years ago

Most of the command line switches are not useful to you, and I think this is mentioned in the documentation, but I will mention it quite a lot louder :-)

Microphone input is not an easy thing, and a lot of trouble came from giving people the impression that it was. The Python module makes everything quite simple in any case:

from pocketsphinx import LiveSpeech
for phrase in LiveSpeech():
    print(phrase)

dhdaines commented 2 years ago

And as mentioned in the other issue, ask yourself the question: do I really want a command-line executable written in C that does live speech recognition from a microphone, on Windows?

Please let me know if this is actually a useful thing. I suspect it isn't.

smbika007 commented 2 years ago

Well, you already know my opinion :-) although I might be the only one on the planet who does...LOL

Cheers!

dhdaines commented 2 years ago

Actually you're not the only one! But what you need, if I'm not mistaken, is what pocketsphinx_continuous was originally intended to be: example code which you can incorporate into your application.

It seems that sox doesn't do microphone input on Windows, either. PortAudio is a pretty good solution, and actually quite simple to implement. You can either include portaudio_static.lib and portaudio.h in your project directly, or you can "install" it somewhere, then set the CMAKE_PREFIX_PATH environment variable for your CMake build to point to that location. I'll publish some instructions on https://cmusphinx.github.io/ shortly. Perhaps we can add it as a git submodule (it isn't very big).

The original ad_win32.c code can also be used - it uses the oldest and most awful of the many awful (they are all awful) Windows audio APIs but doesn't require any external dependencies. I'll put together an example of it as well.

dhdaines commented 2 years ago

The example using PortAudio can be seen here: https://github.com/cmusphinx/pocketsphinx/blob/live_examples/examples/live_portaudio.c

smbika007 commented 2 years ago

Thanks again! I will check out portaudio and see if I can use that instead. The phrase "quite simple to implement" is a very nice thing to see 👍 ! And I look forward to the example for ad_win32.c although it's probably what I use now. And, yes, pocketsphinx_continuous is where I got the guts of my code for our app...

dhdaines commented 2 years ago

The ad_win32.c code actually has a number of problems and can be simplified for the new live speech API... particularly if it doesn't have to be fit into an existing framework. This is one of the reasons I removed the libsphinxad library: PortAudio or OpenAL do a better job of being a cross-platform library, and if you are targeting a particular platform it's probably better to go straight to the platform's API.

petercwallis commented 2 years ago

It turns out sox is not as platform-independent as one would wish, even on *nix. David, we will need, on the front page, a link to instructions for setting up a microphone on a list of platforms. I can contribute the Raspberry Pi code and instructions (with a shared interface for your code, of course).

dhdaines commented 2 years ago

There are now examples for PortAudio, PulseAudio, and Win32 wave input; see #319.

I will however leave this issue open as we can always use more examples!