alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Support for domain-specific grammars #55

Open dtreskunov opened 4 years ago

dtreskunov commented 4 years ago

It would be very convenient to be able to describe the space of possible outputs. This would be useful, for example, in home automation, where the speaker is likely to say sentences such as "turn {on,off} the {hallway,kitchen} lights". Right now, using the full language model from here, accuracy isn't very good. Intuitively, limiting the grammar should improve accuracy.

It would be awesome if this could be done inside the library rather than offline using Kaldi command-line utilities, as described on the Updating the Language Model page.

I was wondering if that's something you have planned, given this bit of code in kaldi_recognizer.cc, which limits the set of words from the full set included in the language model to just those in the grammar string:

g_fst_.AddState();
g_fst_.SetStart(0);
g_fst_.AddState();
g_fst_.SetFinal(1, fst::TropicalWeight::One());
g_fst_.AddArc(1, StdArc(0, 0, fst::TropicalWeight::One(), 0));

// Create simple word loop FST
std::stringstream ss(grammar);
std::string token;

while (std::getline(ss, token, ' ')) {
    int32 id = model_.word_syms_->Find(token);
    g_fst_.AddArc(0, StdArc(id, id, fst::TropicalWeight::One(), 1));
}
ArcSort(&g_fst_, ILabelCompare<StdArc>());

decode_fst_ = LookaheadComposeFst(*model_.hcl_fst_, g_fst_, model_.disambig_);

Would it be possible to read the grammar from a JSGF file or something like Rhasspy's sentences.ini (documented here) and construct the appropriate graph?
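
In the meantime, one way to approximate this without library support is to expand simple sentence templates into a flat phrase list in user code and feed that to the recognizer. A rough sketch (the template syntax here is ad hoc, not actual JSGF or sentences.ini parsing):

import itertools
import re

def expand(template):
    """Expand a template like 'turn (on|off) the (hallway|kitchen) lights'
    into every concrete sentence it covers."""
    parts = re.split(r"(\([^)]*\))", template)
    choices = [p[1:-1].split("|") if p.startswith("(") else [p] for p in parts]
    return ["".join(combo) for combo in itertools.product(*choices)]

print(expand("turn (on|off) the (hallway|kitchen) lights"))
# ['turn on the hallway lights', 'turn on the kitchen lights',
#  'turn off the hallway lights', 'turn off the kitchen lights']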

nshmyrev commented 4 years ago

Grammars are not really a natural way of doing things; users never follow them. Grammars just add code complexity. We might adopt something like Google's phrase hints in the future.

dtreskunov commented 4 years ago

This seems to be exactly what the Rhasspy project is doing: they combine the language model built from the user-supplied sentences.ini with the general language model using a weighting factor.


RafNie commented 4 years ago

Hi

I'm struggling with a similar problem. I'm trying to improve the accuracy of my simple command-and-control system. I've defined a set of commands:

lighting on
lighting off
lighting bright increase
lighting bright decrease
...
lighting bright set zero
lighting bright set one
lighting bright set two
...

In my first attempt I prepared the set of words from my command list and passed it as the grammar to KaldiRecognizer(Model &model, float sample_frequency, char const *grammar). That greatly improved both processing time and accuracy. But when I use a microphone the accuracy is still not the best. I assume that is because every word has the same probability.
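
For anyone reproducing this with the Python bindings, here is a minimal sketch of passing such a grammar when constructing the recognizer. The model path and sample rate are placeholders; note that recent Vosk releases expect the grammar as a JSON list of phrases, while the C++ constructor quoted above took a plain space-separated word list:

import json
from vosk import Model, KaldiRecognizer

model = Model("model")           # path to an unpacked Vosk model (placeholder)
phrases = ["lighting on", "lighting off",
           "lighting bright increase", "lighting bright decrease",
           "[unk]"]              # [unk] gives out-of-grammar speech somewhere to go

# Recent bindings take the grammar as a JSON list of phrases.
rec = KaldiRecognizer(model, 16000.0, json.dumps(phrases))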

So I'm trying to improve accuracy another way. I've prepared a new FST based on your adaptation.md document. This produces a 3-gram model:

 farcompilestrings --fst_type=compact --symbols=/home/kaldi_host/words.txt --keep_symbols /home/kaldi_host/text.txt | ngramcount | ngrammake | fstconvert --fst_type=ngram > /home/kaldi_host/Gr.fst

But now Kaldi recognises even noise as the word "lighting". ngramprint shows that "lighting" has a much higher probability than the other words:

...
lighting        0.2160804
lighting hotter 0.00258398
lighting hotter </s>    0.04
...
lighting off    0.00258398
lighting off </s>       0.04
lighting on     0.00258398
lighting on </s>        0.04
...

bright set eight        0.003636363
bright set four 0.003636363
bright set five 0.003636363
bright decrease 0.008547007
...
</s>    0.2160804
<s>     0.2160804
<s> lighting    0.999
...

I also tried making a 6-gram model (ngramcount --order=6) to cover the longest commands with start and end markers. But the result was similar: Kaldi keeps returning "lighting" continuously when I use the microphone. Do you know any way to reduce the likelihood of the word "lighting" in the compiled FST?

I wonder if I could prepare the FST description manually and compile it with fstcompile. Preparing a proper model by hand is a lot of work, so I wonder if there is a chance it would improve accuracy. Or maybe there is another approach?

nshmyrev commented 4 years ago

@RafNie that's kind of hard to do right now; you need to add an unk bigram loop into the lexicon with make_unk_lm.sh and later add it into the grammar and tune the probabilities. It will not work any other way.

I might work on it later, but it will take some time.

RafNie commented 4 years ago

@nshmyrev thanks for the suggestion. I think I've fixed the "lighting" recognition problem by adding [unk] to the end of my command list:

<s> lighting    0.9535909
...
<s> lighting bright set five </s>        0.06060606
<s> [unk]       0.04540909
<s> [unk] </s>  0.999

I checked what the syntax of the unk symbol is in words.txt.

Now I will check how much it improves the accuracy of command recognition.

It seems this approach is not the best way to create an FST for command recognition. For example, in text.txt I defined separate commands for setting parameters:

lighting bright set one
lighting bright set two
...

This causes a problem, because the bigram "lighting on", which is used only once in the text.txt file, gets a much lower probability than, for example, "bright set". So I think that using farcompilestrings | ngramcount | ngrammake to compile the command set is not the best approach; I need to define the grammar some other way. But what would be the best way?

RafNie commented 4 years ago

I have made some progress in improving accuracy. I've added smoothing of the probability distribution (ngrammake --method=kneser_ney) and set ngramcount to 4-grams. I also duplicated the occurrences of the individual simple commands in the list to boost the two-word commands. It works reasonably well, but could be better.

RafNie commented 4 years ago

I found a better approach for defining the command grammar: I used sphinx_jsgf2fsg to convert a grammar defined in JSGF format.

First, define the grammar in a JSGF file, for example:

#JSGF V1.0;
/**
 * JSGF Grammar for Hello World example
 */
grammar lamp_commands;

public <command> = <key> <commands>;
<key> = lighting;
<commands> = ( <simple_commands> | <settable_commands_list>);
<simple_commands> = ( on | off | hotter | colder );
<settable_command> = ( bright | color );
<settable_commands_list> = <settable_command> set <numbers>;
<numbers> = ( zero | one | two | three | four | five | six | seven | eight | nine | ten );

After that, convert it to FSM: sphinx_jsgf2fsg -jsgf grammar.jsgf -fsm grammar.fsm

and add a [unk] loop to the FSM file: echo "0 0 [unk] -0.000000" >> grammar.fsm

Then compile it into an FST: fstcompile --acceptor --isymbols=words.txt --osymbols=words.txt --keep_isymbols=true --keep_osymbols=true grammar.fsm | fstdeterminize | fstminimize | fstrmepsilon | fstarcsort > Gr.fst

In my opinion this grammar transducer works better than the one composed by n-gram counting.

nshmyrev commented 4 years ago

@RafNie very good! You'd better do a formal test with a small database of 100 records. That will give you an opportunity to tune the unk scale.
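
For reference, a rough sketch of such a test with the Python bindings, assuming a directory of 16 kHz mono WAV recordings and a tab-separated reference file (all file names here are placeholders):

import json
import wave
from vosk import Model, KaldiRecognizer

model = Model("model")                       # placeholder: the model with the custom Gr.fst
# refs.tsv: one "filename<TAB>expected transcript" per line (placeholder layout)
refs = dict(line.rstrip("\n").split("\t") for line in open("refs.tsv"))

correct = 0
for path, expected in refs.items():
    wf = wave.open(path, "rb")               # expected: 16 kHz, 16-bit, mono PCM
    rec = KaldiRecognizer(model, wf.getframerate())
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        rec.AcceptWaveform(data)
    hyp = json.loads(rec.FinalResult())["text"]
    correct += int(hyp == expected)

print("sentence accuracy: %d/%d" % (correct, len(refs)))

Sentence-level accuracy is usually enough for a small command set; rerun the test after each change to the [unk] weight.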

RafNie commented 4 years ago

@nshmyrev Actually I'm just replacing Gr.fst in your English model. I don't have any audio data for a test.

But I suppose tuning the unk probability is not necessary in that case. sphinx_jsgf2fsg produces transitions with equal weights, like this:

0 1 lighting -0.000000
1 2 bright -0.000000
1 3 on -0.000000
1 3 off -0.000000

So the finally compiled Gr.fst has no probabilities:

0       1       lighting        lighting
1       3       hotter  hotter
1       3       colder  colder
1       4       color   color
1       3       off     off

In contrast to the FST produced by n-gram counting:

0       4       [unk]   [unk]   3.60877895
0       4       on      on      3.60877895
0       4       off     off     3.60877895
0       3       lighting        lighting        4.66192913
0       2       [unk]   [unk]   1.02615702
0       1       lighting        lighting        2.05053091
0       2.128232
dtreskunov commented 4 years ago

Awesome! I would like to integrate what sphinx_jsgf2fsg and fstcompile are doing directly into vosk-api's code (compile those tools as static libraries and link with them).

Thoughts?

nshmyrev commented 4 years ago

It's better to simply implement something like this https://github.com/kaldi-asr/kaldi/blob/e5a5a2869c0f91a5db1a9bb0d8ce06bffe82898d/egs/wsj/s5/utils/lang/make_phone_bigram_lang.sh#L67 in C++ with a set of phrases, and add unk to the loop.

RafNie commented 4 years ago

For me it is enough as it is. At most we could prepare a bash script for updating Gr.fst in an existing model, or update the adaptation.md doc. I can help with this.

But I discovered a problem. When I use the test_microphone.py script, the decoder periodically returns the last correctly recognised sentence when the input is only noise. For example, I said "lighting on" and it was recognised successfully, but after that "lighting on" kept being returned periodically although I said nothing, even though the "partial" results were empty.

{
  "result" : [{
      "conf" : 1.000000,
      "end" : 242.070000,
      "start" : 241.920000,
      "word" : "lighting"
    }, {
      "conf" : 0.725843,
      "end" : 242.130000,
      "start" : 242.070000,
      "word" : "on"
    }],
  "text" : "lighting on"
}
{
  "partial" : ""
}
...
...
{
  "partial" : ""
}

{
  "result" : [{
      "conf" : 1.000000,
      "end" : 247.110000,
      "start" : 246.960000,
      "word" : "lighting"
    }, {
      "conf" : 0.810554,
      "end" : 247.170000,
      "start" : 247.110000,
      "word" : "on"
    }],
  "text" : "lighting on"
}

It is probably an overfitting problem. So maybe <unk> should be stronger, or should also be added as a loop on the other FST states. Any suggestions?

RafNie commented 4 years ago

The solution is to add a [unk] transition from the initial state to the final state instead of adding a [unk] loop on the initial state. I did it in the JSGF file like this:

public <command> = ( <key> <commands> ) | <unk>;
<key> = lighting;
<commands> = ( <simple_commands> | <settable_commands_list>);
<simple_commands> = ( on | off | hotter | colder );
<settable_command> = ( bright | color );
<settable_commands_list> = <settable_command> set <numbers>;
<numbers> = ( zero | one | two | three | four | five | six | seven | eight | nine | ten );
<unk> = ( unk_arc );

then converted it and replaced the unk symbol:

sphinx_jsgf2fsg -jsgf grammar.jsgf -fsm grammar.fsm
sed -i 's/unk_arc/\[unk\]/g' grammar.fsm

and compiled it as before.

jose-alvessh commented 4 years ago

Hi @RafNie,

I was having the same problems as you while using an adapted grammar created following the instructions in adaptation.md.

I have tried to use your methodology; however, when I import the Gr.fst model into the Android app I always get an abort crash from inside Kaldi. To make sure the problem was not with my .jsgf file I used yours, but I still got the same error (FATAL -6 abort inside Kaldi).

Can you tell me where you downloaded the sphinx_jsgf2fsg tool from? I have the latest version from the sphinxbase repository (https://github.com/cmusphinx/sphinxbase). Also, how is the words.txt file created? Neither you nor the adaptation.md file indicates how it should be created and what it should contain, and I'm still in doubt about how to create mine.

It would be amazing if you could help with this problem, since I want to improve the results of the recognizer but, like you, I don't have data to train a full model.

Thanks in advance!

nshmyrev commented 4 years ago

I have tried to use your methodology; however, when I import the Gr.fst model into the Android app I always get an abort crash from inside Kaldi. To make sure the problem was not with my .jsgf file I used yours, but I still got the same error (FATAL -6 abort inside Kaldi).

You'd better test the model on a desktop with Python first to see that everything is fine. There should be error messages if something went wrong, both from Python and in the Android logcat.

Also, how is the words.txt file created? Neither you nor the adaptation.md file indicates how it should be created and what it should contain, and I'm still in doubt about how to create mine.

words.txt is created with the following command from adaptation.md:

   fstsymbols --save_osymbols=words.txt Gr.fst > /dev/null

jose-alvessh commented 4 years ago

Thanks a lot for the quick response. I was making a mistake when creating the words.txt file, and therefore the results were not good at all. Now that I've fixed it I get better results.

I will debug it in Python to see what is happening. Thanks!

rezame commented 4 years ago

Hi, I think it would be better to use grammar decoding: https://kaldi-asr.org/doc/grammar.html, which is used in https://github.com/daanzu/kaldi-active-grammar. With this code you can have different grammars and activate and deactivate them.

Normally, Kaldi decoding graphs are monolithic, require expensive up-front off-line compilation, and are static during decoding. Kaldi's new grammar framework allows multiple independent grammars with nonterminals, to be compiled separately and stitched together dynamically at decode-time, but all the grammars are always active and capable of being recognized.

Is it possible to combine Vosk with it?

nshmyrev commented 4 years ago

@rezame grammar decoding is a somewhat worse technology, since it requires you to hold the full compiled graphs in memory. You can also switch grammars with Vosk as I described above, but there is no need to maintain them on storage. You are welcome to use kaldi-active-grammar if you like it, though.

RafNie commented 4 years ago

Hi @jose-alvessh

Can you tell me where you downloaded the sphinx_jsgf2fsg tool from?

All the necessary tools have a lot of dependencies. I did not want to pollute my system, so I used the official Kaldi Docker image and installed OpenFst and OpenGrm in it using the installation scripts present there. The sphinx_jsgf2fsg tool is available in Debian in the sphinxbase-utils package, so I installed it with apt install sphinxbase-utils in the Docker image.

jose-alvessh commented 4 years ago

Sorry @RafNie, I think I did not express myself correctly. I'm able to create the Gr.fst file from the FSM file; however, when I replace it in the Android demo app I get the following error when building the decoding graph:

V/KaldiDemo: Can't create decoding graph
A/libc: /usr/local/google/buildbot/src/android/ndk-release-r20/external/libcxx/../../external/libcxxabi/src/abort_message.cpp:73: abort_message: assertion "terminating with uncaught exception of type kaldi::KaldiFatalError: kaldi::KaldiFatalError" failed
A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 14521 (AsyncTask #2), pid 13656 (org.kaldi.demo)

I'm trying to build the Vosk API for Python in order to debug it and check whether I can get more detailed logs around this error.

jose-alvessh commented 4 years ago

Hi @RafNie, can you tell me on which platform you used the Gr.fst generated from JSGF? I have built the Gr.fst file using JSGF, and while it works well in Python, it does not work when I replace Gr.fst in the Android demo app. I get this error and the demo app crashes:

2020-05-11 15:18:09.342 29463-30461/org.kaldi.demo A/libc: /usr/local/google/buildbot/src/android/ndk-release-r20/external/libcxx/../../external/libcxxabi/src/abort_message.cpp:73: abort_message: assertion "terminating with uncaught exception of type kaldi::KaldiFatalError: kaldi::KaldiFatalError" failed
2020-05-11 15:18:09.343 29463-30461/org.kaldi.demo A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 30461 (AsyncTask 2), pid 29463 (org.kaldi.demo)
2020-05-11 15:18:09.422 30464-30464/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
2020-05-11 15:18:09.422 30464-30464/? A/DEBUG: Build fingerprint: 'samsung/gtaxlltexx/gtaxllte:8.1.0/M1AJQ/T585XXU4CRK5:user/release-keys'
2020-05-11 15:18:09.422 30464-30464/? A/DEBUG: Revision: '6'
2020-05-11 15:18:09.422 30464-30464/? A/DEBUG: ABI: 'arm'
2020-05-11 15:18:09.422 30464-30464/? A/DEBUG: pid: 29463, tid: 30461, name: AsyncTask 2  >>> org.kaldi.demo <<<
2020-05-11 15:18:09.422 30464-30464/? A/DEBUG: signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
2020-05-11 15:18:09.427 30464-30464/? A/DEBUG: Abort message: '/usr/local/google/buildbot/src/android/ndk-release-r20/external/libcxx/../../external/libcxxabi/src/abort_message.cpp:73: abort_message: assertion "terminating with uncaught exception of type kaldi::KaldiFatalError: kaldi::KaldiFatalError" failed'
2020-05-11 15:18:09.427 30464-30464/? A/DEBUG:     r0 00000000  r1 000076fd  r2 00000006  r3 00000008
2020-05-11 15:18:09.427 30464-30464/? A/DEBUG:     r4 00007317  r5 000076fd  r6 bd100cc4  r7 0000010c
2020-05-11 15:18:09.427 30464-30464/? A/DEBUG:     r8 00000000  r9 db7f5000  sl 00000000  fp bd101324
2020-05-11 15:18:09.427 30464-30464/? A/DEBUG:     ip bd100d0c  sp bd100cb0  lr e6ac0e6f  pc e6aba528  cpsr 200d0030
2020-05-11 15:18:09.433 30464-30464/? A/DEBUG: backtrace:
2020-05-11 15:18:09.433 30464-30464/? A/DEBUG:     #00 pc 0001a528  /system/lib/libc.so (abort+63)
2020-05-11 15:18:09.433 30464-30464/? A/DEBUG:     #01 pc 0001a9f9  /system/lib/libc.so (__assert2+20)
2020-05-11 15:18:09.433 30464-30464/? A/DEBUG:     #02 pc 00639331  /data/app/org.kaldi.demo-r8HG7GACzIY6RADYmVBtmw==/lib/arm/libkaldi_jni.so
2020-05-11 15:18:09.433 30464-30464/? A/DEBUG:     #03 pc 00639431  /data/app/org.kaldi.demo-r8HG7GACzIY6RADYmVBtmw==/lib/arm/libkaldi_jni.so
2020-05-11 15:18:09.433 30464-30464/? A/DEBUG:     #04 pc 006379d1  /data/app/org.kaldi.demo-r8HG7GACzIY6RADYmVBtmw==/lib/arm/libkaldi_jni.so
2020-05-11 15:18:09.433 30464-30464/? A/DEBUG:     #05 pc 0063737f  /data/app/org.kaldi.demo-r8HG7GACzIY6RADYmVBtmw==/lib/arm/libkaldi_jni.so
2020-05-11 15:18:09.433 30464-30464/? A/DEBUG:     #06 pc 00637347  /data/app/org.kaldi.demo-r8HG7GACzIY6RADYmVBtmw==/lib/arm/libkaldi_jni.so (__cxa_throw+74)
2020-05-11 15:18:09.433 30464-30464/? A/DEBUG:     #07 pc 001fcfcd  /data/app/org.kaldi.demo-r8HG7GACzIY6RADYmVBtmw==/lib/arm/libkaldi_jni.so (kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+72)
2020-05-11 15:18:09.433 30464-30464/? A/DEBUG:     #08 pc 001fcd19  /data/app/org.kaldi.demo-r8HG7GACzIY6RADYmVBtmw==/lib/arm/libkaldi_jni.so (KaldiRecognizer::KaldiRecognizer(Model&, float)+148)
2020-05-11 15:18:09.433 30464-30464/? A/DEBUG:     #09 pc 001fc997  /data/app/org.kaldi.demo-r8HG7GACzIY6RADYmVBtmw==/lib/arm/libkaldi_jni.so (Java_org_kaldi_voskJNI_new_1KaldiRecognizer+30)
2020-05-11 15:18:09.433 30464-30464/? A/DEBUG:     #10 pc 00009567  /data/app/org.kaldi.demo-r8HG7GACzIY6RADYmVBtmw==/oat/arm/base.odex (offset 0x9000)

I am using an ARMv8 system on Android 8.1.0. Can you help me with this? @nshmyrev @RafNie

nshmyrev commented 4 years ago

@jose-alvessh first of all, check a few lines above in logcat; it should contain the real error.

Second, you probably need to add words.txt to the Android model if you compile from JSGF.

RafNie commented 4 years ago

Hi @RafNie, can you tell me on which platform you used the Gr.fst generated from JSGF? I have built the Gr.fst file using JSGF, and while it works well in Python, it does not work when I replace Gr.fst in the Android demo app.

I tested it on ARMv6 (Raspberry Pi Zero) and on ARMv7 (NanoPi), both times with the Python API.

jose-alvessh commented 4 years ago

Hmm, thanks @RafNie. @nshmyrev even adding words.txt to the assets folder does not work. Above the error log that I sent you, the only line I get from the Kaldi library is: 2020-05-11 16:04:19.018 761-1749/org.kaldi.demo V/KaldiDemo: Can't create decoding graph

gormonn commented 4 years ago

Hi! I have a strange problem while converting JSGF to FSM. At first I thought I had described the rules incorrectly in JSGF, so I took the code from the post above:

#JSGF V1.0;
/**
 * JSGF Grammar for Hello World example
 */
grammar lamp_commands;

public <command> = ( <key> <commands> ) | <unk>;
<key> = lighting;
<commands> = ( <simple_commands> | <settable_commands_list>);
<simple_commands> = ( on | off | hotter | colder );
<settable_command> = ( bright | color );
<settable_commands_list> = <set_command> set <numbers>;
<numbers> = ( zero | one | two | three | four | five | six | seven | eight | nine | ten );
<unk> = ( unk_arc );

My os:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04 LTS
Release:        20.04
Codename:       focal

Installing sphinx_jsgf2fsg on my Ubuntu: sudo apt install sphinxbase-utils

Then trying to run:

$ sphinx_jsgf2fsg -jsgf grammar.jsgf -fsm grammar.fsm
Current configuration:
[NAME]          [DEFLT] [VALUE]
-compile        no      no
-fsg
-fsm                    grammar.fsm
-help           no      no
-jsgf                   grammar.jsgf
-symtab
-toprule

INFO: jsgf.c(705): Defined rule: <lamp_commands.g00000>
INFO: jsgf.c(705): Defined rule: PUBLIC <lamp_commands.command>
INFO: jsgf.c(705): Defined rule: <lamp_commands.key>
INFO: jsgf.c(705): Defined rule: <lamp_commands.g00003>
INFO: jsgf.c(705): Defined rule: <lamp_commands.commands>
INFO: jsgf.c(705): Defined rule: <lamp_commands.g00005>
INFO: jsgf.c(705): Defined rule: <lamp_commands.simple_commands>
INFO: jsgf.c(705): Defined rule: <lamp_commands.g00007>
INFO: jsgf.c(705): Defined rule: <lamp_commands.settable_command>
INFO: jsgf.c(705): Defined rule: <lamp_commands.settable_commands_list>
INFO: jsgf.c(705): Defined rule: <lamp_commands.g00010>
INFO: jsgf.c(705): Defined rule: <lamp_commands.numbers>
INFO: jsgf.c(705): Defined rule: <lamp_commands.g00012>
INFO: jsgf.c(705): Defined rule: <lamp_commands.unk>
INFO: main.c(116): No -toprule was given; grabbing the first public rule: '<lamp_commands.command>' of the grammar 'lamp_commands'.
ERROR: "jsgf.c", line 340: Undefined rule in RHS: <lamp_commands.set_command>
INFO: fsg_model.c(898): Writing FSM file 'grammar.fsm'

So, as a result, I get a broken file:

0 1 unk_arc -0.000000
0 2 lighting -0.000000
0 3 <eps> -0.000000
3 0

gormonn commented 4 years ago

Also, how is the words.txt file created? Neither you nor the adaptation.md file indicates how it should be created and what it should contain, and I'm still in doubt about how to create mine.

words.txt is created with the following command from adaptation.md:

   fstsymbols --save_osymbols=words.txt Gr.fst > /dev/null

How? We don't have Gr.fst before converting it from JSGF. Maybe use the -symtab words.txt parameter? sphinx_jsgf2fsg -jsgf grammar.jsgf -fsm grammar.fsm -toprule [your_grammar_name].command -symtab words.txt

nshmyrev commented 4 years ago

How? We don't have Gr.fst before converting it from JSGF.

Gr.fst comes with the model package.

gormonn commented 4 years ago

How? We don't have Gr.fst before converting it from JSGF.

Gr.fst comes with the model package.

Hmm, really... It seems that I just cannot find this file in the specific model I'm using: vosk-model-ru-0.10

nshmyrev commented 4 years ago

It seems that I just cannot find this file in the specific model I'm using:

Updating big models is described here: https://chrisearch.wordpress.com/2017/03/11/speech-recognition-using-kaldi-extending-and-using-the-aspire-model/

RafNie commented 3 years ago

Hi! I have a strange problem while converting JSGF to FSM. [...]

ERROR: "jsgf.c", line 340: Undefined rule in RHS: <lamp_commands.set_command>

So, as a result, I get a broken file.

Hi @gormonn, it's probably too late to be of help, but the reason was a typo on my part. It should be <settable_commands_list> = <settable_command> set <numbers>; instead of <settable_commands_list> = <set_command> set <numbers>;

Lightning101 commented 2 years ago

Hi, sorry, I'm probably a bit late to this issue, but I ran into similar problems when following the method mentioned above, which ended with the Gr.fst being produced like so: [image of the resulting FST omitted]

However, I find that false positives creep in because there is no way out once the initial keyword, in this case "command", is said: no matter what is said next, it is detected as going down the positive path.

Looking further into the issue, I found this reference (Issue ref) which suggests introducing a set of arbitrary paths alongside the good ones so that bad paths can be detected.

This prompted me to write the following code:

#include <iostream>
#include <string>
#include <vector>

#include <fst/fstlib.h>

using namespace fst;

int main(int argc, char *argv[])
{
  std::vector<std::string> args(argv + 1, argv + argc);

  if (args.size() < 2)
  {
    std::cout << "Please input 2 arguments" << std::endl;
    std::cout << "exe.o *.fst <arc_path>" << std::endl;
    return 1;
  }

  // A vector FST is a general mutable FST
  StdVectorFst *graph = StdVectorFst::Read(args.at(0));
  if (graph == nullptr)
  {
    std::cout << "Could not read " << args.at(0) << std::endl;
    return 1;
  }

  const SymbolTable *isyms = graph->InputSymbols();
  const SymbolTable *osyms = graph->OutputSymbols();
  std::string arc_symbol = args.at(1);

  if (isyms->Find(arc_symbol) == kNoSymbol || osyms->Find(arc_symbol) == kNoSymbol)
  {
    std::cout << "Arc path must be included in symbols.txt and words.txt" << std::endl;
    return -1;
  }

  std::cout << "start state=" << graph->Start() << "\n";

  // States with no outgoing arcs are treated as final states; everything else
  // is a candidate source for the escape arc.
  std::vector<StdArc::StateId> finalStates;
  std::vector<StdArc::StateId> nonFinalStates;

  for (StateIterator<StdVectorFst> siter(*graph); !siter.Done(); siter.Next())
  {
    StdArc::StateId s = siter.Value();
    std::cout << "state=" << s << ":" << std::endl;

    int arcCount = 0;
    for (ArcIterator<StdVectorFst> aiter(*graph, s); !aiter.Done(); aiter.Next())
    {
      arcCount++;
      const StdArc &arc = aiter.Value();
      std::cout << arc.ilabel << "/" << arc.olabel << "/" << arc.weight << "->" << arc.nextstate << "," << std::endl;
    }
    if (arcCount == 0)
      finalStates.push_back(s);
    else
      nonFinalStates.push_back(s);
    std::cout << std::endl;
  }

  // Add an escape arc labelled with <arc_path> (e.g. [unk]) from every
  // non-final state to every final state, weight 0 (= TropicalWeight::One()).
  for (auto i = finalStates.cbegin(); i != finalStates.cend(); ++i)
  {
    std::cout << *i << " ";
    for (auto j = nonFinalStates.cbegin(); j != nonFinalStates.cend(); ++j)
      graph->AddArc(*j, StdArc(isyms->Find(arc_symbol), osyms->Find(arc_symbol), 0, *i));
  }

  graph->Write("binary.fst");
  return 0;
}

Please excuse the coding; it's still a prototype.

This allowed me to produce the following: [image of the modified FST omitted]

This gave the desired result. So I was wondering whether my approach is correct, and whether there is a better way to do this?

RafNie commented 2 years ago

I stopped experimenting with JSGF quite a long time ago. But your approach looks good to me, so if it works better then use it.

You can add unk arcs at the level of defining the JSGF grammar.

Lightning101 commented 2 years ago

Hi @RafNie, thanks for the advice; I'll work it out like this for now.

Side note: when adding UNK within the JSGF, there are places where it is a bit tricky to add, such as the <settable_commands_list> = <settable_command> set <numbers>; rule. I think I might have to define another rule just to hold set, something like ( set | unk ), to produce something similar to the above graph. It becomes a painstaking operation as the JSGF gets larger, unless it was initially intended to be used as a Gr.fst.

RafNie commented 2 years ago

Right, your approach to adding UNK arcs is more convenient. You can focus purely on designing the grammar when creating the JSGF file.

I think a JSGF grammar is good only for a small set of commands. For larger systems, a classical N-gram grammar with a wider vocabulary plus an additional NLP module would be more convenient for the designer.

ACmaster7 commented 3 weeks ago

Hi @RafNie, I hope you're still around. I could use your help. Before finding your approach I was using the compile-graph.sh file to recompile the English model, limiting it to only the specific words and phrases I need for my commands (around 80 phrases). There’s no merging of vocabularies or grammars with the main model here; I completely replace them with my own limited set of words and phrases.

#!/bin/bash

set -x

. path.sh

rm -rf data
rm -rf exp/tdnn/lgraph
rm -rf exp/tdnn/lgraph_orig

mkdir -p data/dict
cp db/phone/* data/dict
cp new/lexicon.txt data/dict
cp new/corpus.txt db/

python3 ./get_vocab.py > data/words.vocab
ngramsymbols data/words.vocab data/words.syms
farcompilestrings --fst_type=compact --symbols=data/words.syms --keep_symbols --unknown_symbol="[unk]" db/corpus.txt data/corpus.far
ngramcount --order=3 data/corpus.far - |
    ngramprint --integers | grep -v "<unk>" | ngramread |
    ngrammake --method=witten_bell - data/corpus.mod
ngramprint --ARPA data/corpus.mod | gzip -c > data/en-us.lm.gz

utils/prepare_lang.sh data/dict "[unk]" data/lang_local data/lang
utils/format_lm.sh data/lang data/en-us.lm.gz data/dict/lexicon.txt data/lang_test_adapt

utils/mkgraph_lookahead.sh \
         --self-loop-scale 1.0 data/lang \
         exp/tdnn data/en-us.lm.gz exp/tdnn/lgraph

Following your approach, I made the FSM file out of the JSGF grammar, but I don't know how to incorporate this FSM file into the commands above to recompile the model. I know I should get rid of the ngram commands, but I don't know how to do the rest.

nshmyrev commented 3 weeks ago

Following your approach, I made the FSM file out of the JSGF grammar, but I don't know how to incorporate this FSM file into the commands above to recompile the model. I know I should get rid of the ngram commands, but I don't know how to do the rest.

mkgraph_lookahead can also take a G.fst instead of an ARPA LM; see the script arguments.

Overall, we recommend you just use a list of sample phrases and apply the grammar only for result parsing, not for the recognizer.
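
To illustrate that split with the Python bindings, here is a sketch where the recognizer is constrained only by a flat list of sample phrases and the command structure is recovered afterwards by parsing the result text. The model path, phrase list and regular expression are placeholders:

import json
import re
from vosk import Model, KaldiRecognizer

model = Model("model")                        # placeholder model path
phrases = ["lighting on", "lighting off",
           "lighting bright set one", "lighting bright set two",
           "[unk]"]
rec = KaldiRecognizer(model, 16000.0, json.dumps(phrases))

# The "grammar" lives here, applied to the recognized text, not to the decoder.
COMMAND = re.compile(r"^lighting (?:(on|off)|bright set (\w+))$")

def parse(text):
    m = COMMAND.match(text)
    if not m:
        return None                           # out-of-grammar / [unk] result
    if m.group(1):
        return {"action": m.group(1)}
    return {"action": "set_brightness", "value": m.group(2)}

# After feeding audio with rec.AcceptWaveform(...):
#   text = json.loads(rec.Result())["text"]
#   command = parse(text)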

ACmaster7 commented 3 weeks ago

mkgraph_lookahead can also take a G.fst instead of an ARPA LM; see the script arguments.

So, you're suggesting I can convert my FSM language model file into Gr.fst and then provide Gr.fst as an argument to mkgraph_lookahead to recompile the model? I'll give that a try—thanks for the tip!

Overall, we recommend you just use a list of sample phrases and apply the grammar only for result parsing, not for the recognizer.

I’m really unclear about this part. Could you explain how to use grammar for result parsing? Also, when you mention using a list of sample phrases, do you mean providing these phrases directly to the Kaldi recognizer, or are you suggesting using SetGrammar? Wouldn’t SetGrammar still limit the recognizer in the same way?