Thank you for the detailed notes. I am working on reproducing this issue right now.
@aaronchantrill I forgot to mention that dict-xsampa.fst was taken from this repo. (I didn't find an fst model in the zamia-speech project.)
@gospodima Thanks, I was just working on building the .fst file, but I'll use that one. Since you say that it runs fine from master, I assume you are using the older versions of pocketsphinx and phonetisaurus from the jasper documentation (which should still work fine with naomi-dev). I'm building that environment now so I can test master.
I had a few interruptions today. I got my environment set up finally (had to use a new trick to get openfst installed this time as documented here https://github.com/kaldi-asr/kaldi/issues/1030) but now I have to go to sleep. I'll give it a test with your profile in the morning. It might be helpful to me if you would be willing to send me a recording of yourself or someone asking Naomi for the time or something in German (since I don't speak German, my accent would probably confuse it). I'd also be willing to write a test script that feeds in a few audio clips and verifies the results, which I think could be a big help for future testing/maintenance.
No problem. I can download audio from google translate for my current testing.
@gospodima
I'm getting a segmentation fault with the suggested model and dictionary files. Is that the original error you got before building the files manually? I'm looking into where and why exactly that is occurring now. I have current copies of both the master and dev branches on my local machine, and have verified that master does not have the issue, so it must be something in the plugin code. I should have an update tomorrow.
I'm getting sidetracked because in the old Naomi the output plays at the wrong speed, so it sounds like chipmunks have invaded my computer with both pico and flite.
@aaronchantrill
Is that the original error you got before building the files manually?
What do you mean by 'building the files manually'? I run naomi, the program stops while building the vocabulary (I am not sure that this is a segfault because I don't see any error message), and then I run naomi again and get the above error. I suppose it is caused by incorrect vocabulary creation. So I didn't create the files manually.
the output is playing at the wrong speed so it sounds like chipmunks have invaded my computer when using both pico and flite.
I think it could be caused by selecting the wrong output device.
Okay, from the description above I thought you had created the "default" and "keyword" vocabularies manually and put them into the "vocabularies" folder in ~/.naomi. That's something that I've done before for different reasons, if a vocabulary file doesn't generate correctly but does create a checksum. Now I understand that you ran the program once, which appears to have generated the vocabulary files but then exited for some unknown reason without generating any sort of error message, then when you ran it a second time it generated an error message about pocketsphinx exiting with a non-zero exit code during initialization.
I appear to have pocketsphinx version 4 installed, and it looks like the line you are erroring on has to do with initializing pocketsphinx version 5. Can you tell me exactly which version of pocketsphinx you have installed, maybe with a link to the download? Unfortunately, I have never found a way to query pocketsphinx directly about the currently installed version.
Running naomi should generate a log file in /tmp named something like psdecoder_GoBLdyGooP.log. Could you send a copy of that file generated by the dev version? That may shed some more light on why the library is returning a non-zero code. You may also want to reset your vocabulary files by deleting the ~/.naomi/vocabularies folder and rerunning naomi with "stdbuf -o0 ./Naomi.py --debug |& tee naomi.log" to generate a log file with debugging information to see if we can figure out what the initial problem was.
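If it helps to locate that file, here is a minimal Python 3 sketch that picks the most recently modified log matching psdecoder_*.log. The pattern is only inferred from the example file name above, and the helper name is hypothetical, not part of Naomi.

```python
import glob
import os


def latest_psdecoder_log(directory="/tmp", pattern="psdecoder_*.log"):
    """Return the path of the newest matching log in `directory`, or None."""
    # The random suffix (e.g. GoBLdyGooP) changes on every run, so glob for it.
    candidates = glob.glob(os.path.join(directory, pattern))
    if not candidates:
        return None
    # Pick the most recently modified file.
    return max(candidates, key=os.path.getmtime)
```

Calling latest_psdecoder_log() right after a failed run should return the log from that run, or None if pocketsphinx did not write one.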
I think my chipmunk problems are related to the fact that someone decided to change the locations of the audio settings at some point, so I'm just running with defaults and none of my changes are taking effect. I just need to figure out where it is looking for them in profile and I should be able to get it sorted.
Thank you for your patience. I really appreciate your help with this.
@aaronchantrill Unfortunately, I couldn't find a way to check the pocketsphinx version, but I am sure that I installed it according to the Jasper documentation.
Running naomi should generate a log file in /tmp named something like psdecoder_GoBLdyGooP.log. Could you send a copy of that file generated by the dev version?
I deleted all the files in /tmp created by naomi and then started the dev version. Now there is only one file there:
cat tmpDGo9hI.vocab
## Vocab generated by v2 of the CMU-Cambridge Statistcal
## Language Modeling toolkit.
##
## Includes 30 words ##
</s>
<s>
BEENDE
BENACHRICHTIGUNG
DES
DICH
EMAIL
FACEBOOK
GEBURTSTAG
HACKER
HEUTE
JA
LEBENS
MORGEN
MUSIK
NACHRICHTEN
NEIN
NEWS
NO
POSTEINGANG
SCHLAGZEILEN
SCHLIESSEN
SINN
SPOTIFY
STOPP
TEMPERATUR
UHR
VORHERSAGE
WETTER
YES
Starting naomi again didn't change anything (no other log files were created).
I think I have the issue figured out. It has to do with the standard English dictionary being in upper case in the older dictionary and in lower case in the newer dictionary. I tried to fix this by changing the case when Naomi generates the custom dictionaries, but the real fix would be to make sure that the dictionary case matches the phrase case. This is more important in German than in English. If I take the dictionaries generated by naomi-master and copy them into naomi-dev (leaving the revision files untouched), then everything works correctly. The fix is pretty simple, but it means updating my instructions for installing and building the English dictionary.
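As a rough illustration of making the dictionary case match the phrase case, the Python 3 sketch below classifies the headwords of a pocketsphinx dictionary and converts phrases to match. Both function names are hypothetical and this is not Naomi's code; it assumes one "WORD PHONES" entry per line.

```python
def headword_case(dictionary_lines):
    """Classify dictionary headwords as 'upper', 'lower' or 'mixed'.

    Only letters are inspected, so alternates like WORD(2) are handled.
    Defaults to 'upper' when no headwords contain letters.
    """
    seen_upper = seen_lower = False
    for line in dictionary_lines:
        parts = line.split()
        if not parts:
            continue
        letters = [c for c in parts[0] if c.isalpha()]
        if not letters:
            continue
        if all(c.isupper() for c in letters):
            seen_upper = True
        elif all(c.islower() for c in letters):
            seen_lower = True
        else:
            return "mixed"  # a single headword mixes cases
    if seen_upper and seen_lower:
        return "mixed"
    return "lower" if seen_lower else "upper"


def match_dictionary_case(phrases, case):
    """Convert vocabulary phrases to the case used by the dictionary."""
    if case == "upper":
        return [p.upper() for p in phrases]
    if case == "lower":
        return [p.lower() for p in phrases]
    raise ValueError("dictionary mixes upper and lower case headwords")
```

One German-specific detail: str.upper() turns "ß" into "SS", so upper-casing a word and then lower-casing it does not always give back the original spelling.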
Try this:
cp -v ~/.jasper/vocabularies/de-DE/sphinx/default/dictionary ~/.naomi/vocabularies/de-DE/sphinx/default
cp -v ~/.jasper/vocabularies/de-DE/sphinx/default/languagemodel ~/.naomi/vocabularies/de-DE/sphinx/default
cp -v ~/.jasper/vocabularies/de-DE/sphinx/keyword/dictionary ~/.naomi/vocabularies/de-DE/sphinx/keyword
cp -v ~/.jasper/vocabularies/de-DE/sphinx/keyword/languagemodel ~/.naomi/vocabularies/de-DE/sphinx/keyword
@aaronchantrill
cp -v ~/.jasper/vocabularies/de-DE/sphinx/default/dictionary ~/.naomi/vocabularies/de-DE/sphinx/default; cp -v ~/.jasper/vocabularies/de-DE/sphinx/default/languagemodel ~/.naomi/vocabularies/de-DE/sphinx/default; cp -v ~/.jasper/vocabularies/de-DE/sphinx/keyword/dictionary ~/.naomi/vocabularies/de-DE/sphinx/keyword; cp -v ~/.jasper/vocabularies/de-DE/sphinx/keyword/languagemodel ~/.naomi/vocabularies/de-DE/sphinx/keyword
Your instructions are slightly incorrect because they replace the keyword folder with the languagemodel file. But in general, copying the dictionaries works. Now the dev version is running as it should. Thanks for your help!
However, I didn't fully understand where the problem was.
It has to do with the standard english dictionary being in upper case in the older dictionary and lower case in the newer dictionary.
Which dictionaries do you mean? And in which files do we now have lower case instead of upper case?
That shouldn't replace the keyword folder, just copy the file into it.
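For what it's worth, cp only copies a file into a directory when that directory already exists; if ~/.naomi/vocabularies/de-DE/sphinx/keyword is missing (but its parent exists), the last command creates a regular file named keyword instead. A minimal Python 3 sketch that creates the destination folders first; the function name and defaults are assumptions based on the cp commands in this thread, not part of Naomi:

```python
import os
import shutil

# Default locations taken from the cp commands in this thread.
JASPER_VOCAB = os.path.expanduser("~/.jasper/vocabularies/de-DE/sphinx")
NAOMI_VOCAB = os.path.expanduser("~/.naomi/vocabularies/de-DE/sphinx")


def copy_vocabularies(src_root=JASPER_VOCAB, dst_root=NAOMI_VOCAB,
                      vocabularies=("default", "keyword"),
                      files=("dictionary", "languagemodel")):
    """Copy each vocabulary file into the matching Naomi folder.

    The destination folder is created first, so the file always lands
    inside it instead of being written as a file named after the folder.
    """
    copied = []
    for vocab in vocabularies:
        dst_dir = os.path.join(dst_root, vocab)
        if os.path.isfile(dst_dir):
            raise RuntimeError("%s is a file, expected a folder" % dst_dir)
        os.makedirs(dst_dir, exist_ok=True)
        for name in files:
            dst = os.path.join(dst_dir, name)
            shutil.copy2(os.path.join(src_root, vocab, name), dst)
            copied.append(dst)
    return copied
```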
This doesn't really fix the issue, just verifies that the issue is in generating the language model and dictionary files. As soon as anything changed in the list of words being generated (adding a plugin, changing a word) you would be right back to this problem.
I'm still looking into what the difference is exactly. It turns out that the original version does apply an upper() to the dictionary words, and also has some code to automatically convert everything to lower() if nothing matches, so I'm still a little mystified. I'll provide a full explanation when I get there.
I have uploaded pull request #123, which fixes the issue of the pocketsphinx dictionary case (upper or lower) depending on the version of phonetisaurus. It returns to the approach where we first feed the word list to phonetisaurus in all upper case, and if there is a "symbol not found" error, automatically re-feed it in all lower case. That way plugin authors don't have to worry about case sensitivity when writing their plugins.
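The upper-case-first, lower-case-fallback behaviour described for pull request #123 can be sketched like this in Python 3. It is an illustration under assumptions, not the code from the pull request: run_g2p and SymbolNotFoundError are hypothetical stand-ins for however the phonetisaurus call and its "symbol not found" failure are actually detected.

```python
class SymbolNotFoundError(Exception):
    """Hypothetical: raised when the G2P model rejects a symbol in a word."""


def translate_with_case_fallback(words, run_g2p):
    """Try upper case first, then lower case.

    run_g2p is a hypothetical callable taking a list of words and returning
    {word: [pronunciations]}, raising SymbolNotFoundError on unknown symbols.
    Results are keyed by the caller's original spelling.
    """
    last_error = None
    for convert in (str.upper, str.lower):
        converted = [convert(w) for w in words]
        try:
            result = run_g2p(converted)
        except SymbolNotFoundError as err:
            last_error = err  # remember the failure and try the next case
            continue
        return {orig: result[conv] for orig, conv in zip(words, converted)}
    raise last_error
```

Keying the result by the original words means plugin phrases can be written in any case, which matches the goal stated above.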
It is necessary for the dictionary to be either all upper case or all lower case, otherwise phonetisaurus might get confused, and forcing vocabulary words to upper or lower case is how the original author decided to deal with it.
For future reference, you only need one or the other of the German models listed at the beginning of this thread. For my final testing, I only used the second one. Here is my test procedure, using an audio clip downloaded from Google Translate of the English "Hello, can you hear me?" translated to German as "hallo, kannst du mich hören?" (unfortunately, I don't seem to be able to attach it here, but you can download it from https://github.com/aaronchantrill/Naomi/raw/master/tests/test_german.wav)
[~/]$ git clone https://github.com/G10DRAS/German-G2P-and-Acoustic-Model
[~/]$ cd German-G2P-and-Acoustic-Model
[~/German-G2P-and-Acoustic-Model]$ tar -zxvf cmusphinx-de-ptm-voxforge-5.2.tar.gz
[~/German-G2P-and-Acoustic-Model]$ cd
[~/]$ mkdir test
[~/]$ cd test
[~/test]$ echo "<s> HALLO KANNST DU MICH HÖREN </s>" > test_reference.txt
[~/test]$ text2wfreq < test_reference.txt > test.wfreq
[~/test]$ cat test.wfreq | wfreq2vocab > test.vocab
[~/test]$ text2idngram -vocab test.vocab -idngram test.idngram < test_reference.txt
[~/test]$ idngram2lm -vocab_type 0 -idngram test.idngram -vocab test.vocab -arpa test.lm
[~/test]$ phonetisaurus-g2pfst --model=$HOME/German-G2P-and-Acoustic-Model/voxforge-de.fst --nbest=1 --beam=1000 --thresh=99.0 --accumulate=true --pmass=0.85 --nlog_probs=false --wordlist=./test.vocab > test.dict
[~/test]$ cat test.dict | sed -rne '/^([[:upper:]])+\s/p' | perl -pe 's/([0-9.])+//g;s/\s+/ /g;@_=split(/\s+/);$w=shift(@_);$_=$w."\t".join(" ",@_)."\n";' > test.formatted.dict
[~/test]$ pocketsphinx_continuous -hmm ~/German-G2P-and-Acoustic-Model/cmusphinx-de-ptm-voxforge-5.2 -lw 10 -feat 1s_c_d_dd -beam 1e-80 -wbeam 1e-40 -lm ./test.lm -dict ./test.formatted.dict -wip 0.2 -agc none -varnorm no -cmn current -samprate 16000 -infile test_german.wav 2>/dev/null
This last line returned "HALLO KANNST DU MICH HÖREN". If it doesn't return this, remove the 2>/dev/null at the end to see the error output and find out what the problem was.
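The sed/perl pipeline above strips the scores from the phonetisaurus output and joins the phonemes with spaces. A rough Python 3 equivalent is below; it assumes each output line holds the word, a numeric score and the phonemes, separated by whitespace. Unlike the one-liner, it also renames repeated words as WORD(2), WORD(3), the CMU dictionary convention for alternate pronunciations, which matters when --nbest is greater than 1. The function name is hypothetical.

```python
import re

# A score token such as 12.3456 (the perl one-liner strips digits and dots).
SCORE = re.compile(r"^-?[0-9.]+$")


def format_g2p_output(lines):
    """Turn 'WORD SCORE PHONES...' lines into pocketsphinx dictionary lines."""
    counts = {}
    output = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank or malformed lines
        word = tokens[0]
        # Like the one-liner, a phoneme made only of digits/dots is dropped.
        phones = [t for t in tokens[1:] if not SCORE.match(t)]
        if not phones:
            continue
        counts[word] = counts.get(word, 0) + 1
        n = counts[word]
        headword = word if n == 1 else "%s(%d)" % (word, n)
        output.append("%s\t%s" % (headword, " ".join(phones)))
    return output
```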
Then I used the following in my profile.yml:
active_stt:
  engine: sphinx
pocketsphinx:
  fst_model: /home/pi/German-G2P-and-Acoustic-Model/voxforge-de.fst
  hmm_dir: /home/pi/German-G2P-and-Acoustic-Model/cmusphinx-de-ptm-voxforge-5.2
  phonetisaurus_executable: phonetisaurus-g2pfst
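Before starting Naomi, it can save a round trip to confirm that the paths and executable in the pocketsphinx section actually exist. A minimal Python 3 sketch, assuming the profile has already been loaded into a dict (for example with PyYAML's yaml.safe_load); the function is hypothetical, and the phonetisaurus-g2p default mirrors the command Naomi runs later in this thread when the key is absent.

```python
import os
import shutil


def check_pocketsphinx_settings(profile):
    """Return a list of problems found in the pocketsphinx section."""
    problems = []
    ps = profile.get("pocketsphinx") or {}
    fst = ps.get("fst_model")
    if not fst or not os.path.isfile(os.path.expanduser(fst)):
        problems.append("fst_model not found: %r" % fst)
    hmm = ps.get("hmm_dir")
    if not hmm or not os.path.isdir(os.path.expanduser(hmm)):
        problems.append("hmm_dir not found: %r" % hmm)
    # Assumed default when the key is missing (see the DEBUG line later on).
    exe = ps.get("phonetisaurus_executable", "phonetisaurus-g2p")
    if shutil.which(exe) is None:
        problems.append("%s is not on PATH" % exe)
    return problems
```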
@aaronchantrill Thanks for your help! Unfortunately, I can only test it next week. I'll report back here after that.
You should be able to delete or rename the "vocabularies" folder under "~/.naomi" (~/.naomi/vocabularies), update the Naomi source, and just have Naomi rebuild them. I'll try to get the patch merged by this weekend. If you are ready to test and pull request #123 hasn't been merged yet, please let me know. You should also be able to git clone directly from the branch I used to make the pull request:
git clone -b phonetisaurus_fixes https://github.com/aaronchantrill/Naomi.git Naomi_fix
That would allow you to test the changes even before the patch is merged, but just temporarily since that branch won't receive future updates and I will most likely delete it once the patch has been merged. Thanks for the bug report and working with me. This will certainly improve the experience of using pocketsphinx with Naomi.
Hi @aaronchantrill
I tried to test your fix branch today. When I set phonetisaurus_executable: phonetisaurus-g2pfst in the profile, an error occurs:
DEBUG:sphinx_1_0_0.g2p:cmd: ['phonetisaurus-g2pfst', '--model=/home/pi/dev/German-G2P-and-Acoustic-Model/voxforge-de.fst', '--beam=1000', '--thresh=99.0', '--accumulate=true', '--pmass=0.85', '--nlog_probs=false', '--wordlist=/tmp/tmpe18SoU.g2p', '--nbest=3']
ERROR:sphinx_1_0_0.g2p:Error occured while executing command 'phonetisaurus-g2pfst --model=/home/pi/dev/German-G2P-and-Acoustic-Model/voxforge-de.fst --beam=1000 --thresh=99.0 --accumulate=true --pmass=0.85 --nlog_probs=false --wordlist=/tmp/tmpe18SoU.g2p --nbest=3'
Traceback (most recent call last):
  File "/home/pi/dev/Naomi-fix/plugins/stt/pocketsphinx-stt/g2p.py", line 64, in execute
    stderr=subprocess.PIPE
  File "/usr/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1024, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
ERROR:naomi.vocabcompiler:Fatal compilation error occured
Traceback (most recent call last):
  File "/home/pi/dev/Naomi-fix/naomi/vocabcompiler.py", line 141, in compile
    compilation_func(config, self.path, phrases)
  File "/home/pi/dev/Naomi-fix/plugins/stt/pocketsphinx-stt/sphinxvocab.py", line 87, in compile_vocabulary
    compile_dictionary(g2pconverter, vocabulary, dictionary_path)
  File "/home/pi/dev/Naomi-fix/plugins/stt/pocketsphinx-stt/sphinxvocab.py", line 149, in compile_dictionary
    phonemes = g2pconverter.translate(words)
  File "/home/pi/dev/Naomi-fix/plugins/stt/pocketsphinx-stt/g2p.py", line 197, in translate
    output = self._translate_words(words)
  File "/home/pi/dev/Naomi-fix/plugins/stt/pocketsphinx-stt/g2p.py", line 183, in _translate_words
    nbest=self.nbest
  File "/home/pi/dev/Naomi-fix/plugins/stt/pocketsphinx-stt/g2p.py", line 64, in execute
    stderr=subprocess.PIPE
  File "/usr/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1024, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Traceback (most recent call last):
  File "./dev/Naomi-fix/Naomi.py", line 5, in <module>
    naomi.main()
  File "/home/pi/dev/Naomi-fix/naomi/__main__.py", line 62, in main
    repopulate=p_args.repopulate
  File "/home/pi/dev/Naomi-fix/naomi/application.py", line 244, in __init__
    self.config)
  File "/home/pi/dev/Naomi-fix/plugins/stt/pocketsphinx-stt/sphinxplugin.py", line 44, in __init__
    sphinxvocab.compile_vocabulary)
  File "/home/pi/dev/Naomi-fix/naomi/plugin.py", line 82, in compile_vocabulary
    self.profile, compilation_func, self._vocabulary_phrases)
  File "/home/pi/dev/Naomi-fix/naomi/vocabcompiler.py", line 157, in compile
    raise e
OSError: [Errno 2] No such file or directory
It happens with both models listed here.
I don't know which file it can't find, but /home/pi/dev/German-G2P-and-Acoustic-Model/voxforge-de.fst is the correct path on my system.
Without this line, starting the project with the zamia-speech model fails as described in the issue; with the second model it just hangs on the command:
DEBUG:sphinx_1_0_0.g2p:cmd: ['phonetisaurus-g2p', '--model=/home/pi/dev/German-G2P-and-Acoustic-Model/voxforge-de.fst', '--input=/tmp/tmpkN2ueB.g2p', '--words', '--isfile', '--nbest=3']
@gospodima - I'm sorry it's still frustrating you. Phonetisaurus-g2pfst is only available with the latest versions of phonetisaurus, so unless you have built phonetisaurus from the git repository, you probably still have phonetisaurus-g2p.
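In the traceback above, OSError: [Errno 2] is raised from inside subprocess.Popen, which is how Python 2.7 reports that the program being launched could not be found. So the missing file is most likely the phonetisaurus-g2pfst executable itself rather than the .fst model. A small Python 3 sketch for checking which of the two executables is installed (the helper name is hypothetical):

```python
import shutil


def pick_phonetisaurus(path=None):
    """Return the first phonetisaurus G2P executable found, or None.

    phonetisaurus-g2pfst ships with recent phonetisaurus builds from git;
    older installs (such as the Jasper instructions) only provide
    phonetisaurus-g2p. `path` overrides the PATH search, as in shutil.which.
    """
    for name in ("phonetisaurus-g2pfst", "phonetisaurus-g2p"):
        if shutil.which(name, path=path):
            return name
    return None
```

On a system set up from the Jasper instructions, this would be expected to return "phonetisaurus-g2p", which is the value to put in phonetisaurus_executable.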
Can you try the following in your profile:
active_stt:
  engine: sphinx
pocketsphinx:
  fst_model: /home/pi/dev/German-G2P-and-Acoustic-Model/voxforge-de.fst
  hmm_dir: /home/pi/dev/German-G2P-and-Acoustic-Model/cmusphinx-de-ptm-voxforge-5.2
  phonetisaurus_executable: phonetisaurus-g2p
I am right now installing the older version of phonetisaurus from the instructions here: http://jasperproject.github.io/documentation/installation/#installing-sphinx to try and match your environment.
Okay, I'm running through the Jasper install instructions, and in order to even compile openfst-1.3.4 I had to change line 87 in src/script/text-io.cc from
bool ret = *strm
to
bool ret = static_cast<bool>(*strm)
which means that the Jasper instructions are basically useless at this point. I'm not sure how you even managed to get this installed!
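The manual edit above (the vi step in the install list further down) could also be scripted. A minimal Python 3 sketch, assuming it is run from the openfst-1.3.4 source directory; the function name is hypothetical, and it simply replaces the first occurrence of the original expression:

```python
OLD = "bool ret = *strm"
NEW = "bool ret = static_cast<bool>(*strm)"


def patch_text_io(path="src/script/text-io.cc"):
    """Apply the static_cast<bool> fix; return False if already patched."""
    with open(path) as f:
        source = f.read()
    if NEW in source:
        return False  # running it twice is harmless
    if OLD not in source:
        raise ValueError("expected %r in %s" % (OLD, path))
    with open(path, "w") as f:
        f.write(source.replace(OLD, NEW, 1))
    return True
```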
There is no fst file included with the zamia-speech model, so I assume you are still using it with the dict-xsampa.fst file from G10DRAS. I have copied that file to the ~/dev/cmusphinx-ptm-generic-de-r20180609 folder to match the profile.yml from your initial post. Here is my profile.yml now:
active_stt:
  engine: sphinx
  reply: ''
  response: ''
audio:
  input_device: plughw-card-phone-dev-0
  output_device: plughw-card-phone-dev-0
  output_padding: false
audio_engine: alsa
email:
  address: ''
  imap: ''
input_channels: 1
input_chunksize: 1024
input_samplerate: 16000
input_samplewidth: 16
keyword: Naomi
language: de-DE
location: '23220'
output_chunksize: 1024
phone_number: ''
pocketsphinx:
  fst_model: /home/pi/dev/cmusphinx-ptm-generic-de-r20180609/dict-xsampa.fst
  hmm_dir: /home/pi/dev/cmusphinx-ptm-generic-de-r20180609/model_parameters/voxforge.cd_ptm_5000
  phonetisaurus_executable: phonetisaurus-g2p
prefers_email: false
timezone: America/New_York
tts_engine: pico-tts
This worked fine on my system, in that it did not cause an error, and Naomi started right up with its normal greeting and began listening. I would recommend using the dictionary and fst from the same source for best recognition, or at least compiling your own .fst file using phonetisaurus-train as described in the Naomi Pocketsphinx instructions here: https://github.com/NaomiProject/Naomi/wiki/Installing-Pocketsphinx-and-Phonetisaurus
I did a lot of testing on the command line following the Naomi pocketsphinx setup instructions, but every time I ran pocketsphinx_continuous I got an error message saying that continuous.c failed to calibrate voice activity detection. As I understand it, this is an artifact of pocketsphinx 0.8 (something about there not being enough silence at the beginning of the WAV file for it to calibrate its voice detection).
I installed sphinxbase-5prealpha and pocketsphinx-5prealpha, tried again, and this time got "HALLO KANNST DU MICH HÖREN DU".
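If the calibration failure really is caused by too little leading silence, one possible workaround (untested in this thread) is to pad the test clip before passing it to pocketsphinx_continuous. A Python 3 sketch using the standard wave module; the function name and the one-second default are assumptions:

```python
import wave


def prepend_silence(src_path, dst_path, seconds=1.0):
    """Write a copy of a PCM WAV file with `seconds` of silence at the start.

    Assumes signed 16-bit (or wider) PCM, where silence is all zero bytes;
    8-bit WAV files use unsigned samples and would need 0x80 instead.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    frame_size = params.sampwidth * params.nchannels
    silence = b"\x00" * (int(params.framerate * seconds) * frame_size)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # frame count in the header is fixed on close
        dst.writeframes(silence + frames)
```

For example, prepend_silence("test_german.wav", "test_german_padded.wav") would produce a file that could be passed to -infile instead.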
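To compare a result like this against the reference transcript with a number rather than by eye, word error rate (substitutions plus deletions plus insertions, divided by the number of reference words) is the usual metric. A minimal Python 3 sketch, not part of Naomi or pocketsphinx:

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words (Levenshtein distance over words).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur[j] = min(prev[j] + 1,          # reference word deleted
                         cur[j - 1] + 1,       # extra hypothesis word inserted
                         prev[j - 1] + cost)   # match or substitution
        prev = cur
    return prev[-1] / len(ref)
```

For the output above, word_error_rate("HALLO KANNST DU MICH HÖREN", "HALLO KANNST DU MICH HÖREN DU") gives 0.2: one inserted word against five reference words.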
Unfortunately, again everything seems to be working fine, so I'm not sure how I can be much help at this point.
Here is the basic rundown of the commands I used to install and test:
mkdir dev
cd dev
wget http://downloads.sourceforge.net/project/cmusphinx/sphinxbase/0.8/sphinxbase-0.8.tar.gz
tar -zxvf sphinxbase-0.8.tar.gz
cd sphinxbase-0.8
./configure --enable-fixed
make
sudo make install
cd ~/dev
wget http://downloads.sourceforge.net/project/cmusphinx/pocketsphinx/0.8/pocketsphinx-0.8.tar.gz
tar -zxvf pocketsphinx-0.8.tar.gz
cd pocketsphinx-0.8/
./configure
make
sudo make install
cd ~/dev
sudo apt install subversion autoconf libtool automake gfortran g++
svn co https://svn.code.sf.net/p/cmusphinx/code/trunk/cmuclmtk/
cd cmuclmtk/
./autogen.sh
make
sudo make install
cd ~/dev
wget http://distfiles.macports.org/openfst/openfst-1.3.4.tar.gz
tar -zxvf openfst-1.3.4.tar.gz
cd openfst-1.3.4/
./configure --enable-compact-fsts --enable-const-fsts --enable-far --enable-lookahead-fsts --enable-pdt
vi src/script/text-io.cc
make
sudo make install
cd ~/dev
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/m2m-aligner/m2m-aligner-1.2.tar.gz
tar -zxvf m2m-aligner-1.2.tar.gz
cd m2m-aligner-1.2
make
sudo make install
cd ~/dev
wget https://github.com/mitlm/mitlm/releases/download/v0.4.1/mitlm_0.4.1.tar.gz
tar -zxvf mitlm_0.4.1.tar.gz
cd mitlm_0.4.1
./configure
make
sudo make install
cd ~/dev
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/phonetisaurus/is2013-conversion.tgz
tar -zxvf is2013-conversion.tgz
cd is2013-conversion/phonetisaurus/src
make
sudo make install
cd ~/dev
sudo cp ~/dev/m2m-aligner-1.2/m2m-aligner /usr/local/bin
sudo cp -iv ~/dev/is2013-conversion/bin/phonetisaurus-g2p /usr/local/bin
sudo ldconfig
git clone -b phonetisaurus_fixes https://github.com/aaronchantrill/Naomi.git Naomi_fix
cd Naomi_fix
sudo pip install -r python_requirements.txt
./compile_translations.sh
cd ~/dev
wget http://goofy.zamia.org/zamia-speech/asr-models/cmusphinx-ptm-generic-de-r20180609.tar.xz
tar -xvf cmusphinx-ptm-generic-de-r20180609.tar.xz
git clone https://github.com/G10DRAS/German-G2P-and-Acoustic-Model
cd German-G2P-and-Acoustic-Model/
cp -iv dict-xsampa.fst ~/dev/cmusphinx-ptm-generic-de-r20180609
cd
mkdir tests
cd tests
wget https://github.com/aaronchantrill/Naomi/raw/master/tests/test_german.wav
echo "<s> HALLO KANNST DU MICH HÖREN </s>" > test_reference.txt
text2wfreq < test_reference.txt > test.wfreq
cat test.wfreq | wfreq2vocab > test.vocab
text2idngram -vocab test.vocab -idngram test.idngram < test_reference.txt
idngram2lm -vocab_type 0 -idngram test.idngram -vocab test.vocab -arpa test.lm
phonetisaurus-g2p --model=/home/pi/dev/German-G2P-and-Acoustic-Model/voxforge-de.fst --input=test.vocab --words --isfile --nbest=3 > test.dict
cat test.dict | perl -pe 's/([0-9.])+//g;s/\s+/ /g;@_=split(/\s+/);$w=shift(@_);$_=$w."\t".join(" ",@_)."\n";' > test.formatted.dict
pocketsphinx_continuous -hmm ~/dev/cmusphinx-ptm-generic-de-r20180609/model_parameters/voxforge.cd_ptm_5000/ -lm test.lm -dict ./test.formatted.dict -samprate 16000 -infile test_german.wav
Okay, I think this is resolved, so I am going to mark it resolved and close it for now. Feel free to open it again if it doesn't work for you next time and I'll be happy to work on it until we can resolve it. I am going to add the German language support instructions to the Wiki.
Okay, I'm just waiting on the pull request to be merged.
Pull request merged. Thanks!
I updated from the dev branch and now I have no problems with the dictionaries. Thanks!
Describe the bug
I tried to migrate from the master branch to the naomi-dev branch, but it seems that the naomi-dev version doesn't work with the German language.
To Reproduce
Steps to reproduce the behavior: I used the profile.yml that was written for the master branch:
I used the German language model from the zamia-speech project.
Expected behavior
The master branch works perfectly with this configuration, but naomi-dev stops after creating the default vocabulary. I also tried to run the project after the default vocabulary had been created, and in this case the project crashes with
With English the project works pretty well.
System