flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Help with Decoding: Invalid dictionary filepath specified #454

Closed maltejanssen closed 4 years ago

maltejanssen commented 4 years ago

Hi, I am in need of help. I have downloaded a pre-trained model which I am trying to use for speech to text decoding using your Decode script.

The pre-trained model contains the following files:

[root@pbfile07 w2l-generic-de-r20190427]# ls -l
total 1648280
-rw-rw-r-- 1 1054 1054         64 Apr 27  2019 AUTHORS
-rw-rw-r-- 1 1054 1054   16273611 Apr 27  2019 lexicon.txt
-rw-rw-r-- 1 1054 1054       7651 Apr 27  2019 LICENSE
-rw-rw-r-- 1 1054 1054 1671492046 Apr 27  2019 model.bin
-rw-rw-r-- 1 1054 1054      47532 Apr 27  2019 README.md
-rw-rw-r-- 1 1054 1054        273 Apr 27  2019 tokens.txt

A language model was also provided, which I converted into a binary using KenLM.

[root@pbfile07 languageModel]# ls -l
total 6013708
-rw-r--r-- 1 root root 4015311071 May  1  2019 generic_de_lang_model_large-r20190501.arpa
-rw-r--r-- 1 root root 2142721765 Dec  9 12:34 lm.binary

Now I am not sure how to decode audio using the Decoder. I have created a configuration file, settings.cfg, closely resembling the one provided by the people who trained the model (changing only the parameters that caused an illegal-parameter error when calling Decode, and of course setting the paths):

--datadir=/root/projects/asr/test
--tokensdir=/root/projects/asr/model/w2l-generic-de-r20190427
--tokens=/root/projects/asr/model/w2l-generic-de-r20190427/tokens.txt
--lexicon=/root/projects/asr/model/w2l-generic-de-r20190427/lexicon.txt
--am=/root/projects/asr/model/w2l-generic-de-r20190427/model.bin
--lm=/root/projects/asr/languageModel/lm.binary
--sclite=logs
--lmweight=0.5
--nthread_decoder=2
--wordscore=2.2
--beamsize=2500
--beamscore=40
--smearing=max
--show
--showletters

Most of the paths I had to set were fairly obvious, but there is one I am really not sure about: what am I supposed to set the tokensdir parameter to? What does Decode expect to find there? Something different than tokens.txt? Calling the decoder with help gives the hint (dictionary directory). Is it meant to contain a file with all words seen in training? I thought that was what the lexicon is for.

When running Decode with the above parameters I get the following error:

[root@pbfile07 asr]# /opt/wav2letter/build/Decoder --flagsfile settings.cfg
I1210 12:10:01.559008  3997 Decode.cpp:112] Gflags after parsing
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/projects/asr/model/w2l-generic-de-r20190427/model.bin; --arch=network.arch; --archdir=config/conv_glu; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=2500; --beamsizetoken=250000; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=/root/projects/asr/test ; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --enable_distributed=false; --encoderdim=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=settings.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --hardselection=1; --input=wav; --inputbinsize=100; --inputfeeding=false; --iter=1000000; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/projects/asr/model/w2l-generic-de-r20190427/lexicon.txt; --linlr=-1; --linlrcrit=-1; --linseg=1; --lm=/root/projects/asr/languageModel/generic_de_lang_model_large-r20190501.arpa; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=0.5; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.59999999999999998; --lrcosine=false; --lrcrit=0.0060000000000000001; --maxdecoderoutputlen=200; --maxgradnorm=0.20000000000000001; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.80000000000000004; --netoptim=sgd; --noresample=false; --nthread=6; --nthread_decoder=2; 
--numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=models; --runname=generic-de; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --sclite=logs; --seed=0; --show=true; --showletters=true; --silweight=0; --smearing=max; --smoothingtemperature=1; --softselection=inf; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1000000; --surround=|; --tag=; --target=tkn; --test=; --tokens=/root/projects/asr/model/w2l-generic-de-r20190427/tokens.txt; --tokensdir=/root/projects/asr/model/w2l-generic-de-r20190427; --train=train; --trainWithWindow=false; --transdiag=4; --unkweight=-inf; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=valid; --weightdecay=0; --wordscore=2.2000000000000002; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=0; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=;
terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid dictionary filepath specified.
*** Aborted at 1575976201 (unix time) try "date -d @1575976201" if you are using GNU date ***
PC: @     0x7f0f3419d93f __GI_raise
*** SIGABRT (@0xf9d) received by PID 3997 (TID 0x7f0f403eec80) from PID 3997; stack trace: ***
    @     0x7f0f350a4429 (unknown)
    @     0x7f0f352cfd80 (unknown)
    @     0x7f0f3419d93f __GI_raise
    @     0x7f0f34187c95 __GI_abort
    @     0x7f0f3e56b695 __gnu_cxx::__verbose_terminate_handler()
    @     0x7f0f3e4d95e6 __cxxabiv1::__terminate()
    @     0x7f0f3e4d9631 std::terminate()
    @     0x7f0f3e4d70e3 __cxa_throw
    @           0x431fde main
    @     0x7f0f34189813 __libc_start_main
    @           0x42f3fe _start
Aborted (core dumped)

I think it is probably due to setting the tokensdir parameter wrong. Could anyone help me figure out what the tokensdir has to include?

awni commented 4 years ago

Take a look at this documentation: https://github.com/facebookresearch/wav2letter/blob/master/docs/decoder.md

tlikhomanenko commented 4 years ago

@maltejanssen,

you should change

--tokensdir=/root/projects/asr/model/w2l-generic-de-r20190427
--tokens=/root/projects/asr/model/w2l-generic-de-r20190427/tokens.txt

to

--tokensdir=/root/projects/asr/model/w2l-generic-de-r20190427
--tokens=tokens.txt

because we concatenate the directory and the token file name.
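To illustrate why the original flags failed, here is a minimal Python sketch of joining a directory and a file name by plain string concatenation. The helper `concat_dir_and_name` is hypothetical and only mimics the behaviour described above; it is not the actual wav2letter code.

```python
def concat_dir_and_name(directory: str, name: str) -> str:
    """Join a directory and a file name by plain string concatenation.

    Hypothetical helper for illustration; it does not special-case
    absolute file names, which is the assumption being demonstrated.
    """
    directory = directory.strip()
    name = name.strip()
    if directory and not directory.endswith("/"):
        return directory + "/" + name
    return directory + name


tokens_dir = "/root/projects/asr/model/w2l-generic-de-r20190427"

# Absolute path in --tokens: the directory ends up in the result twice.
broken = concat_dir_and_name(tokens_dir, tokens_dir + "/tokens.txt")

# Bare file name in --tokens: the result is the intended file.
fixed = concat_dir_and_name(tokens_dir, "tokens.txt")

print(broken)
print(fixed)
```

Under this assumption the first call yields a path that contains the directory twice and does not exist on disk, which would match the "Invalid dictionary filepath specified" error.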

maltejanssen commented 4 years ago

@tlikhomanenko Thank you very much for the quick response. This indeed fixed my issue.

On another note, could you give me a hint on how to actually decode a file if you have some time? Right now I am getting the error: Train files not found from test thrown in https://github.com/facebookresearch/wav2letter/blob/96b0c074870fb8ce9b96f7e7bd9e6a2080988e66/src/data/W2lListFilesDataset.cpp .

I get that for training, validation and testing there has to be a list of files to be used that include the actual transcription to calculate metrics and errors (as specified here https://github.com/facebookresearch/wav2letter/blob/master/docs/data_prep.md). But surely I don't have to provide the transcription when introducing a new audio file to the already trained network.

Am I using the wrong flag to provide the file to be decoded? As you can see in my first post, I provided the file using the datadir flag. I looked in https://github.com/facebookresearch/wav2letter/blob/master/src/common/Defines.cpp , but couldn't find one that might be more appropriate.

maltejanssen commented 4 years ago

I have now tried specifying the test flag instead of the datadir flag. I have included a .lst file that looks like the following (the last two values are arbitrary; I have also tried the full path, which resulted in a segmentation fault): test1 demo2.wav 100.00 test text

settings.cfg looks like the following:

--test=test/test.lst
--tokensdir=/root/projects/asr/model/w2l-generic-de-r20190427
--tokens=tokens.txt
--lexicon=/root/projects/asr/model/w2l-generic-de-r20190427/lexicon.txt
--am=/root/projects/asr/model/w2l-generic-de-r20190427/model.bin
--lm=/root/projects/asr/languageModel/lm.binary
--sclite=logs
--lmweight=0.5
--nthread_decoder=2
--wordscore=2.2
--beamsize=2500
--beamscore=40
--smearing=max
--show
--showletters

And the test directory looks like this:

[root@pbfile07 asr]# cd data/test/
[root@pbfile07 test]# ls -l
total 200
-rw-rw-r-- 1 1054 1054 196652 May 20  2018 demo2.wav
-rw-r--r-- 1 root root     30 Dec 11 12:18 test.lst

This gets me the following error:

I1211 12:19:22.280269 24567 Decode.cpp:133] Number of classes (network): 102
I1211 12:19:25.688243 24567 Decode.cpp:140] Number of words: 456055
I1211 12:19:27.227627 24567 W2lListFilesDataset.cpp:141] 1 files found.
I1211 12:19:27.227684 24567 Utils.cpp:102] Filtered 0/1 samples
I1211 12:19:27.227703 24567 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 1
I1211 12:19:27.227715 24567 Decode.cpp:154] [Serialization] Running forward pass ...
terminate called after throwing an instance of 'std::runtime_error'
  what():  could not open file demo2.wav
*** Aborted at 1576063167 (unix time) try "date -d @1576063167" if you are using GNU date ***
PC: @     0x7f9daf57593f __GI_raise
*** SIGABRT (@0x5ff7) received by PID 24567 (TID 0x7f9dbb7c6c80) from PID 24567; stack trace: ***
    @     0x7f9db047c429 (unknown)
    @     0x7f9db06a7d80 (unknown)
    @     0x7f9daf57593f __GI_raise
    @     0x7f9daf55fc95 __GI_abort
    @     0x7f9db9943695 __gnu_cxx::__verbose_terminate_handler()
    @     0x7f9db98b15e6 __cxxabiv1::__terminate()
    @     0x7f9db98b1631 std::terminate()
    @     0x7f9db98af0e3 __cxa_throw
    @           0x672d95 w2l::loadSound<>()
    @           0x690f1d w2l::W2lListFilesDataset::loadSound()
    @           0x690c54 w2l::W2lListFilesDataset::getLoaderData()
    @           0x67bde6 w2l::W2lDataset::getFeatureData()
    @           0x67bf81 w2l::W2lDataset::getFeatureDataAndPrefetch()
    @           0x67b951 w2l::W2lDataset::get()
    @           0x442ffb fl::detail::DatasetIterator<>::operator*()
    @           0x4324e8 main
    @     0x7f9daf561813 __libc_start_main
    @           0x42f3fe _start

I would really appreciate any help.

tlikhomanenko commented 4 years ago

Hi @maltejanssen,

Sorry for the late response. So yes, data should be provided as a list file where each line specifies "id" "full absolute path to the file" "duration" "target transcription". If you don't have a target transcription, you can leave it empty (I believe) in the list file or, as you did, use fake text.

The datadir is used only to define the full path to the list file, and inside the list file the paths to your audio files should be absolute. So instead of

test1 demo2.wav 100.00 test text

it should be

test1 /full/path/to/demo2.wav 100.00 test text
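As a rough sketch of producing such a line, the Python snippet below builds a list-file entry with an absolute audio path. The helper `list_file_line` is hypothetical, and writing the duration in milliseconds is an assumption; the example above simply uses an arbitrary value. The silent WAV file is generated only so the snippet runs on its own.

```python
import os
import tempfile
import wave


def list_file_line(sample_id, audio_path, transcription="test text"):
    """Build one list-file line: id, absolute path, duration, transcription.

    Hypothetical helper; the duration unit (milliseconds) is an assumption.
    """
    abs_path = os.path.abspath(audio_path)
    with wave.open(abs_path, "rb") as wav:
        duration_ms = 1000.0 * wav.getnframes() / wav.getframerate()
    return f"{sample_id} {abs_path} {duration_ms:.2f} {transcription}"


# Create a one-second silent 16 kHz mono WAV file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
demo_wav = os.path.join(tmp_dir, "demo2.wav")
with wave.open(demo_wav, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

line = list_file_line("test1", demo_wav)
print(line)
```

Writing the returned line into test.lst keeps the audio path independent of the directory the Decoder is started from.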

maltejanssen commented 4 years ago

Thanks again for the help. I think I already tried that and got a different error, but I will try again when I have access to the machine on Monday.

maltejanssen commented 4 years ago

I have just tried with the full path again and I am getting a memory error (segmentation fault).

I am not sure what could be the cause for this.


[root@pbfile07 asr]# /opt/wav2letter/build/Decoder --flagsfile settings.cfg
I1216 10:11:19.341881 24764 Decode.cpp:112] Gflags after parsing
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/projects/asr/model/w2l-generic-de-r20190427/model.bin; --arch=network.arch; --archdir=config/conv_glu; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=2500; --beamsizetoken=250000; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=data; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --enable_distributed=false; --encoderdim=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=settings.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --hardselection=1; --input=wav; --inputbinsize=100; --inputfeeding=false; --iter=1000000; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/projects/asr/model/w2l-generic-de-r20190427/lexicon.txt; --linlr=-1; --linlrcrit=-1; --linseg=1; --lm=/root/projects/asr/languageModel/lm.binary; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=0.5; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.59999999999999998; --lrcosine=false; --lrcrit=0.0060000000000000001; --maxdecoderoutputlen=200; --maxgradnorm=0.20000000000000001; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.80000000000000004; --netoptim=sgd; --noresample=false; --nthread=6; --nthread_decoder=2; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; 
--optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=models; --runname=generic-de; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --sclite=logs; --seed=0; --show=true; --showletters=true; --silweight=0; --smearing=max; --smoothingtemperature=1; --softselection=inf; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1000000; --surround=|; --tag=; --target=tkn; --test=test/test.lst; --tokens=tokens.txt; --tokensdir=/root/projects/asr/model/w2l-generic-de-r20190427 ; --train=train; --trainWithWindow=false; --transdiag=4; --unkweight=-inf; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=valid; --weightdecay=0; --wordscore=2.2000000000000002; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=0; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=;
I1216 10:11:19.342491 24764 Decode.cpp:133] Number of classes (network): 102
I1216 10:11:22.732615 24764 Decode.cpp:140] Number of words: 456055
I1216 10:11:24.224720 24764 W2lListFilesDataset.cpp:141] 1 files found.
I1216 10:11:24.224767 24764 Utils.cpp:102] Filtered 0/1 samples
I1216 10:11:24.224784 24764 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 1
I1216 10:11:24.224802 24764 Decode.cpp:154] [Serialization] Running forward pass ...
I1216 10:11:28.404969 24764 Decode.cpp:201] [Dataset] Number of samples per thread: 1
I1216 10:11:28.934695 24764 Decode.cpp:309] [Decoder] LM constructed.
*** Aborted at 1576487488 (unix time) try "date -d @1576487488" if you are using GNU date ***
PC: @           0x5e78de std::_Hashtable<>::_M_find_before_node()
*** SIGSEGV (@0xbd88c427) received by PID 24764 (TID 0x7fb9ef39bc80) from PID 18446744072594441255; stack trace: ***
    @     0x7fb9e4051429 (unknown)
    @     0x7fb9e427cd80 (unknown)
    @           0x5e78de std::_Hashtable<>::_M_find_before_node()
    @           0x5e73d2 std::_Hashtable<>::_M_find_node()
    @           0x5e6e50 std::_Hashtable<>::find()
    @           0x5e6b37 std::unordered_map<>::find()
    @           0x5e60f2 w2l::Trie::insert()
    @           0x433a69 main
    @     0x7fb9e3136813 __libc_start_main
    @           0x42f3fe _start
Segmentation fault (core dumped)

@tlikhomanenko Do you have any idea what could be causing this or should I open a new issue for this?

tlikhomanenko commented 4 years ago

Hi @maltejanssen,

Yep, please open a new, separate issue. I haven't encountered this problem before; let's continue the discussion in the new issue.