Closed — maltejanssen closed this issue 4 years ago
Take a look at this documentation: https://github.com/facebookresearch/wav2letter/blob/master/docs/decoder.md
@maltejanssen,
you should change
--tokensdir=/root/projects/asr/model/w2l-generic-de-r20190427
--tokens=/root/projects/asr/model/w2l-generic-de-r20190427/tokens.txt
to
--tokensdir=/root/projects/asr/model/w2l-generic-de-r20190427
--tokens=tokens.txt
because we concatenate the dir and the tokens file name.
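For reference, the resolution behaves roughly like the sketch below. This is a minimal illustration; the helper name and details are assumptions, not the actual wav2letter source:

```cpp
#include <iostream>
#include <string>

// Join a directory and a file name, avoiding a doubled separator.
std::string pathsConcat(const std::string& dir, const std::string& file) {
  if (dir.empty()) {
    return file;
  }
  if (dir.back() == '/') {
    return dir + file;
  }
  return dir + "/" + file;
}

int main() {
  // Passing a full path via --tokens would get joined onto --tokensdir
  // again, producing a broken path -- which is why the original flags failed.
  std::cout << pathsConcat("/root/projects/asr/model/w2l-generic-de-r20190427",
                           "tokens.txt")
            << "\n"; // /root/projects/asr/model/w2l-generic-de-r20190427/tokens.txt
}
```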
@tlikhomanenko Thank you very much for the quick response. This indeed fixed my issue.
On another note, could you give me a hint on how to actually decode a file, if you have some time? Right now I am getting the error "Train files not found from test", thrown in https://github.com/facebookresearch/wav2letter/blob/96b0c074870fb8ce9b96f7e7bd9e6a2080988e66/src/data/W2lListFilesDataset.cpp.
I get that for training, validation, and testing there has to be a list of files that includes the actual transcriptions, so that metrics and errors can be calculated (as specified here: https://github.com/facebookresearch/wav2letter/blob/master/docs/data_prep.md). But surely I don't have to provide a transcription when feeding a new audio file to the already trained network?
Am I using the wrong flag to provide the file to be decoded? As you can see in my first post, I provided the file via the datadir flag. I looked in https://github.com/facebookresearch/wav2letter/blob/master/src/common/Defines.cpp but couldn't find one that might be more appropriate.
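For reference, the flags in Defines.cpp are declared with gflags. Here is a minimal self-contained sketch of how flags such as --datadir and --test are defined and parsed; the defaults and help strings are made up, not copied from the source:

```cpp
#include <gflags/gflags.h>
#include <iostream>

// Illustrative flag declarations in the style of src/common/Defines.cpp.
DEFINE_string(datadir, "", "speech data root directory (illustrative)");
DEFINE_string(test, "", "comma-separated list of test list files (illustrative)");

int main(int argc, char** argv) {
  gflags::ParseCommandLineFlags(&argc, &argv, /*remove_flags=*/true);
  // Values are exposed as FLAGS_<name>, e.g. --test=test/test.lst
  // yields FLAGS_test == "test/test.lst".
  std::cout << "datadir=" << FLAGS_datadir << " test=" << FLAGS_test << "\n";
  return 0;
}
```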
I have now tried specifying the test flag instead of the datadir flag. I have included a .lst file that looks as follows (the last two values are arbitrary; I have also tried the full path, which resulted in a segmentation fault):
test1 demo2.wav 100.00 test text
settings.cfg looks as follows:
--test=test/test.lst
--tokensdir=/root/projects/asr/model/w2l-generic-de-r20190427
--tokens=tokens.txt
--lexicon=/root/projects/asr/model/w2l-generic-de-r20190427/lexicon.txt
--am=/root/projects/asr/model/w2l-generic-de-r20190427/model.bin
--lm=/root/projects/asr/languageModel/lm.binary
--sclite=logs
--lmweight=0.5
--nthread_decoder=2
--wordscore=2.2
--beamsize=2500
--beamscore=40
--smearing=max
--show
--showletters
And the test directory looks like this:
[root@pbfile07 asr]# cd data/test/
[root@pbfile07 test]# ls -l
total 200
-rw-rw-r-- 1 1054 1054 196652 May 20 2018 demo2.wav
-rw-r--r-- 1 root root 30 Dec 11 12:18 test.lst
This gets me the following error:
I1211 12:19:22.280269 24567 Decode.cpp:133] Number of classes (network): 102
I1211 12:19:25.688243 24567 Decode.cpp:140] Number of words: 456055
I1211 12:19:27.227627 24567 W2lListFilesDataset.cpp:141] 1 files found.
I1211 12:19:27.227684 24567 Utils.cpp:102] Filtered 0/1 samples
I1211 12:19:27.227703 24567 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 1
I1211 12:19:27.227715 24567 Decode.cpp:154] [Serialization] Running forward pass ...
terminate called after throwing an instance of 'std::runtime_error'
what(): could not open file demo2.wav
*** Aborted at 1576063167 (unix time) try "date -d @1576063167" if you are using GNU date ***
PC: @ 0x7f9daf57593f __GI_raise
*** SIGABRT (@0x5ff7) received by PID 24567 (TID 0x7f9dbb7c6c80) from PID 24567; stack trace: ***
@ 0x7f9db047c429 (unknown)
@ 0x7f9db06a7d80 (unknown)
@ 0x7f9daf57593f __GI_raise
@ 0x7f9daf55fc95 __GI_abort
@ 0x7f9db9943695 __gnu_cxx::__verbose_terminate_handler()
@ 0x7f9db98b15e6 __cxxabiv1::__terminate()
@ 0x7f9db98b1631 std::terminate()
@ 0x7f9db98af0e3 __cxa_throw
@ 0x672d95 w2l::loadSound<>()
@ 0x690f1d w2l::W2lListFilesDataset::loadSound()
@ 0x690c54 w2l::W2lListFilesDataset::getLoaderData()
@ 0x67bde6 w2l::W2lDataset::getFeatureData()
@ 0x67bf81 w2l::W2lDataset::getFeatureDataAndPrefetch()
@ 0x67b951 w2l::W2lDataset::get()
@ 0x442ffb fl::detail::DatasetIterator<>::operator*()
@ 0x4324e8 main
@ 0x7f9daf561813 __libc_start_main
@ 0x42f3fe _start
I would really appreciate any help.
Hi @maltejanssen,
Sorry for the late response. So yes, data should be provided as a list file where you specify "id" "full absolute path to the file" "duration" "target transcription". If you don't have a target transcription, you can (I believe) leave it empty in the list file, or use fake text as you did.
The datadir flag is used only to define the full path to the list file; inside the list file, the paths to your audio files should be absolute. So instead of
test1 demo2.wav 100.00 test text
it should be
test1 /full/path/to/demo2.wav 100.00 test text
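If it helps, here is a small standalone checker (not part of wav2letter, just a sketch) that flags list lines with missing fields, non-absolute paths, or unreadable audio files before you run the decoder:

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, char** argv) {
  if (argc != 2) {
    std::cerr << "usage: " << argv[0] << " <file.lst>\n";
    return 1;
  }
  std::ifstream lst(argv[1]);
  std::string line;
  int lineNo = 0;
  while (std::getline(lst, line)) {
    ++lineNo;
    std::istringstream fields(line);
    std::string id, path, duration;
    // Expect at least: id, audio path, duration (transcription may follow).
    if (!(fields >> id >> path >> duration)) {
      std::cerr << "line " << lineNo
                << ": expected at least id, path, duration\n";
      continue;
    }
    if (path.front() != '/') {
      std::cerr << "line " << lineNo << ": path is not absolute: " << path
                << "\n";
    }
    if (!std::ifstream(path).good()) {
      std::cerr << "line " << lineNo << ": cannot open " << path << "\n";
    }
  }
  return 0;
}
```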
Thanks again for the help. I think I already tried that and got a different error, but I will try again when I have access to the machine on Monday.
I have just tried the full path again, and now I am getting a memory error (segmentation fault).
I am not sure what could be causing it.
[root@pbfile07 asr]# /opt/wav2letter/build/Decoder --flagsfile settings.cfg
I1216 10:11:19.341881 24764 Decode.cpp:112] Gflags after parsing
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/projects/asr/model/w2l-generic-de-r20190427/model.bin; --arch=network.arch; --archdir=config/conv_glu; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=2500; --beamsizetoken=250000; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=data; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --enable_distributed=false; --encoderdim=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=settings.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --hardselection=1; --input=wav; --inputbinsize=100; --inputfeeding=false; --iter=1000000; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/projects/asr/model/w2l-generic-de-r20190427/lexicon.txt; --linlr=-1; --linlrcrit=-1; --linseg=1; --lm=/root/projects/asr/languageModel/lm.binary; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=0.5; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.59999999999999998; --lrcosine=false; --lrcrit=0.0060000000000000001; --maxdecoderoutputlen=200; --maxgradnorm=0.20000000000000001; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.80000000000000004; --netoptim=sgd; --noresample=false; --nthread=6; --nthread_decoder=2; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=models; --runname=generic-de; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --sclite=logs; --seed=0; --show=true; --showletters=true; --silweight=0; --smearing=max; --smoothingtemperature=1; --softselection=inf; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1000000; --surround=|; --tag=; --target=tkn; --test=test/test.lst; --tokens=tokens.txt; --tokensdir=/root/projects/asr/model/w2l-generic-de-r20190427 ; --train=train; --trainWithWindow=false; --transdiag=4; --unkweight=-inf; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=valid; --weightdecay=0; --wordscore=2.2000000000000002; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=0; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=;
I1216 10:11:19.342491 24764 Decode.cpp:133] Number of classes (network): 102
I1216 10:11:22.732615 24764 Decode.cpp:140] Number of words: 456055
I1216 10:11:24.224720 24764 W2lListFilesDataset.cpp:141] 1 files found.
I1216 10:11:24.224767 24764 Utils.cpp:102] Filtered 0/1 samples
I1216 10:11:24.224784 24764 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 1
I1216 10:11:24.224802 24764 Decode.cpp:154] [Serialization] Running forward pass ...
I1216 10:11:28.404969 24764 Decode.cpp:201] [Dataset] Number of samples per thread: 1
I1216 10:11:28.934695 24764 Decode.cpp:309] [Decoder] LM constructed.
*** Aborted at 1576487488 (unix time) try "date -d @1576487488" if you are using GNU date ***
PC: @ 0x5e78de std::_Hashtable<>::_M_find_before_node()
*** SIGSEGV (@0xbd88c427) received by PID 24764 (TID 0x7fb9ef39bc80) from PID 18446744072594441255; stack trace: ***
@ 0x7fb9e4051429 (unknown)
@ 0x7fb9e427cd80 (unknown)
@ 0x5e78de std::_Hashtable<>::_M_find_before_node()
@ 0x5e73d2 std::_Hashtable<>::_M_find_node()
@ 0x5e6e50 std::_Hashtable<>::find()
@ 0x5e6b37 std::unordered_map<>::find()
@ 0x5e60f2 w2l::Trie::insert()
@ 0x433a69 main
@ 0x7fb9e3136813 __libc_start_main
@ 0x42f3fe _start
Segmentation fault (core dumped)
@tlikhomanenko Do you have any idea what could be causing this or should I open a new issue for this?
Hi @maltejanssen,
Yep, please open a new, separate issue. I haven't encountered this problem before; let's continue the discussion in the new issue.
Hi, I am in need of help. I have downloaded a pre-trained model which I am trying to use for speech-to-text decoding with your Decode script.
The pre-trained model contains the following files:
A language model was also provided, which I converted into a binary using KenLM.
Now I am not sure how to decode audio using the Decoder. I have created some configuration settings in a file settings.cfg closely resembling the ones provided by the people who trained the model (only changing some parameters that threw an illegal-parameter error when calling Decode, and of course setting the paths):
Now, most of the paths I had to set were fairly obvious, but one I am really not sure about: what am I supposed to set the tokensdir parameter to? What does Decode expect to find there? Something different than tokens.txt? Calling the decoder with --help gives the hint "(dictionary directory)". Is it meant to contain a file with all words seen in training? I thought that was what the lexicon is for.
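For context: tokens.txt lists the acoustic model's output units, one per line, while lexicon.txt maps each word to its spelling in those tokens. A made-up illustration, with | as the word separator (the actual contents depend on the model):

```
# tokens.txt -- one output unit per line
|
a
b
c

# lexicon.txt -- word followed by its token spelling
hallo	h a l l o |
welt	w e l t |
```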
When running Decode with the above parameters, I get the following error:
I think it is probably due to setting the tokensdir parameter incorrectly. Could anyone help me figure out what tokensdir has to contain?