Thank you, Sakares, I am glad that you find it useful.
How did you create the serialized model that you are loading?
@avidov I created the serialized model by running the following command in the GPU Docker image:
/root/wav2letter/build/Train train --flagsfile /root/wav2letter/tutorials/librispeech_clean/train.cfg
with the following train.cfg:
--datadir=/root/w2l_dataset/lists
--rundir=/root/w2l_dataset
--archdir=/root/wav2letter/tutorials/1-librispeech_clean/
--train=train-clean-100.lst
--valid=dev-clean.lst
--input=flac
--arch=network.arch
--tokens=/root/w2l_dataset/am/tokens.txt
--lexicon=/root/w2l_dataset/am/lexicon.txt
--criterion=ctc
--lr=0.1
--maxgradnorm=1.0
--replabel=1
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--nthread=8
--batchsize=8
--runname=v2
--iter=5
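(For reference: with --rundir=/root/w2l_dataset and --runname=v2 above, Train should write its snapshots under /root/w2l_dataset/v2/; the exact file names below are an assumption about Train's naming, not something shown in this thread:)
ls /root/w2l_dataset/v2/
# e.g. 001_config  001_log  001_model_dev-clean.lst.bin  001_model_last.bin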
It turned out I could run Decode successfully with my model ~/w2l/model/my_model.bin using decode.cfg:
/root/wav2letter/build/Decoder --flagsfile w2l_dataset/decode.cfg --logtostderr=1 --minloglevel=0
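(decode.cfg itself is not shown; a plausible sketch, using only flags that appear elsewhere in this thread, with every path and value illustrative:)
--am=/root/w2l/model/my_model.bin
--tokensdir=/root/w2l_dataset/am
--tokens=tokens.txt
--lexicon=/root/w2l_dataset/am/lexicon.txt
--datadir=/root/w2l_dataset/lists
--test=mini-test-clean.lst
--uselexicon=true
--decodertype=wrd
--lmtype=kenlm
--lm=/path/to/lm.bin
--lmweight=1.0
--wordscore=1.0
--beamsize=100
--beamthreshold=25
--nthread_decoder=4
--show=true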
Running this gave results like:
|T|: but now nothing could hold me back
|P|: but again to that
[sample: test-clean-8463-294828-0004, WER: 85.7143%, LER: 70.5882%, slice WER: 85.7143%, slice LER: 70.5882%, decoded samples (thread 0): 1]
|T|: stuff it into you his belly counselled him
|P|: but it is you
[sample: test-clean-1089-134686-0001, WER: 75%, LER: 71.4286%, slice WER: 75%, slice LER: 71.4286%, decoded samples (thread 3): 1]
|T|: so it is said anders
|P|: so sir
[sample: test-clean-7021-85628-0017, WER: 80%, LER: 70%, slice WER: 80%, slice LER: 70%, decoded samples (thread 2): 1]
|T|: tied to a woman
|P|: at that
[sample: test-clean-121-121726-0013, WER: 100%, LER: 80%, slice WER: 90.9091%, slice LER: 73.4694%, decoded samples (thread 0): 2]
|T|: then he comes to the beak of it
|P|: any one of the peace
[sample: test-clean-1188-133604-0006, WER: 87.5%, LER: 61.2903%, slice WER: 81.25%, slice LER: 67.1233%, decoded samples (thread 3): 2]
|T|: that is a very fine cap you have he said
|P|: it is only one of the head
[sample: test-clean-7021-85628-0016, WER: 90%, LER: 55%, slice WER: 90%, slice LER: 55%, decoded samples (thread 1): 1]
------
[Decode mini-test-clean.lst (6 samples) in 3.08623s (actual decoding time 0.256s/sample) -- WER: 85.7143, LER: 66.4835]
I have even re-run "cmake .. [params and so on] && make -j8".
But I got an error when I ran:
./simple_streaming_asr_example --input_files_base_path ~/w2l/model/ --input_audio_file ~/w2l/test_audio.wav --acoustic_module_file ~/w2l/model/my_model.bin
I am not sure whether simple_streaming_asr_example.cpp uses a different model serialization from train.cpp or not.
Yes, it expects a different serialization format. If you train the models using the recipe from our paper, we provide an automatic tool to convert the model to a serialization format which the streaming ASR example can load - see StreamingTDSModelConverter.cpp in https://github.com/facebookresearch/wav2letter/tree/master/tools.
Note that the tool outputs the acoustic model, feature extraction model, lexicon, and token set. We additionally expect you to provide the language model file and the optimal decoder parameter settings in --input_files_base_path. You can download our model from here to see all the things that are needed - https://github.com/facebookresearch/wav2letter/wiki/Inference-Run-Examples
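(Piecing together the file names that surface later in this thread, --input_files_base_path ends up holding roughly the following; the first group, per the comment above, comes out of the converter, while the language model and decoder options are supplied by you:)
ls $INPUT_FILES_BASE_PATH
# acoustic_model.bin  feature_extractor.bin  tokens.txt  lexicon.txt
# language_model.bin  decoder_options.json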
@vineelpratap Yes! That is exactly what I was thinking about. Unfortunately, I have repeatedly failed to build tools/StreamingTDSModelConverter in the wav2letter/wav2letter:cpu-latest Docker image.
Let me figure it out and update soon.
Update: After trying to make the StreamingTDSModelConverter, I worked around it by copying the inference directory into src with the following script:
export KENLM_ROOT_DIR=/root/kenlm && \
cd /root/ && \
cp -rf wav2letter/inference wav2letter/src/ && \
mkdir wav2letter/build_tools_test && cd wav2letter/build_tools_test && \
cmake .. -DCMAKE_BUILD_TYPE=Release -DW2L_LIBRARIES_USE_CUDA=OFF -DW2L_BUILD_INFERENCE=ON -DW2L_BUILD_TOOLS=ON && \
make streaming_tds_model_converter
since wav2letter/tools/StreamingTDSModelConverter.cpp complained about a missing feature.h during the build.
Note: I am about to reproduce streaming_convnets in the wav2letter/wav2letter:cpu-latest Docker image.
Updated: I have been able to reproduce streaming_convnets and successfully convert the model with StreamingTDSModelConverter for inference.
Note: I built the StreamingTDSModelConverter with the workaround script above. Feel free to re-open this issue if necessary.
What about the new interactive streaming example? Is it still only compatible with TDS+CTC models?
I think it would support any acoustic and feature extraction model, whether TDS, CTC, ASG, or otherwise, but you need to write the script that converts your acoustic model and feature extractor appropriately for the simple ASR example or the interactive ASR example.
The StreamingTDSModelConverter code base could be a good starting point.
Inference modules are a bit simpler than training modules. They do not need support for autograd, backpropagation, or GPUs. Instead, they are architected for streaming and backend flexibility. Since the modules are a bit different, the serialized format is a bit different as well. As Sakares mentioned above, we have the streaming_tds_model_converter tool to convert a serialized training module into a module to be used for inference. See usage details at: https://github.com/facebookresearch/wav2letter/blob/2ee1d58d6c39bbe583773185dd6df90cb4d4c474/tools/README.md#streaming-tds-model-conversion-for-running-inference-pipeline
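(A minimal invocation sketch, mirroring the command that eventually works later in this thread; it relies on the converter's defaults --tokens=tokens.txt and --arch=arch.txt being resolved against --tokensdir/--archdir, and every path here is illustrative:)
./streaming_tds_model_converter \
  --am /path/to/trained_am.bin \
  --tokensdir /path/to/dir_with_tokens.txt \
  --archdir /path/to/dir_with_arch.txt \
  --outdir /path/to/out_model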
Thank you for your script!
After the compilation I had the error:
streaming_tds_model_converter: error while loading shared libraries: libmkl_rt.so: cannot open shared object file: No such file or directory
This helped, maybe it will help somebody else:
export LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2018.5.274/linux/mkl/lib/intel64:$LD_LIBRARY_PATH
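(If your MKL sits under a different version directory, locating libmkl_rt.so first avoids hard-coding the path; a sketch:)
export LD_LIBRARY_PATH=$(dirname $(find /opt/intel -name libmkl_rt.so 2>/dev/null | head -n1)):$LD_LIBRARY_PATH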
Then I downloaded the pre-trained streaming_convnets model and tried to convert it, but received an error. Could you please share your experience: how did you convert the model?
root@a5d94a65e709:~/wav2letter/build_tools_test/tools# ./streaming_tds_model_converter -am /root/host/model/ --outdir /root/host/out_model/
E0723 11:47:21.968611 3031 Serial.h:77] Error while loading "/root/host/model/": basic_filebuf::underflow error reading the file: iostream error
E0723 11:47:22.974591 3031 Serial.h:77] Error while loading "/root/host/model/": basic_filebuf::underflow error reading the file: iostream error
E0723 11:47:24.978430 3031 Serial.h:77] Error while loading "/root/host/model/": basic_filebuf::underflow error reading the file: iostream error
E0723 11:47:28.983088 3031 Serial.h:77] Error while loading "/root/host/model/": basic_filebuf::underflow error reading the file: iostream error
E0723 11:47:36.985092 3031 Serial.h:77] Error while loading "/root/host/model/": basic_filebuf::underflow error reading the file: iostream error
E0723 11:47:52.987134 3031 Serial.h:77] Error while loading "/root/host/model/": basic_filebuf::underflow error reading the file: iostream error
terminate called after throwing an instance of 'std::__ios_failure'
what(): basic_filebuf::underflow error reading the file: iostream error
*** Aborted at 1595504872 (unix time) try "date -d @1595504872" if you are using GNU date ***
PC: @ 0x7fac3cd69e97 gsignal
*** SIGABRT (@0xbd7) received by PID 3031 (TID 0x7fac4819e840) from PID 3031; stack trace: ***
@ 0x7fac3e570890 (unknown)
@ 0x7fac3cd69e97 gsignal
@ 0x7fac3cd6b801 abort
@ 0x7fac3d75e957 (unknown)
@ 0x7fac3d764ae6 (unknown)
@ 0x7fac3d764b21 std::terminate()
@ 0x7fac3d764da9 __cxa_rethrow
@ 0x55aa47b2a91b (unknown)
@ 0x7fac3cd4cb97 __libc_start_main
@ 0x55aa47b9697a (unknown)
Aborted
Here you need to specify the full path to the model, not just the directory it is in: --am /root/host/model/model.bin
Thank you for your help.
Now I have this error: Invalid dictionary filepath specified tokens.txt
root@1e4cb9f61569:~/host# ./streaming_tds_model_converter -am /root/host/model/am_500ms_future_context_dev_other.bin --outdir /root/host/out_model/
./streaming_tds_model_converter: error while loading shared libraries: libmkl_rt.so: cannot open shared object file: No such file or directory
root@1e4cb9f61569:~/host# export LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2018.5.274/linux/mkl/lib/intel64:$LD_LIBRARY_PATH
root@1e4cb9f61569:~/host# ./streaming_tds_model_converter -am /root/host/model/am_500ms_future_context_dev_other.bin --outdir /root/host/out_model/
I0724 09:58:19.420972 18 StreamingTDSModelConverter.cpp:174] Gflags after parsing
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/host/model/am_500ms_future_context_dev_other.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=arch.txt; --archdir=; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=8; --beamsize=2500; --beamsizetoken=250000; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1000000; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/checkpoint/antares/wav2letter/recipes/models/seq2seq_tds/librispeech/am/librispeech-train+dev-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=0; --localnrmlleftctx=300; --localnrmlrightctx=0; --logadd=false; --lr=0.40000000000000002; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=0.5; --maxisz=33000; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=200; --minrate=3; --minsil=0; --mintsz=2; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=6; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=1; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=2500; --rightWindowSize=50; --rndv_filepath=/checkpoint/vineelkpratap/experiments/speech/inference_tds//inference_paper_500ms_do0.1_lr0.4_G32_archtds_k10s_d8_p100m_do0.1_saug_mln_500ms.arch_bch8/rndvz.21621542; --rundir=/checkpoint/vineelkpratap/experiments/speech/inference_tds/; --runname=inference_paper_500ms_do0.1_lr0.4_G32_archtds_k10s_d8_p100m_do0.1_saug_mln_500ms.arch_bch8; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=false; --showletters=false; --silscore=0; --smearing=none; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1000000; --surround=; --tag=; --target=tkn; --test=; --tokens=tokens.txt; --tokensdir=; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst,/checkpoint/vineelkpratap/experiments/speech/librivox.cut.sub36s.datasets.lst; --trainWithWindow=false; --transdiag=0; --unkscore=-inf; --use_memcache=false; --uselexicon=true; 
--usewordpiece=true; --valid=/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst,/checkpoint/antares/datasets/librispeech/lists/dev-other.lst; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=_; --world_rank=0; --world_size=32; --outdir=/root/host/out_model/; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=;
terminate called after throwing an instance of 'std::runtime_error'
what(): Invalid dictionary filepath specified tokens.txt
*** Aborted at 1595584699 (unix time) try "date -d @1595584699" if you are using GNU date ***
PC: @ 0x7f3301a0ee97 gsignal
*** SIGABRT (@0x12) received by PID 18 (TID 0x7f330ce43840) from PID 18; stack trace: ***
@ 0x7f3303215890 (unknown)
@ 0x7f3301a0ee97 gsignal
@ 0x7f3301a10801 abort
@ 0x7f3302403957 (unknown)
@ 0x7f3302409ae6 (unknown)
@ 0x7f3302409b21 std::terminate()
@ 0x7f3302409d54 __cxa_throw
@ 0x56357f082330 (unknown)
@ 0x7f33019f1b97 __libc_start_main
@ 0x56357f0ee97a (unknown)
Aborted
The model folder looks like this:
root@1e4cb9f61569:~/host# ll /root/host/model/
total 533888
drwxr-xr-x 9 root root 288 Jul 24 09:56 ./
drwxr-xr-x 6 root root 192 Jul 23 11:40 ../
-rw-r--r-- 1 root root 6148 Jul 23 14:13 .DS_Store
-rw-rw-rw- 1 root root 12746595 Jul 20 10:28 3-gram.pruned.3e-7.bin.qt
-rw-rw-rw- 1 root root 654 Jul 20 10:27 am_500ms_future_context.arch
-rw-rw-rw- 1 root root 460478967 Jul 20 10:28 am_500ms_future_context_dev_other.bin
-rw-rw-rw- 1 root root 42816162 Jul 20 10:28 decoder-unigram-10000-nbest10.lexicon
-rw-rw-rw- 1 root root 19537672 Jul 20 10:27 librispeech-train+dev-unigram-10000-nbest10.lexicon
-rw-rw-rw- 1 root root 82982 Jul 20 10:27 librispeech-train-all-unigram-10000.tokens
I renamed librispeech-train-all-unigram-10000.tokens -> tokens.txt and am_500ms_future_context.arch -> arch.txt, and was able to run the conversion with this command:
./streaming_tds_model_converter -am /root/host/model/am_500ms_future_context_dev_other.bin --outdir /root/host/out_model/ --tokensdir /root/host/model --archdir /root/host/model
and received 3 files:
253M Jul 24 15:02 acoustic_model.bin
130B Jul 24 15:02 feature_extractor.bin
81K Jul 24 15:02 tokens.txt
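(Aside: judging by the flag dump above, --tokens=tokens.txt and --arch=arch.txt are just defaults, so the renaming can likely be skipped by passing the original file names explicitly; an untested sketch:)
./streaming_tds_model_converter -am /root/host/model/am_500ms_future_context_dev_other.bin --outdir /root/host/out_model/ --tokensdir /root/host/model --tokens librispeech-train-all-unigram-10000.tokens --archdir /root/host/model --arch am_500ms_future_context.arch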
But when I tried to run it with simple_streaming_asr_example, I received:
failed to open decoder options file=/root/host/converted_streaming_model/decoder_options.json for reading
Where or how should I get decoder_options.json?
I used streaming_convnets, and I think it only provides the acoustic model.
In inference it runs not only the acoustic model forward pass but also beam-search decoding with an n-gram LM. Please have a look at the tutorial here: https://github.com/facebookresearch/wav2letter/wiki/Inference-Run-Examples.
For streaming convnets we also have beam-search decoding on top of the acoustic model: https://github.com/facebookresearch/wav2letter/blob/master/recipes/models/streaming_convnets/librispeech/decode_500ms_right_future_ngram_other.cfg
Hi, thank you for your help.
I took the language model 3-gram.pruned.3e-7.bin.qt (link), renamed it to language_model.bin, and made a decoder_options.json with the following content:
{
"am": "acoustic_model.bin",
"tokensDir": ".",
"tokens": "tokens.txt",
"lexicon": "lexicon.txt",
"useLexicon": true,
"decoderType": "wrd",
"lmType": "kenlm",
"lmWeight": 0.674,
"wordScore" : 0.628,
"unkScore" : -Infinity,
"silScore": 0,
"beamSize": 100,
"beamSizeToken": 100,
"beamThreshold": 100,
"nthread_decoder": 8,
"smearing": "max",
"eosScore" : 0.0,
"logAdd" : false,
"criterionType" : "CTC"
}
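(For context, the invocation presumably followed the pattern shown earlier in this thread, with the converted files and decoder_options.json together in one directory; paths here are illustrative:)
./simple_streaming_asr_example --input_files_base_path /root/host/converted_streaming_model --input_audio_file /path/to/test_audio.wav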
Then I tried to start simple_streaming_asr_example with my converted model and got this error:
Started converting audio input from stdin to text... ...
Creating LexiconDecoder instance.
#start (msec), end(msec), transcription
terminate called after throwing an instance of 'std::invalid_argument'
what(): size must be devisible in alphabet size in Decoder::run(input=59988, size=59988) alphabet size=9997
Aborted
Could you please advise where I am going wrong?
cc @avidov @vineelpratap
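(A hedged observation not made in the thread itself: 59988 = 6 x 9998, so the acoustic model appears to emit 9998-way outputs while the decoder loaded an alphabet of 9997; an off-by-one in the converted tokens.txt, e.g. a lost or extra line, would produce exactly this error. A cheap first check:)
wc -l /root/host/out_model/tokens.txt   # compare with the alphabet size in the error message (9997)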
Thanks so much for an amazing repository. Currently, I have finished training my own acoustic model from here, using the simple architecture from the tutorial here: wav2letter/tutorials/1-librispeech_clean/network.arch
Also, I have successfully reproduced this step with the acoustic_model.bin from AWS S3.
One thing I am struggling with is how to use my own trained acoustic model with the simple_streaming_asr_example invocation shown above, which gave an error.
Is the training recipe different between acoustic_model.bin and my own AM model? How can I run inference with my own AM model?