flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

wav2letter lexfree Docker decoding on CPU #455

Closed oplatek closed 4 years ago

oplatek commented 4 years ago

What went well: using the pre-trained models and the wav2letter/wav2letter:lexfree Docker image, I successfully launched decoding past the point where it loads the models. But it seems the wav2letter/wav2letter:lexfree image assumes nvidia-docker, right?

What went wrong: I used regular Docker version 18.09.4 and omitted --runtime=nvidia from the command

docker run --runtime=nvidia --rm -itd --ipc=host --name lexfree wav2letter/wav2letter:lexfree

This resulted in the following error:

Running decoding using the downloaded models in docker
I1212 13:24:42.904512     1 Decode.cpp:57] Reading flags from file /root/data/oplatek-decoder_char_convlm_lexfree.cfg
I1212 13:24:42.904821     1 Decode.cpp:85] [Network] Reading acoustic model from /root/model/am/baseline_nov93dev.bin
terminate called after throwing an instance of 'af::exception'
  what():  ArrayFire Exception (Unknown error:999):

In function void af::setDevice(int)
In file src/api/cpp/device.cpp:77
*** Aborted at 1576157082 (unix time) try "date -d @1576157082" if you are using GNU date ***
PC: @     0x7f8ff9c7b428 gsignal
*** SIGABRT (@0x1) received by PID 1 (TID 0x7f9043eec600) from PID 1; stack trace: ***
    @     0x7f903c1f2390 (unknown)
    @     0x7f8ff9c7b428 gsignal
    @     0x7f8ff9c7d02a abort
    @     0x7f8ffa5be84d __gnu_cxx::__verbose_terminate_handler()
    @     0x7f8ffa5bc6b6 (unknown)
    @     0x7f8ffa5bc701 std::terminate()
    @     0x7f8ffa5bc919 __cxa_throw
    @     0x7f90161a9e31 af::setDevice()
    @           0x418c49 main
    @     0x7f8ff9c66830 __libc_start_main
    @           0x47b7f9 _start
    @                0x0 (unknown)

How can I decode with the lexfree model using regular Docker on CPU?

Do I need to change the Dockerfile to build its CPU version? If so, can you hint at what needs to change in https://github.com/facebookresearch/wav2letter/blob/master/recipes/models/lexicon_free/Dockerfile ? I assume that is the Dockerfile used to build the wav2letter/wav2letter:lexfree image.

tlikhomanenko commented 4 years ago

Hi @oplatek,

Indeed, I built this image only with the CUDA backend. To build it with the CPU backend and reproduce everything from the paper (except for training the language models with fairseq), you should modify the Dockerfile https://github.com/facebookresearch/wav2letter/blob/master/recipes/models/lexicon_free/Dockerfile to the following:

FROM wav2letter/wav2letter:cpu-base-26c69be

# ==================================================================
# flashlight https://github.com/facebookresearch/flashlight.git
# ------------------------------------------------------------------
RUN cd /root && git clone --recursive https://github.com/facebookresearch/flashlight.git && \
    cd /root/flashlight && git checkout da99018f393c9301c9bb50908dabde954b290256 && \
    git submodule update --init --recursive && mkdir -p build && \
    export MKLROOT=/opt/intel/mkl && \
    cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DFLASHLIGHT_BACKEND=CPU && \
    make -j8 && make install && \
# ==================================================================
# kenlm rebuild with max order 20 and install python wrapper
# ------------------------------------------------------------------
    cd /root/kenlm/build && \
    cmake .. -DKENLM_MAX_ORDER=20 && make -j8 && make install && \
    cd /root/kenlm && \
    sed -i 's/DKENLM_MAX_ORDER=6/DKENLM_MAX_ORDER=20/g' setup.py && \
    pip install . && \
# ==================================================================
# wav2letter with CPU backend
# ------------------------------------------------------------------
    cd /root && git clone --recursive https://github.com/facebookresearch/wav2letter.git && \
    export KENLM_ROOT_DIR=/root/kenlm && \
    cd /root/wav2letter && git checkout 9bf4538 && mkdir -p build && cd build && \
    cmake .. -DCMAKE_BUILD_TYPE=Release -DW2L_LIBRARIES_USE_CUDA=OFF -DKENLM_MAX_ORDER=20 && \
    make -j8 && \
# ==================================================================
# sph2pipe
# ------------------------------------------------------------------
    cd /root && wget https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/ctools/sph2pipe_v2.5.tar.gz && \
    tar -xzf sph2pipe_v2.5.tar.gz && cd sph2pipe_v2.5 && \
    gcc -o sph2pipe *.c -lm

Alternatively, you can use the regular CPU Docker image, but please check out flashlight at da99018f393c9301c9bb50908dabde954b290256 and wav2letter at 9bf4538, and build with KENLM_MAX_ORDER=20 (you can see above how this is done).
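For the manual route, the max-order bump on the KenLM Python wrapper is just the setup.py edit the Dockerfile above applies. Here is the same sed demonstrated on a scratch copy, so it is safe to try anywhere (the real edit targets /root/kenlm/setup.py inside the container; the scratch file's contents are illustrative, not kenlm's actual setup.py):

```shell
# Demo of the KENLM_MAX_ORDER bump from the Dockerfile above, applied to a
# scratch file standing in for /root/kenlm/setup.py.
printf '%s\n' 'ARGS = ["-O3", "-DKENLM_MAX_ORDER=6"]' > /tmp/setup_demo.py
sed -i 's/DKENLM_MAX_ORDER=6/DKENLM_MAX_ORDER=20/g' /tmp/setup_demo.py
grep -c 'DKENLM_MAX_ORDER=20' /tmp/setup_demo.py   # prints 1
```

The C++ side of KenLM takes the order at configure time instead, via `cmake .. -DKENLM_MAX_ORDER=20`, as shown in the Dockerfile.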

Let me know if this works for you!

oplatek commented 4 years ago

Hi @tlikhomanenko,

thank you for your help!

The Docker image helped me launch the decoding, but I still cannot decode on CPU with convLM.

Following the README, I ran this command with the config files and data prepared:

docker run --mount src=$(pwd)/model,target=/root/model,type=bind --mount src=$(pwd)/data,target=/root/data,type=bind  --rm --ipc=host --name lexfree-decoding wav2letter/wav2letter:lexfree-cpu \
  /root/wav2letter/build/Decoder \
    --flagsfile /root/data/${USER}-decoder_char_convlm_lexfree.cfg \
    --minloglevel=0 \
    --logtostderr=1   

which results in the following error:

...
...
I1213 10:43:40.525741     1 Decode.cpp:92] [Criterion] AutoSegmentationCriterion
I1213 10:43:40.525761     1 Decode.cpp:94] [Network] Number of params: 10116412
I1213 10:43:40.525770     1 Decode.cpp:100] [Network] Updating flags from config file: /root/model/am/baseline_nov93dev.bin
I1213 10:43:40.526813     1 Decode.cpp:112] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/model/am/baseline_nov93dev.bin; --arch=vital-half-kwinc-kwb13-kwe21_cpp_wn-s2; --archdir=/mnt/vol/gfsai-east/ai-group/users/locronan/wsj++/arch; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=16; --beamsize=500; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=/root/data/lists/; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=tkn; --devwin=0; --emission_dir=; --enable_distributed=false; --encoderdim=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=/root/data/oplatek-decoder_char_convlm_lexfree.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --hardselection=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --iter=1000000; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/model/decoder/lexicon.lst; --linlr=-1; --linlrcrit=-1; --linseg=1; --lm=/root/model/decoder/convlm_models/lm_wsj_convlm_char_20B.bin; --lm_memory=3000; --lm_vocab=/root/model/decoder/convlm_models/lm_wsj_convlm_char_20B.vocab; --lmtype=convlm; --lmweight=1.7510731428777175; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=5.5999999999999996; --lrcrit=0.0080000000000000002; --maxdecoderoutputlen=200; --maxgradnorm=0.050000000000000003; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; 
--noresample=false; --nthread=6; --nthread_decoder=2; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=2; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/mnt/vol/gfsai-east/ai-group/users/locronan/wsj++/runs/baseline-variants/chronos; --runname=baseline_lr5.6_lrcrit0.008_fb80_bsz16_archs2; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --sclite=/root/data/sclite; --seed=0; --show=true; --showletters=true; --silweight=-2.3696139062587926; --smearing=max; --smoothingtemperature=1; --softselection=inf; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1000000; --surround=|; --tag=; --target=ltr; --test=nov92.lst; --tokens=tokens.lst; --tokensdir=/root/model/am; --train=si284; --trainWithWindow=false; --transdiag=5; --unkweight=-inf; --uselexicon=false; --usewordpiece=false; --valid=nov93dev,nov92; --weightdecay=0; --wordscore=2.9346358245918216; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=5; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I1213 10:43:40.526911     1 Decode.cpp:133] Number of classes (network): 31
I1213 10:43:40.898849     1 Decode.cpp:140] Number of words: 162533
Falling back to using letters as targets for the unknown word: martirosov
I1213 10:43:41.133837     1 W2lListFilesDataset.cpp:137] 333 files found. 
I1213 10:43:41.133868     1 Utils.cpp:102] Filtered 0/333 samples
I1213 10:43:41.133918     1 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 333
I1213 10:43:41.133985     1 Decode.cpp:154] [Serialization] Running forward pass ...
Falling back to using letters as targets for the unknown word: martirosov
Skipping unknown entry: 'martirosov'
I1213 10:44:26.121348     1 Decode.cpp:201] [Dataset] Number of samples per thread: 167
I1213 10:44:26.205636     1 Decode.cpp:292] [ConvLM]: Loading LM from /root/model/decoder/convlm_models/lm_wsj_convlm_char_20B.bin
[ConvLM]: Loading vocabulary from /root/model/decoder/convlm_models/lm_wsj_convlm_char_20B.vocab
[ConvLM]: vocabulary size of convLM 40
I1213 10:44:37.671041     1 Decode.cpp:308] [Decoder] LM constructed.
F1213 10:44:37.671173    46 Decode.cpp:357] FLAGS_nthread_decoder exceeds the number of visible GPUs
*** Check failure stack trace: ***
I1213 10:44:37.671195    47 Decode.cpp:430] [Decoder] Lexicon-free decoder with token-LM loaded in thread: 0
    @     0x7f13fc3285cd  google::LogMessage::Fail()
    @     0x7f13fc32a433  google::LogMessage::SendToLog()
    @     0x7f13fc32815b  google::LogMessage::Flush()
    @     0x7f13fc32ae1e  google::LogMessageFatal::~LogMessageFatal()
    @           0x47dbac  _ZZ4mainENKUliiiE2_clEiii
    @           0x47e9aa  _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_ESt12_Bind_simpleIFSt17reference_wrapperISt5_BindIFZ4mainEUliiiE2_iiiEEEvEEvEEE9_M_invokeERKSt9_Any_data
    @           0x483f69  std::__future_base::_State_baseV2::_M_do_set()
    @     0x7f13fc55aa99  __pthread_once_slow
    @           0x47af31  _ZNSt13__future_base11_Task_stateISt5_BindIFZ4mainEUliiiE2_iiiEESaIiEFvvEE6_M_runEv
    @           0x488c8b  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN2fl10ThreadPoolC4EmRKSt8functionIFvmEEEUlvE_vEEE6_M_runEv
    @     0x7f13fc053c80  (unknown)
    @     0x7f13fc5536ba  start_thread
    @     0x7f13fb7b941d  clone
    @              (nil)  (unknown)
*** Aborted at 1576233877 (unix time) try "date -d @1576233877" if you are using GNU date ***
PC: @     0x7f13fb6e9196 abort
*** SIGSEGV (@0x0) received by PID 1 (TID 0x7f13cd3fa700) from PID 0; stack trace: ***
    @     0x7f13fc55d390 (unknown)
    @     0x7f13fb6e9196 abort
    @     0x7f13fc33112c (unknown)
    @     0x7f13fc3285cd google::LogMessage::Fail()
    @     0x7f13fc32a433 google::LogMessage::SendToLog()
    @     0x7f13fc32815b google::LogMessage::Flush()
    @     0x7f13fc32ae1e google::LogMessageFatal::~LogMessageFatal()
    @           0x47dbac _ZZ4mainENKUliiiE2_clEiii
    @           0x47e9aa _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_ESt12_Bind_simpleIFSt17reference_wrapperISt5_BindIFZ4mainEUliiiE2_iiiEEEvEEvEEE9_M_invokeERKSt9_Any_data
    @           0x483f69 std::__future_base::_State_baseV2::_M_do_set()
    @     0x7f13fc55aa99 __pthread_once_slow
    @           0x47af31 _ZNSt13__future_base11_Task_stateISt5_BindIFZ4mainEUliiiE2_iiiEESaIiEFvvEE6_M_runEv
    @           0x488c8b _ZNSt6thread5_ImplISt12_Bind_simpleIFZN2fl10ThreadPoolC4EmRKSt8functionIFvmEEEUlvE_vEEE6_M_runEv
    @     0x7f13fc053c80 (unknown)
    @     0x7f13fc5536ba start_thread
    @     0x7f13fb7b941d clone
    @                0x0 (unknown)

Based on the error, it seems that decoding with convLM assumes a GPU; see https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L357

Is the convLM code "heavily" GPU-dependent, or would it just be too slow on CPU? (I would not mind a huge slowdown right now, nor editing a few lines of C++ code.)

PS: If I use decoder_char_15gram_lexfree.cfg or decoder_char_20gram_lexfree.cfg, the decoding runs fine, i.e. KenLM works when used instead of convLM.

tlikhomanenko commented 4 years ago

@oplatek,

Right now we have tested convLM only on GPU, and in the implementation we run ConvLM on the GPU only (which is more reliable for GPUs). You can try to remove the check here: https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L362. Let me know if this works for you.
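If you go the patch-and-rebuild route, one low-effort way is to comment out the failing check with sed and rebuild. Since the exact source text of the guard may have drifted from the linked line, the sketch below demonstrates the edit on a scratch file with a made-up one-line guard (only the error string is taken from the log above), not on the real Decode.cpp:

```shell
# Hedged sketch: comment out the line carrying the fatal GPU-count check.
# The scratch file stands in for Decode.cpp; the guard below is invented for
# the demo -- only the "exceeds the number of visible GPUs" string is real.
cat > /tmp/decode_demo.cpp <<'EOF'
LOG_IF(FATAL, nDecoderThreads > nGpus) << "FLAGS_nthread_decoder exceeds the number of visible GPUs";
EOF
sed -i '/exceeds the number of visible GPUs/s|^|// |' /tmp/decode_demo.cpp
grep -c '^// ' /tmp/decode_demo.cpp   # prints 1
```

After applying the analogous edit to the real Decode.cpp inside the container (the guard there may span several lines, so check the surrounding code rather than relying on a single-line match), rebuild with something like `cd /root/wav2letter/build && make -j8` and rerun the docker command above.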

I am closing the issue for now. Feel free to reopen if it is needed.