Decoder Aborted: SIGSEGV

davidbelle commented 5 years ago

Hello! wav2letter looks amazing, and I can't wait to get it working. I'm running Ubuntu 16.04 and CUDA 10.0 with a T4. I built everything and have been following the 1-librispeech_clean tutorial. I'm up to the decoding step, and it looks like it works right up until the last second. Here's the decode.cfg file I'm loading:

# Decoding config for Mini Librispeech
# Replace `[...]` with appropriate paths
--lexicon=/home/ubuntu/w2l/lm/lexicon.txt
--lm=/home/ubuntu/w2l/lm/3-gram.arpa
--am=/home/ubuntu/w2l/save/librispeech_clean_trainlogs/001_model_lists#dev-clean.lst.bin
--test=lists/test-clean.lst
--sclite=/home/ubuntu/w2l/save/logs
--lmweight=2.5
--wordscore=1
--beamsize=500
--beamthreshold=25
--silweight=-0.5
--nthread_decoder=4
--smearing=max
--show=true

And here's the output

./Decoder --flagsfile /home/ubuntu/wav2letter/tutorials/1-librispeech_clean/decode.cfg
Falling back to using letters as targets for the unknown word: valleyed
Falling back to using letters as targets for the unknown word: woodbegirt
Falling back to using letters as targets for the unknown word: citadelled
Falling back to using letters as targets for the unknown word: dedalos
Falling back to using letters as targets for the unknown word: hazewrapped

[ says this a bunch of times with various words, and then.....]

Falling back to using letters as targets for the unknown word: innerlochy
Skipping unknown entry: 'innerlochy'
Loading the LM will be faster if you build a binary file.
Reading /home/ubuntu/w2l/lm/3-gram.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
*** Aborted at 1573022859 (unix time) try "date -d @1573022859" if you are using GNU date ***
PC: @     0x7fb8bea5fe10 (unknown)
*** SIGSEGV (@0x0) received by PID 7928 (TID 0x7fb8ec700380) from PID 0; stack trace: ***
    @     0x7fb8e4491390 (unknown)
    @     0x7fb8bea5fe10 (unknown)
    @     0x7fb8bea62184 __libc_malloc
    @     0x7fb8bf354e78 operator new()
    @           0x5343b0 w2l::Trie::insert()
    @           0x413eb9 main
    @     0x7fb8be9fe830 __libc_start_main
    @           0x477159 _start
    @                0x0 (unknown)

The line of asterisks below the ----5---10---...--100 ruler gradually advances until it reaches 100, so it looks like the crash happens at the very last step of loading.

If it helps, I built ArrayFire from source, because I read a comment that CUDA 9 and below doesn't play well with the T4. Not sure where the problem lies. Thanks for any help.
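On the "Loading the LM will be faster if you build a binary file" line in the log: that message refers to KenLM's binary format. A minimal sketch of the conversion, assuming KenLM's `build_binary` tool is built and on `PATH`. The helper name `lm_binary_path` is made up for illustration, and this only speeds up loading; it is not expected to fix the crash by itself.

```shell
#!/bin/sh
# Hypothetical helper: derive the .bin path that sits next to an ARPA file.
lm_binary_path() {
    printf '%s\n' "${1%.arpa}.bin"
}

arpa=/home/ubuntu/w2l/lm/3-gram.arpa
bin=$(lm_binary_path "$arpa")

# build_binary ships with KenLM; only run it if the tool and the ARPA file exist.
if command -v build_binary >/dev/null 2>&1 && [ -f "$arpa" ]; then
    build_binary "$arpa" "$bin"
    echo "now point decode.cfg at: --lm=$bin"
else
    echo "would run: build_binary $arpa $bin"
fi
```

After the conversion, the `--lm=` line in decode.cfg would point at the `.bin` file instead of the `.arpa` file.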

lunixbochs commented 5 years ago

Did you run out of RAM?

davidbelle commented 5 years ago

Just checked, and there is plenty of RAM. The machine has 32 GB, and running the decoder barely uses any of it. The CPU works hard right up until the ----5---10---...--100 stage.
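For anyone checking the same thing, a minimal sketch of how the memory picture can be confirmed on Linux. The helper name `mem_available_mib` is made up for illustration; the commented commands assume GNU `time` is installed at `/usr/bin/time`. Note that an OOM kill usually shows up as SIGKILL rather than SIGSEGV, so a clean result here is consistent with the crash having another cause.

```shell
#!/bin/sh
# Hypothetical helper: print available memory in MiB.
# Reads /proc/meminfo by default; a different file can be passed for testing.
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

echo "available before decoding: $(mem_available_mib) MiB"

# Peak resident set size of the run (GNU time, not the shell builtin):
#   /usr/bin/time -v ./Decoder --flagsfile decode.cfg 2>&1 | grep 'Maximum resident'
# Any OOM-killer activity would be logged by the kernel:
#   dmesg | grep -i 'out of memory'
```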

Could it have something to do with google logging itself? Even just running ./Test I get this output.

F1109 01:03:32.593775  3093 Test.cpp:38] Usage: Please refer to https://git.io/fjVVq
*** Check failure stack trace: ***
    @     0x7fb8ab8f85cd  google::LogMessage::Fail()
    @     0x7fb8ab8fa433  google::LogMessage::SendToLog()
    @     0x7fb8ab8f815b  google::LogMessage::Flush()
    @     0x7fb8ab8fae1e  google::LogMessageFatal::~LogMessageFatal()
    @           0x414c7b  main
    @     0x7fb8aaca2830  __libc_start_main
    @           0x476299  _start
    @              (nil)  (unknown)
Aborted (core dumped)

Does it try to log something right at the end but fail? I tried running it with sudo as well.

davidbelle commented 5 years ago

Here's the output of nvidia-smi when it crashes.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   42C    P0    28W /  70W |    533MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3098      C   ./Decoder                                    523MiB |
+-----------------------------------------------------------------------------+

davidbelle commented 5 years ago

Not sure if this helps either, but I ran it through gdb and this is the output.

....
Skipping unknown entry: 'innerlochy'
[Thread 0x7fff83fff700 (LWP 8344) exited]
[Thread 0x7fffa0ffd700 (LWP 8343) exited]
[Thread 0x7fffa17fe700 (LWP 8342) exited]
[Thread 0x7fffa1fff700 (LWP 8341) exited]
Loading the LM will be faster if you build a binary file.
Reading /home/ubuntu/w2l/lm/3-gram.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************

Thread 1 "Decoder" received signal SIGSEGV, Segmentation fault.
_int_malloc (av=av@entry=0x7fffca651b20 <main_arena>, bytes=bytes@entry=136) at malloc.c:3516
3516    malloc.c: No such file or directory.
(gdb) where
#0  _int_malloc (av=av@entry=0x7fffca651b20 <main_arena>, bytes=bytes@entry=136) at malloc.c:3516
#1  0x00007fffca311184 in __GI___libc_malloc (bytes=136) at malloc.c:2913
#2  0x00007fffcac03e78 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00000000005343b0 in ?? ()
#4  0x00007fffffffd110 in ?? ()
#5  0x0a1672dfa8035800 in ?? ()
#6  0x0000000018edf030 in ?? ()
#7  0x00000000c0f3e2ae in ?? ()
#8  0x0000000018edf0d0 in ?? ()
#9  0x0000001b18edf0d0 in ?? ()
#10 0x0000000018edf0d0 in ?? ()
#11 0x00007fffffffce60 in ?? ()
#12 0x00007fffffffd110 in ?? ()
#13 0x00000000005166aa in ?? ()
#14 0x0000000000000000 in ?? ()

I then ran it through valgrind as lots of forums suggested. There was a lot of output but I believe this is the only relevant part:

==8878== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==8878==    at 0x326BB4D9: syscall (syscall.S:38)
==8878==    by 0xC9D7E97: __kmp_affinity_determine_capable (in /usr/local/lib/libiomp5.so)
==8878==    by 0xC9B3867: __kmp_env_initialize(char const*) (in /usr/local/lib/libiomp5.so)
==8878==    by 0xC99CA34: _INTERNAL_25_______src_kmp_runtime_cpp_d89aedeb::__kmp_do_serial_initialize() (in /usr/local/lib/libiomp5.so)
==8878==    by 0xC99027F: __kmp_get_global_thread_id_reg (in /usr/local/lib/libiomp5.so)
==8878==    by 0xC986512: GOMP_parallel@@VERSION (in /usr/local/lib/libiomp5.so)
==8878==    by 0x53FDD6: w2l::PowerSpectrum<float>::batchApply(std::vector<float, std::allocator<float> > const&, long) (in /home/ubuntu/wav2letter/build/Decoder)
==8878==    by 0x57741C: w2l::featurize(std::vector<w2l::W2lLoaderData, std::allocator<w2l::W2lLoaderData> > const&, std::unordered_map<int, w2l::Dictionary, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, w2l::Dictionary> > > const&) (in /home/ubuntu/wav2letter/build/Decoder)
==8878==    by 0x583D74: w2l::W2lDataset::getFeatureData(long) const (in /home/ubuntu/wav2letter/build/Decoder)
==8878==    by 0x585318: w2l::W2lDataset::getFeatureDataAndPrefetch(long) const (in /home/ubuntu/wav2letter/build/Decoder)
==8878==    by 0x58564D: w2l::W2lDataset::get(long) const (in /home/ubuntu/wav2letter/build/Decoder)
==8878==    by 0x41424A: main (in /home/ubuntu/wav2letter/build/Decoder)
==8878==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==8878== 
vex amd64->IR: unhandled instruction bytes: 0xF3 0xF 0x1E 0xFA 0x50 0x33 0xC0 0xE8
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=1
==8878== valgrind: Unrecognised instruction at address 0x87b49f0.
==8878==    at 0x87B49F0: __intel_mkl_features_init_x (in /home/ubuntu/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_core.so)
==8878==    by 0x6E53225: mkl_serv_get_num_stripes (in /home/ubuntu/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gnu_thread.so)
==8878==    by 0x6FA4671: mkl_blas_sgemm (in /home/ubuntu/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gnu_thread.so)
==8878==    by 0x634DE26: SGEMM (in /home/ubuntu/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so)
==8878==    by 0x63B48C0: cblas_sgemm (in /home/ubuntu/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so)
==8878==    by 0x542F5C: std::vector<float, std::allocator<float> > w2l::cblasGemm<float>(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&, int, int) (in /home/ubuntu/wav2letter/build/Decoder)
==8878==    by 0x543806: w2l::TriFilterbank<float>::apply(std::vector<float, std::allocator<float> > const&, float) const (in /home/ubuntu/wav2letter/build/Decoder)
==8878==    by 0x53E70E: w2l::Mfsc<float>::mfscImpl(std::vector<float, std::allocator<float> >&) (in /home/ubuntu/wav2letter/build/Decoder)
==8878==    by 0x53E88B: w2l::Mfsc<float>::apply(std::vector<float, std::allocator<float> > const&) (in /home/ubuntu/wav2letter/build/Decoder)
==8878==    by 0x53F7A1: w2l::PowerSpectrum<float>::batchApply(std::vector<float, std::allocator<float> > const&, long) [clone ._omp_fn.0] (in /home/ubuntu/wav2letter/build/Decoder)
==8878==    by 0xC986637: GOMP_parallel@@VERSION (in /usr/local/lib/libiomp5.so)
==8878==    by 0x53FDD6: w2l::PowerSpectrum<float>::batchApply(std::vector<float, std::allocator<float> > const&, long) (in /home/ubuntu/wav2letter/build/Decoder)
==8878== Your program just tried to execute an instruction that Valgrind
==8878== did not recognise.  There are two possible reasons for this.
==8878== 1. Your program has a bug and erroneously jumped to a non-code
==8878==    location.  If you are running Memcheck and you just saw a
==8878==    warning about a bad jump, it's probably your program's fault.
==8878== 2. The instruction is legitimate but Valgrind doesn't handle it,
==8878==    i.e. it's Valgrind's fault.  If you think this is the case or
==8878==    you are not sure, please let us know and we'll try to fix it.
==8878== Either way, Valgrind will now raise a SIGILL signal which will
==8878== probably kill your program.

Having said that, the valgrind result may not be related. gdb crashed at the same place as the normal run, but valgrind stopped before the ----5---10 progress bar even appeared. As the text shows, valgrind raises SIGILL because of an instruction it does not recognise. I'm not sure if any of this helps; let me know if there is other output I can send that might.
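Most frames in the gdb backtrace come out as `??`, which makes it hard to read. A SIGSEGV inside `_int_malloc` often points at heap corruption that happened earlier, so the frames above `operator new` are the interesting ones. A rough sketch of capturing the backtrace non-interactively and counting how many frames are still unresolved; `unresolved_frames` and `bt.log` are names made up for illustration. Rebuilding with `-DCMAKE_BUILD_TYPE=RelWithDebInfo` (a standard CMake build type) would normally help, though whether the wav2letter build honours it without other changes is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: count backtrace frames gdb could not resolve to a symbol.
unresolved_frames() {
    grep -c ' in ?? ()' "$1"
}

decoder=./Decoder
cfg=/home/ubuntu/wav2letter/tutorials/1-librispeech_clean/decode.cfg

if command -v gdb >/dev/null 2>&1 && [ -x "$decoder" ]; then
    # -batch runs the given commands and exits; bt prints the stack at the crash.
    gdb -batch -ex run -ex bt --args "$decoder" --flagsfile "$cfg" > bt.log 2>&1
    echo "unresolved frames: $(unresolved_frames bt.log)"
fi
```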

davidbelle commented 5 years ago

Well, I'm about ready to give up; I just don't know what I'm doing wrong. I've tried building wav2letter on all sorts of combinations, mostly between Ubuntu 16 and 14, CUDA 9.x and 10.x, and pre-built ArrayFire binaries versus building ArrayFire myself. The list goes on and on. I've attempted this 12 times. What am I doing wrong?

Basic requirements: it needs to run in AWS. AWS has the GPU platforms, and I've chosen a g4dn.2xlarge instance, which gives me a T4 with 32 GB of memory. They offer plain old Ubuntu 16 and 18, with or without Deep Learning Base (NVIDIA drivers pre-installed, CUDA pre-installed, some other stuff), OR Deep Learning (Base + more things).

Am I better off not using the Deep Learning images and installing CUDA etc. manually? (I tried that already, but I could try again, maybe with more success.) Is there a magic combination?

The ultimate goal is to get the transcription of a WAV file quickly, with most files being under 15 seconds long.

Please give me advice, I'm at my wits' end. Very frustrated!

tlikhomanenko commented 5 years ago

Hi @davidbelle,

I think the problem is in kenlm model loading. The error looks like it crashes before https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L308 and during https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L286. Could you try to use our docker image to run it? Or could you try to install kenlm exactly as in the Dockerfile https://github.com/facebookresearch/wav2letter/blob/master/Dockerfile-CUDA-Base#L79?

davidbelle commented 5 years ago

THANK YOU @tlikhomanenko! I started down that path, found this post, and have finally made some progress.

https://github.com/facebookresearch/wav2letter/issues/335

Downloaded the CUDA10 Docker, git pulled on arrayfire and wav2letter, rebuilt them and everything seems to be working great. Thank you!!!!

erickim555 commented 2 years ago

Regarding this valgrind "unrecognized __intel_mkl_features_init_x instruction" error message:

vex amd64->IR: unhandled instruction bytes: 0xF3 0xF 0x1E 0xFA 0x50 0x33 0xC0 0xE8
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=1
==8878== valgrind: Unrecognised instruction at address 0x87b49f0.
==8878==    at 0x87B49F0: __intel_mkl_features_init_x (in /home/ubuntu/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_core.so)

I ran into this error in an unrelated C++ project, and I was able to fix it by building and installing valgrind from source. My guess is that the valgrind binary that Ubuntu ships (e.g. via apt-get install valgrind) isn't built by default to support certain CPU instructions that Intel MKL requires. Thus, one needs to build from source so that valgrind can "pick up" these special CPU instructions:

# Build + install valgrind from source (writing to /opt and `make install` may need root)
mkdir -p /opt/valgrind && \
    cd /opt/valgrind && \
    curl -O https://sourceware.org/pub/valgrind/valgrind-3.18.1.tar.bz2 && \
    tar -xf valgrind-3.18.1.tar.bz2 && \
    cd valgrind-3.18.1 && \
    ./configure && \
    make -j16 && \
    make install -j16

Hopefully this helps someone else in the future!
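After a source build like the one above, it can be worth confirming that the shell actually picks up the new binary rather than the distribution package. A small sketch, assuming GNU `sort -V` is available; `version_ge` is a helper name made up for illustration, and 3.18.1 is simply the version used in the build above, not a known minimum.

```shell
#!/bin/sh
# Hypothetical helper: true when version $1 is >= version $2 (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v valgrind >/dev/null 2>&1; then
    # `valgrind --version` prints something like "valgrind-3.18.1"; strip the prefix.
    installed=$(valgrind --version | sed 's/^valgrind-//')
    if version_ge "$installed" 3.18.1; then
        echo "valgrind $installed is at least 3.18.1"
    else
        echo "valgrind $installed is older than 3.18.1; check PATH order"
    fi
fi
```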

flashlight / wav2letter

Decoder Aborted: SIGSEGV #441