davidbelle closed this issue 5 years ago
Did you run out of RAM?
Just checked, plenty of RAM. There's 32 GB of RAM and running the decoder barely uses any of it. The CPU works hard right up until the ----5---10---------100 stage.
Could it have something to do with Google logging (glog) itself? Even just running ./Test I get this output.
F1109 01:03:32.593775 3093 Test.cpp:38] Usage: Please refer to https://git.io/fjVVq
*** Check failure stack trace: ***
@ 0x7fb8ab8f85cd google::LogMessage::Fail()
@ 0x7fb8ab8fa433 google::LogMessage::SendToLog()
@ 0x7fb8ab8f815b google::LogMessage::Flush()
@ 0x7fb8ab8fae1e google::LogMessageFatal::~LogMessageFatal()
@ 0x414c7b main
@ 0x7fb8aaca2830 __libc_start_main
@ 0x476299 _start
@ (nil) (unknown)
Aborted (core dumped)
Does it log something right at the end but can't? Tried running sudo as well.
Here's the output of nvidia-smi when it crashes.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 42C P0 28W / 70W | 533MiB / 15079MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3098 C ./Decoder 523MiB |
+-----------------------------------------------------------------------------+
Not sure if this helps either, but I ran it through gdb and this is the output.
....
Skipping unknown entry: 'innerlochy'
[Thread 0x7fff83fff700 (LWP 8344) exited]
[Thread 0x7fffa0ffd700 (LWP 8343) exited]
[Thread 0x7fffa17fe700 (LWP 8342) exited]
[Thread 0x7fffa1fff700 (LWP 8341) exited]
Loading the LM will be faster if you build a binary file.
Reading /home/ubuntu/w2l/lm/3-gram.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Thread 1 "Decoder" received signal SIGSEGV, Segmentation fault.
_int_malloc (av=av@entry=0x7fffca651b20 <main_arena>, bytes=bytes@entry=136) at malloc.c:3516
3516 malloc.c: No such file or directory.
(gdb) where
#0 _int_malloc (av=av@entry=0x7fffca651b20 <main_arena>, bytes=bytes@entry=136) at malloc.c:3516
#1 0x00007fffca311184 in __GI___libc_malloc (bytes=136) at malloc.c:2913
#2 0x00007fffcac03e78 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00000000005343b0 in ?? ()
#4 0x00007fffffffd110 in ?? ()
#5 0x0a1672dfa8035800 in ?? ()
#6 0x0000000018edf030 in ?? ()
#7 0x00000000c0f3e2ae in ?? ()
#8 0x0000000018edf0d0 in ?? ()
#9 0x0000001b18edf0d0 in ?? ()
#10 0x0000000018edf0d0 in ?? ()
#11 0x00007fffffffce60 in ?? ()
#12 0x00007fffffffd110 in ?? ()
#13 0x00000000005166aa in ?? ()
#14 0x0000000000000000 in ?? ()
I then ran it through valgrind as lots of forums suggested. There was a lot of output but I believe this is the only relevant part:
==8878== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==8878== at 0x326BB4D9: syscall (syscall.S:38)
==8878== by 0xC9D7E97: __kmp_affinity_determine_capable (in /usr/local/lib/libiomp5.so)
==8878== by 0xC9B3867: __kmp_env_initialize(char const*) (in /usr/local/lib/libiomp5.so)
==8878== by 0xC99CA34: _INTERNAL_25_______src_kmp_runtime_cpp_d89aedeb::__kmp_do_serial_initialize() (in /usr/local/lib/libiomp5.so)
==8878== by 0xC99027F: __kmp_get_global_thread_id_reg (in /usr/local/lib/libiomp5.so)
==8878== by 0xC986512: GOMP_parallel@@VERSION (in /usr/local/lib/libiomp5.so)
==8878== by 0x53FDD6: w2l::PowerSpectrum<float>::batchApply(std::vector<float, std::allocator<float> > const&, long) (in /home/ubuntu/wav2letter/build/Decoder)
==8878== by 0x57741C: w2l::featurize(std::vector<w2l::W2lLoaderData, std::allocator<w2l::W2lLoaderData> > const&, std::unordered_map<int, w2l::Dictionary, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, w2l::Dictionary> > > const&) (in /home/ubuntu/wav2letter/build/Decoder)
==8878== by 0x583D74: w2l::W2lDataset::getFeatureData(long) const (in /home/ubuntu/wav2letter/build/Decoder)
==8878== by 0x585318: w2l::W2lDataset::getFeatureDataAndPrefetch(long) const (in /home/ubuntu/wav2letter/build/Decoder)
==8878== by 0x58564D: w2l::W2lDataset::get(long) const (in /home/ubuntu/wav2letter/build/Decoder)
==8878== by 0x41424A: main (in /home/ubuntu/wav2letter/build/Decoder)
==8878== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==8878==
vex amd64->IR: unhandled instruction bytes: 0xF3 0xF 0x1E 0xFA 0x50 0x33 0xC0 0xE8
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=1
==8878== valgrind: Unrecognised instruction at address 0x87b49f0.
==8878== at 0x87B49F0: __intel_mkl_features_init_x (in /home/ubuntu/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_core.so)
==8878== by 0x6E53225: mkl_serv_get_num_stripes (in /home/ubuntu/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gnu_thread.so)
==8878== by 0x6FA4671: mkl_blas_sgemm (in /home/ubuntu/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gnu_thread.so)
==8878== by 0x634DE26: SGEMM (in /home/ubuntu/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so)
==8878== by 0x63B48C0: cblas_sgemm (in /home/ubuntu/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so)
==8878== by 0x542F5C: std::vector<float, std::allocator<float> > w2l::cblasGemm<float>(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&, int, int) (in /home/ubuntu/wav2letter/build/Decoder)
==8878== by 0x543806: w2l::TriFilterbank<float>::apply(std::vector<float, std::allocator<float> > const&, float) const (in /home/ubuntu/wav2letter/build/Decoder)
==8878== by 0x53E70E: w2l::Mfsc<float>::mfscImpl(std::vector<float, std::allocator<float> >&) (in /home/ubuntu/wav2letter/build/Decoder)
==8878== by 0x53E88B: w2l::Mfsc<float>::apply(std::vector<float, std::allocator<float> > const&) (in /home/ubuntu/wav2letter/build/Decoder)
==8878== by 0x53F7A1: w2l::PowerSpectrum<float>::batchApply(std::vector<float, std::allocator<float> > const&, long) [clone ._omp_fn.0] (in /home/ubuntu/wav2letter/build/Decoder)
==8878== by 0xC986637: GOMP_parallel@@VERSION (in /usr/local/lib/libiomp5.so)
==8878== by 0x53FDD6: w2l::PowerSpectrum<float>::batchApply(std::vector<float, std::allocator<float> > const&, long) (in /home/ubuntu/wav2letter/build/Decoder)
==8878== Your program just tried to execute an instruction that Valgrind
==8878== did not recognise. There are two possible reasons for this.
==8878== 1. Your program has a bug and erroneously jumped to a non-code
==8878== location. If you are running Memcheck and you just saw a
==8878== warning about a bad jump, it's probably your program's fault.
==8878== 2. The instruction is legitimate but Valgrind doesn't handle it,
==8878== i.e. it's Valgrind's fault. If you think this is the case or
==8878== you are not sure, please let us know and we'll try to fix it.
==8878== Either way, Valgrind will now raise a SIGILL signal which will
==8878== probably kill your program.
Having said that, the valgrind failure may not be related. gdb crashed at the same place as the normal run, but valgrind crashed before the ---5---10 progress bar. As you can tell from the text, valgrind sends a kill signal because of what it thinks is a problem. Not sure if any of this helps; let me know if there is any other output I can send that might.
Well, I'm about ready to give up. I just don't know what I'm doing wrong. I've tried building wav2letter on all sorts of combinations, mostly between Ubuntu 16 and 14, CUDA 9.x and 10.x, and pre-built ArrayFire binaries vs. building ArrayFire myself; the list goes on and on. I've attempted this 12 times. What am I doing wrong?
Basic requirements: it needs to run in AWS. AWS has the GPU platforms, and I've chosen a g4dn.2xlarge instance, which gives me a T4 with 32 GB of memory. They offer plain old Ubuntu 16 and 18, with or without Deep Learning Base (NVIDIA drivers pre-installed, CUDA pre-installed, some other stuff), OR Deep Learning (Base + more things).
Am I better off not using the Deep Learning images and installing CUDA etc. manually? (I tried that, but I could try again, maybe with more success.) Is there a magic combination?
The ultimate goal is to get the transcription of a WAV file quickly, with most files being under 15 seconds long.
Please give me advice, I'm at my wits' end. Very frustrated!
Hi @davidbelle,
I think the problem is in kenlm model loading. The error looks like it crashes before https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L308 and during https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L286. Could you try to use our docker image to run it? Or could you try to install kenlm exactly as in the Dockerfile https://github.com/facebookresearch/wav2letter/blob/master/Dockerfile-CUDA-Base#L79?
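One way to check whether KenLM itself is at fault is to exercise it outside the decoder. The sketch below converts the ARPA file to KenLM's binary format with KenLM's build_binary tool, which also follows the hint in the log that loading is faster from a binary file. The location of build_binary depends on how KenLM was built, so the KENLM_BUILD_BINARY variable and both helper function names are hypothetical and not part of wav2letter or KenLM.

```shell
# Sketch only: run KenLM's build_binary on the ARPA file, independent of wav2letter.
# KENLM_BUILD_BINARY is a hypothetical override for the tool's location.

# Derive the output name, e.g. 3-gram.arpa -> 3-gram.bin
lm_binary_path() {
  printf '%s.bin\n' "${1%.arpa}"
}

# Convert the model if build_binary can be found; otherwise report and return 127.
convert_lm() {
  arpa_file="$1"
  tool="${KENLM_BUILD_BINARY:-build_binary}"
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "build_binary not found; set KENLM_BUILD_BINARY to its path" >&2
    return 127
  fi
  "$tool" "$arpa_file" "$(lm_binary_path "$arpa_file")"
}

# Example, using the path from the log above:
# convert_lm /home/ubuntu/w2l/lm/3-gram.arpa
```

If the conversion also crashes, that would point at the KenLM install rather than at the decoder. If it succeeds, the resulting .bin file could be tried as the language model path in decode.cfg.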
THANK YOU @tlikhomanenko! I started down that path, found this post, and have finally made some progress.
https://github.com/facebookresearch/wav2letter/issues/335
I downloaded the CUDA 10 Docker image, ran git pull on arrayfire and wav2letter, rebuilt them, and everything seems to be working great. Thank you!!!!
Regarding this valgrind "unrecognized __intel_mkl_features_init_x instruction" error message:
vex amd64->IR: unhandled instruction bytes: 0xF3 0xF 0x1E 0xFA 0x50 0x33 0xC0 0xE8
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=1
==8878== valgrind: Unrecognised instruction at address 0x87b49f0.
==8878== at 0x87B49F0: __intel_mkl_features_init_x (in /home/ubuntu/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_core.so)
I ran into this error in an unrelated C++ project, and I was able to fix it by building and installing valgrind from source. My guess is that the valgrind binary that Ubuntu ships (e.g. via apt-get install valgrind) isn't built by default to support certain CPU instructions that Intel MKL requires. Thus, one needs to build from source so that valgrind can "pick up" these special CPU instructions:
# Build + install valgrind from source
# (writing to /opt and running "make install" may require root)
mkdir -p /opt/valgrind && \
cd /opt/valgrind && \
curl -O https://sourceware.org/pub/valgrind/valgrind-3.18.1.tar.bz2 && \
tar -xf valgrind-3.18.1.tar.bz2 && \
cd valgrind-3.18.1 && \
./configure && \
make -j16 && \
make install -j16
Hopefully this helps someone else in the future!
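As a small aid for reading the vex output quoted above: the leading bytes F3 0F 1E FA are the encoding of ENDBR64, an Intel CET instruction. That would be consistent with the packaged valgrind simply being too old to decode it, although this reading is an inference and is not confirmed anywhere in this thread. The sketch below checks a vex line for that byte pattern and prints which valgrind is on the PATH. The function name is hypothetical.

```shell
# Sketch: test whether a Valgrind "unhandled instruction bytes" line starts
# with F3 0F 1E FA (ENDBR64). Valgrind prints bytes without zero padding,
# so 0x0F appears as 0xF.
is_endbr64() {
  case "$1" in
    *"unhandled instruction bytes: 0xF3 0xF 0x1E 0xFA"*) return 0 ;;
    *) return 1 ;;
  esac
}

line='vex amd64->IR: unhandled instruction bytes: 0xF3 0xF 0x1E 0xFA 0x50 0x33 0xC0 0xE8'
if is_endbr64 "$line"; then
  echo "ENDBR64"
else
  echo "other"
fi

# Confirm which valgrind is picked up after installing from source.
if command -v valgrind >/dev/null 2>&1; then
  valgrind --version
fi
```

If the version printed is still the distribution's one, the source build is probably not first on the PATH.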
Hello! wav2letter looks amazing, can't wait to get it working. I'm running Ubuntu 16.04, CUDA 10.0 with a T4. Built everything and have been following the 1-librispeech_clean tutorial. I'm up to decoding, it looks like it is working right up till the last second. Here's my decode.cfg file I'm loading
And here's the output
The line of asterisks below the 5--10-----100 gradually advances until it reaches the 100, so it looks like the crash happens at the very last step.
If it helps, I built ArrayFire from source, because I read a comment that CUDA 9 and below don't play well with the T4. Not sure where the problem lies. Thanks for any help.