daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0

KaldiAG 2.0.0 crashes with "Illegal instruction" on Linux #50

Closed: p-e-w closed this issue 3 years ago

p-e-w commented 3 years ago

  1. Create a Linux VM using QEMU on a Linux host (I use Fedora 33 as both guest and host, and GNOME Boxes to create the VM)
  2. Install KaldiAG 2.0.0 with the "big" language model
  3. Run full_example.py from the examples directory

KaldiAG crashes with

Illegal instruction (core dumped)

This does not happen with KaldiAG 1.8.0 on the same VM.

Running the example under gdb, I get

Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007fffd2434278 in ddot_k ()
   from [...]/kaldi_active_grammar/exec/linux/../../../kaldi_active_grammar.libs/libkaldi-matrix-01efe3e4.so

I don't have debug symbols, so that's as deep as I can go without building everything from scratch.

My guess is that for some reason, OpenBLAS is misdetecting what instruction set extensions are available, and tries to use architecture-specific optimized code that the (virtual) CPU cannot execute. If that is the case, there is probably a bug in how the KaldiAG wheels are built though, because OpenBLAS certainly works on VMs (NumPy would just fall apart otherwise).

daanzu commented 3 years ago

Hmm, I will look into it. What does your /proc/cpuinfo flags field read?

p-e-w commented 3 years ago

Guest flags:

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush 
mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon 
rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 
pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx 
f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd 
ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 
hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec 
xgetbv1 xsaves arat umip md_clear arch_capabilities

Host flags:

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush 
dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc 
art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf 
pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm 
pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand 
lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp 
tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep 
bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 
xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d

Model:

Intel(R) Core(TM) i5-6440HQ CPU @ 2.60GHz

Hardware virtualization is enabled.
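
Comparing two long flag lists by eye is error-prone. The following is a minimal sketch, not part of the original report, of how the comparison could be automated. The helper names (`parse_flags`, `read_cpu_flags`, `compare_flags`) are hypothetical, the short flag strings are placeholders for the full lists, and the `flags` line format assumes an x86 Linux `/proc/cpuinfo`. In the lists posted above, both host and guest report `sse4_2`, `avx`, `avx2` and `fma`.

```python
def parse_flags(text):
    """Split a whitespace-separated flags string into a set of flag names."""
    return set(text.split())


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flags of the first CPU listed in a Linux cpuinfo file."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return parse_flags(line.split(":", 1)[1])
    return set()


def compare_flags(host, guest):
    """Return (flags only on the host, flags only in the guest), both sorted."""
    return sorted(host - guest), sorted(guest - host)


# Placeholder input: substitute the full host and guest lists here.
host = parse_flags("fpu sse sse2 avx avx2 dts acpi")
guest = parse_flags("fpu sse sse2 avx avx2 hypervisor")
only_host, only_guest = compare_flags(host, guest)
print("host only:", only_host)
print("guest only:", only_guest)
```

If the SIMD-related flags do not appear in either difference, a masked instruction set extension in the VM is an unlikely explanation for the crash.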

daanzu commented 3 years ago

Thanks. I forgot to ask: do you know if it works running natively on the host?

p-e-w commented 3 years ago

Whoa, stop the press! The same crash indeed happens on the host as well. So the VM was just a red herring here and the actual problem appears to be on Linux in general, at least with the CPU model that I have.

But NumPy, linked against OpenBLAS, works just fine on the same system. I regularly use a variety of other math/science software as well, and I'm pretty sure most if not all of it also depends on OpenBLAS. So while the "Illegal instruction" error certainly sounds like the wrong architecture-specific code is being executed, and that appears to point straight at OpenBLAS, things probably aren't quite so simple.

shervinemami commented 3 years ago

It's also possible that some parts of OpenBLAS or NumPy work on your machine while other parts trigger illegal instructions, because it probably comes down to the compiler having generated an SSE (or similar extension) instruction that your machine doesn't support. For example, value * 2 might work in NumPy while value / 2 causes an illegal instruction. (That is a totally arbitrary example, just to show that a library might work most of the time even if it contains some illegal instructions, as long as you happen not to exercise those code paths.)

daanzu commented 3 years ago

Hmm, interesting. I made sure to test some on an old CPU without AVX, which worked fine, so this is curious. I will investigate.

p-e-w commented 3 years ago

I just ran the NumPy and SciPy test suites with

$ pip install numpy scipy
$ python
>>> import numpy, scipy
>>> numpy.test()
>>> scipy.test()

More than 42,000 tests executed in total, and none of them triggered an illegal instruction. The OpenBLAS API surface isn't that large and it's extremely difficult to imagine that KaldiAG calls a function that all of these tests somehow miss. The problem must surely lie with how KaldiAG builds OpenBLAS, or it might be entirely unrelated to OpenBLAS.
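
A single suspect operation can also be probed in isolation rather than through a full test suite. The sketch below is an illustration and was not used in this thread: it runs a snippet in a fresh interpreter and reports the signal that killed it, if any. The helper name `run_probe` is hypothetical, and the behaviour relies on POSIX semantics, where `subprocess` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def run_probe(code):
    """Run `code` in a fresh interpreter.

    Returns the name of the signal that killed the child (e.g. "SIGILL"),
    or None if the child exited on its own, whatever its exit status.
    """
    result = subprocess.run([sys.executable, "-c", code], capture_output=True)
    if result.returncode < 0:
        # On POSIX, a negative return code is the negated signal number.
        return signal.Signals(-result.returncode).name
    return None
```

For example, `run_probe("import numpy as np; np.dot(np.ones(1000), np.ones(1000))")` exercises a double-precision dot product, which is expected (an assumption about NumPy's BLAS dispatch) to reach the same kind of `ddot` kernel that appears in the gdb output. A result of `"SIGILL"` would point at that routine, while `None` only shows that this particular call did not crash.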

As part of the tests, NumPy printed the following which may or may not be of interest:

NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* 
F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? 
AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?

p-e-w commented 3 years ago

I also confirmed (using ldd) that NumPy links against libopenblasp-r0-5bebc122.3.13.dev.so. KaldiAG appears to statically link OpenBLAS, so I couldn't find out which version of OpenBLAS is contained in the KaldiAG wheels.
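
When a library is statically linked, one rough way to look for its version is to scan the binary for embedded strings. The sketch below assumes that OpenBLAS leaves a string of the form `OpenBLAS x.y.z` in the binary. That is an assumption: depending on which objects the linker pulled in, nothing may be found. The helper name `find_openblas_versions` is hypothetical.

```python
import re

# Assumed shape of the embedded version string, e.g. "OpenBLAS 0.3.13.dev".
VERSION_PATTERN = re.compile(rb"OpenBLAS \d+\.\d+\.\d+[\w.\-]*")


def find_openblas_versions(path):
    """Return the distinct 'OpenBLAS x.y.z' strings found in a binary file."""
    with open(path, "rb") as f:
        data = f.read()
    return sorted({m.decode("ascii") for m in VERSION_PATTERN.findall(data)})
```

Calling it on the `libkaldi-matrix` shared object from the wheel would return a list of candidate version strings, or an empty list if no such string was linked in.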

Note also the recently filed OpenBLAS issue "DGEMM: illegal instruction on old x86 processor built with DYNAMIC_ARCH", though my CPU isn't anywhere near as ancient as the one mentioned there.

daanzu commented 3 years ago

@p-e-w Can you try with this version? Uninstall the public release first of course, then rename this file as a .whl and install it.

kaldi_active_grammar-2.0.0-py2.py3-none-linux_x86_64.zip

p-e-w commented 3 years ago

Something seems to be missing from the wheel. I get the following error:

OSError: cannot load library './venv/lib64/python3.9/site-packages/kaldi_active_grammar/exec/linux/libkaldi-dragonfly.so': 
libfstscript.so.10: cannot open shared object file: No such file or directory.  Additionally, ctypes.util.find_library() did not manage 
to locate a library called './venv/lib64/python3.9/site-packages/kaldi_active_grammar/exec/linux/libkaldi-dragonfly.so'

libkaldi-dragonfly.so is present in the location indicated by the error message, but I cannot find libfstscript.so.10 at all.

daanzu commented 3 years ago

@p-e-w Gah, sorry, foolish mistake on my part. Try again with this.

kaldi_active_grammar-2.0.1-py2.py3-none-manylinux2010_x86_64.whl.zip

p-e-w commented 3 years ago

@daanzu This works, thank you! No more crash, and basic speech recognition confirmed working.

You might want to consider switching from TARGET=NEHALEM to TARGET=GENERIC, which is what most repositories referenced in the OpenBLAS issue appear to be doing. Nehalem was released only in late 2008 and I don't think KaldiAG should crash without recourse or explanation just because someone has a 13-year-old computer.
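
To illustrate what an explanation instead of a bare crash could look like, here is a minimal sketch of a startup check. It is not something KaldiAG is known to do. The required flag set is an assumption chosen for illustration (extensions that Nehalem-class CPUs support and older x86-64 CPUs may lack) and is not taken from the OpenBLAS build system. The names `NEHALEM_LEVEL_FLAGS`, `missing_flags` and `warn_if_unsupported` are hypothetical.

```python
# Assumption for illustration only: /proc/cpuinfo flags that a Nehalem-class
# CPU reports and that earlier x86-64 CPUs may lack. Not taken from OpenBLAS.
NEHALEM_LEVEL_FLAGS = {"ssse3", "sse4_1", "sse4_2", "popcnt"}


def missing_flags(cpu_flags, required=NEHALEM_LEVEL_FLAGS):
    """Return the required flags that the CPU does not report, sorted."""
    return sorted(set(required) - set(cpu_flags))


def warn_if_unsupported(cpu_flags):
    """Return a human-readable warning, or None if nothing is missing."""
    missing = missing_flags(cpu_flags)
    if missing:
        return ("CPU lacks " + ", ".join(missing)
                + "; a NEHALEM-targeted build may crash")
    return None
```

A build with TARGET=GENERIC avoids the need for such a check, which is the simpler option suggested here.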

daanzu commented 3 years ago

@p-e-w Thanks for the tip! Agreed; I had missed that option. The builds are now using that option, and are on PyPI for Linux/Mac.

p-e-w commented 3 years ago

@daanzu I tried again with the new release 2.0.2 from PyPI and everything still works as expected.

I also attempted to do some crude benchmarking using real-world conditions (full_example.py with microphone input; both idle and while speaking). Since input data varies so much, it's difficult to get reproducible results with such a setup. Nevertheless, these are the numbers I measured (CPU usage):

KaldiAG 1.8.0, idle:      26%
KaldiAG 1.8.0, speaking:  25%
KaldiAG 2.0.2, idle:      29%
KaldiAG 2.0.2, speaking:  27%

Repeated runs yield fluctuations of 2-3 percentage points, so it's safe to say that any difference is essentially within the noise, even though I have an Intel CPU that should benefit the most from using MKL instead of OpenBLAS.
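
The arithmetic behind that conclusion can be written out as a small sketch. The figures are the ones reported above. The noise threshold of 3 points is taken as the upper end of the stated 2-3 point fluctuation, which is a simplifying assumption and not a statistical test.

```python
# CPU usage in percent, as reported in the measurements above.
cpu_usage = {
    "1.8.0": {"idle": 26, "speaking": 25},
    "2.0.2": {"idle": 29, "speaking": 27},
}

# Upper end of the reported run-to-run fluctuation, in percentage points.
NOISE = 3

# Difference between the two releases for each condition.
differences = {
    state: cpu_usage["2.0.2"][state] - cpu_usage["1.8.0"][state]
    for state in ("idle", "speaking")
}

# A difference no larger than the fluctuation cannot be told apart from noise.
within_noise = {state: abs(diff) <= NOISE for state, diff in differences.items()}

print(differences)
print(within_noise)
```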

daanzu commented 3 years ago

@p-e-w Thanks for the testing info. That agrees with my limited testing as well, where the run-to-run variation was greater than the average difference between OpenBLAS and MKL. It's great to have such a performant open math library available now. Hopefully I can get it compiling and integrated well on Windows also.