Closed gangliao closed 7 years ago
Is this a single occurrence, or do you get similar reports from other users as well? From your issue 248 I take it this is on an i5-2450M ("SandyBridge" core) running Ubuntu 16.04. Do you supply your own copy of OpenBLAS with your software, or is this whatever version 16.04 ships by default? (It might even make sense to have your user check with update-alternatives that it is actually OpenBLAS they are using, rather than netlib or ATLAS.) In my experience, SIGILL can mean either a genuine instruction that the CPU is not capable of handling (unlikely unless this is an OpenBLAS that was built specifically for TARGET=HASWELL) or stack corruption creating absurd return addresses for a function. In the latter case it would help to know whether the problem is reproducible on another system (preferably not running the same Ubuntu 16.04), or to have a minimal self-contained example (I assume your demo does a lot more than just the problematic sgemm call).
@martin-frbg This problem is from https://github.com/PaddlePaddle/book/issues/248.
Actually, this is a Docker image. As the user said:
"I just tested it on a Microsoft Azure Ubuntu 16.04 instance and it works.
It is most probably a missing instruction on my laptop."
For PaddlePaddle, we use an external project to build OpenBLAS: https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/external/openblas.cmake#L48
How can we build a more generic libopenblas.a?
That cmake file would indeed build an OpenBLAS that is tailored to the cpu of the build system. Please add "DYNAMIC_ARCH=1" to the build flags to get a (bigger) libopenblas.a with support for a range of x86 cpus (and builtin code to select the most appropriate one at runtime), or if library size is a concern build for the oldest, least sophisticated cpu you expect to encounter, e.g. TARGET=NEHALEM.
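The two alternatives suggested above can be sketched as a small helper that assembles the `make` command line. This is only an illustration: the function name `openblas_make_args` and its parameters are hypothetical, and the only parts taken from the thread are the `DYNAMIC_ARCH=1` and `TARGET=NEHALEM` flags themselves.

```python
from typing import List, Optional


def openblas_make_args(dynamic_arch: bool = True,
                       target: Optional[str] = None) -> List[str]:
    """Assemble a `make` command line for a portable OpenBLAS build.

    Hypothetical helper for illustration; only the two flags come from
    the discussion above.
    """
    args = ["make"]
    if dynamic_arch:
        # Bigger library: kernels for a range of x86 CPUs, chosen at runtime.
        args.append("DYNAMIC_ARCH=1")
    elif target:
        # Smaller library: tuned for the oldest CPU you expect to encounter.
        args.append(f"TARGET={target}")
    return args


if __name__ == "__main__":
    print(" ".join(openblas_make_args()))
    print(" ".join(openblas_make_args(dynamic_arch=False, target="NEHALEM")))
```

In a CMake external project the same flags would have to be passed through to the underlying `make` invocation; how exactly depends on the cmake file in question.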
Thanks for your suggestion, really helpful!
You also need to install the Ubuntu cblas wrapper, and probably the libblas-dev package, so that the Paddle build system detects cblas and skips making the broken local build.
I examined the other issue. Can you get /proc/cpuinfo (the last core is enough) from inside that particular Docker container?
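For anyone who wants to check this themselves, here is a minimal sketch that reads the `flags` line of the last core from /proc/cpuinfo and reports which extensions are missing. The function names are hypothetical, and the choice of `avx2` and `fma` is an assumption: those are the extensions Haswell added over Sandy Bridge, so they are the likely culprits if a Haswell-tuned kernel raises SIGILL.

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flag set of the last core listed in /proc/cpuinfo text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Later cores overwrite earlier ones, so the last core wins.
            flags = set(value.split())
    return flags


def missing_for_haswell(flags: Set[str]) -> Set[str]:
    """Assumption: avx2 and fma are what a Haswell-tuned kernel needs."""
    return {"avx2", "fma"} - flags


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        text = ""  # not Linux, or /proc is not available in the container
    print(sorted(missing_for_haswell(cpu_flags(text))))
```

Running it inside the container rather than on the host matters here, because that is the environment the failing process actually sees.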
https://github.com/PaddlePaddle/Paddle/issues/1697 @brada4 Do we also need to build cblas?
The invalid instruction comes from a single-architecture build, since a plain sgemm_kernel symbol is not present in a DYNAMIC_ARCH build. There you find instead:
sgemm_kernel_ATOM
sgemm_kernel_BARCELONA
sgemm_kernel_BOBCAT
sgemm_kernel_BULLDOZER
sgemm_kernel_CORE2
sgemm_kernel_DUNNINGTON
sgemm_kernel_EXCAVATOR
sgemm_kernel_HASWELL
sgemm_kernel_NANO
sgemm_kernel_NEHALEM
sgemm_kernel_OPTERON
sgemm_kernel_OPTERON_SSE3
sgemm_kernel_PENRYN
sgemm_kernel_PILEDRIVER
sgemm_kernel_PRESCOTT
sgemm_kernel_SANDYBRIDGE
sgemm_kernel_STEAMROLLER
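The distinction above (a bare `sgemm_kernel` in a single-architecture build versus per-CPU suffixed symbols in a DYNAMIC_ARCH build) can be checked mechanically. The sketch below assumes you have captured the text output of `nm` on the library yourself; the function names are hypothetical.

```python
import re
from typing import List

# A bare sgemm_kernel, or sgemm_kernel_<CPU> with an upper-case CPU suffix.
_SGEMM_KERNEL = re.compile(r"sgemm_kernel(?:_([A-Z0-9_]+))?")


def kernel_targets(nm_output: str) -> List[str]:
    """Collect the CPU suffixes of sgemm_kernel symbols from `nm` output.

    An empty string in the result stands for the unsuffixed symbol.
    """
    targets = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        match = _SGEMM_KERNEL.fullmatch(parts[-1])  # symbol name is last
        if match:
            targets.add(match.group(1) or "")
    return sorted(targets)


def looks_dynamic_arch(nm_output: str) -> bool:
    """True if only CPU-suffixed sgemm_kernel symbols were found."""
    targets = kernel_targets(nm_output)
    return bool(targets) and "" not in targets
```

A crash inside a symbol named just `sgemm_kernel` would therefore point at a single-architecture build, which matches the diagnosis in this thread.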
Does building for fewer targets save you build time?
Since you install numpy (seen in your Dockerfile), I would suggest installing libblas-dev, libcblas? and libopenblas-dev (0.2.18 if you stay with Ubuntu 16.04 LTS) and selecting OpenBLAS as libblas.so.3 using update-alternatives. Also check all build logs to make sure you link only against libblas (the one that redirects, not the reference implementation) and not against any specific implementation of BLAS.
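Because update-alternatives works through symlinks, one way to see which implementation libblas.so.3 currently points to is to resolve the link chain. The sketch below is only an illustration: the two candidate paths are assumptions and vary between distribution releases, and `resolve_library` is a hypothetical name.

```python
import os


def resolve_library(path: str) -> str:
    """Follow all symlinks (including /etc/alternatives hops) to the real file."""
    return os.path.realpath(path)


if __name__ == "__main__":
    # Assumed candidate locations; adjust for your distribution.
    candidates = (
        "/usr/lib/libblas.so.3",
        "/usr/lib/x86_64-linux-gnu/libblas.so.3",
    )
    for candidate in candidates:
        if os.path.exists(candidate):
            print(candidate, "->", resolve_library(candidate))
```

If the resolved path does not mention openblas, the process is using a different BLAS than the one you expected.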
@brada4 I think the original issue is clear by now. I take it you want to discourage them from building OpenBLAS themselves, and have them rely on the older version provided by Ubuntu instead? @gangliao which combination of options did you use for the Docker build that failed (and if you used the same source tree as before, did you do a "make clean" first to remove potentially incompatible files from the previous build)?
@martin-frbg Indeed. They have Ubuntu's numpy, which means they will have two BLAS implementations in the same process (the one they built and the one selected by update-alternatives). I could add to the FAQ how to add the latest OpenBLAS to the Debian and LTS alternatives, hmm?
Another thing: the Docker container is a virtual machine, and it is hard to guess whether it is KVM or a QEMU emulator. (Hyper-V nesting KVM, as on Azure, already works fine.)
Hi,
We use OpenBLAS as a third-party library in PaddlePaddle. Occasionally, a user gets a sgemm_kernel error when they execute our demo. Any idea how to fix it?
Thanks a lot.