IntelLabs / ParallelAccelerator.jl

The ParallelAccelerator package, part of the High Performance Scripting project at Intel Labs
BSD 2-Clause "Simplified" License

No BLAS installation detected and OpenMP not getting used #146

Closed janewliang closed 7 years ago

janewliang commented 7 years ago

I've tried building ParallelAccelerator on macOS and on Scientific Linux (derived from RHEL), and in both cases ParallelAccelerator is unable to detect OpenBLAS (which I have installed).

So when calling Pkg.test("ParallelAccelerator") or attempting to use @acc, I get messages like "OpenMP is not used" and "MKL and OpenBLAS not found. Matrix multiplication might be slow".

I'm aware that other users have opened similar issues in the past. I've combed through the messages and tried whatever advice I could find, with no luck. All suggestions welcome.

julia> Pkg.build("ParallelAccelerator")
INFO: Building ParallelAccelerator
ParallelAccelerator: build.jl begin.
ParallelAccelerator: Building j2c-array shared library
No BLAS installation detected (optional)
Using g++ to build ParallelAccelerator array runtime.
ParallelAccelerator: build.jl done.

lkuper commented 7 years ago

Hi @janewliang,

Unfortunately, our support for OpenBLAS is not nearly as good as our support for Intel MKL, particularly on Mac. I have some notes on how to get the ParallelAccelerator build process to detect OpenBLAS on Mac, but they are incomplete. We will work on getting this documented better. In the meantime, installing MKL may be the best BLAS library option.

OpenMP is a separate issue. Are you seeing "OpenMP is not used" both on Mac and on Linux?

janewliang commented 7 years ago

@lkuper I see "OpenMP is not used" on both Mac and Linux. If you're willing to share any of your notes on how to detect OpenBLAS, regardless of their completeness, that would be great too.

lkuper commented 7 years ago

@janewliang I've just pushed a patch that might fix your OpenMP issues. Please try the following and let me know if it works for you. I tested all this on Julia 0.5.1 on Ubuntu and macOS Sierra.

First, update ParallelAccelerator and CompilerTools to master. This will let you pick up some changes that have been made since the last release, as well as the changes I just made to the PA build script:

julia> Pkg.checkout("ParallelAccelerator")
julia> Pkg.checkout("CompilerTools")

Now make sure you have OpenBLAS installed. On Ubuntu, I ran sudo apt-get install libopenblas-dev. On macOS Sierra, I recommend installing OpenBLAS via Homebrew: brew install homebrew/science/openblas.

Important: the ParallelAccelerator build script will still not detect the presence of OpenBLAS unless you set a few environment variables. For instance, this is what I needed to do on macOS:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/opt/openblas/lib
export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/local/opt/openblas/include/
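A small caveat on the two export lines: if either variable is unset beforehand, the result starts with a colon, which leaves an empty first element. On Linux, both the dynamic loader and GCC treat an empty path element as the current directory. The sketch below avoids that by only adding the separator when there is already a value. The helper name `append_dir` and the `OPENBLAS_PREFIX` variable are hypothetical, and the prefix is simply the Homebrew location used above.

```shell
# append_dir VALUE DIR: print VALUE with DIR appended. The colon separator
# is added only when VALUE is non-empty, so no empty path element is created.
append_dir() {
    if [ -n "$1" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$2"
    fi
}

# Assumption: OpenBLAS lives under the Homebrew prefix mentioned above.
# Adjust this for your own system.
OPENBLAS_PREFIX=/usr/local/opt/openblas

LD_LIBRARY_PATH=$(append_dir "${LD_LIBRARY_PATH:-}" "$OPENBLAS_PREFIX/lib")
CPLUS_INCLUDE_PATH=$(append_dir "${CPLUS_INCLUDE_PATH:-}" "$OPENBLAS_PREFIX/include")
export LD_LIBRARY_PATH CPLUS_INCLUDE_PATH
```

The variables have to be exported in the same shell session that later starts Julia, otherwise the build will not see them.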

Then run Pkg.build("ParallelAccelerator") at the Julia prompt.

Finally, run Pkg.test("ParallelAccelerator"). You should not see the "Matrix multiplication might be slow" warnings this time.

Caveats:

janewliang commented 7 years ago

Thanks so much for the patch and the comments, @lkuper .

With the updates, I was able to detect OpenBLAS on MacOS Sierra. (There's still the OpenMP issue.)

Neither OpenBLAS nor OpenMP is working for me on Linux. I'm using Scientific Linux 7, which is based on RHEL. I have confirmed that OpenBLAS is installed and working with GCC 5.1.0.

lkuper commented 7 years ago

@janewliang I just put in another patch that tries to better detect compiler support for OpenMP. Can you try running Pkg.update() and then Pkg.build("ParallelAccelerator") again?

Here's what I now see on my Mac with Julia 0.5.2 and GCC 7.1.0 (installed using brew reinstall gcc --without-multilib).

julia> Pkg.build("ParallelAccelerator")
INFO: Building ParallelAccelerator
ParallelAccelerator: build.jl begin.
ParallelAccelerator: Building j2c-array shared library
System installed BLAS found
Checking for OpenMP support...
OpenMP support found in g++
Max OpenMP threads: 8
Using g++ to build ParallelAccelerator array runtime.
ParallelAccelerator: build.jl done.

On both Mac and Linux, if you still see "OpenMP is not used" messages, can you check to see if OpenMP works outside of ParallelAccelerator? What happens if you try to compile and run a toy program like this:

#include <omp.h>
#include <stdio.h>
int main() {
    printf("Max OpenMP threads: %d\n", omp_get_max_threads());
}

If OpenMP is working, you should see something like this (depending on your machine):

$ g++ openmp_test.c -fopenmp -o openmp_test && ./openmp_test
Max OpenMP threads: 8
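
If several compilers are installed, the same toy program can be tried against each of them in turn. This is only a sketch of that manual check, not the logic in the ParallelAccelerator build script. The function name `has_openmp` and the list of candidate compilers are assumptions for illustration.

```shell
# has_openmp COMPILER: succeed if COMPILER exists and can build and run
# the toy OpenMP program with -fopenmp.
has_openmp() {
    compiler=$1
    command -v "$compiler" >/dev/null 2>&1 || return 1
    tmpdir=$(mktemp -d) || return 1
    cat > "$tmpdir/openmp_test.cpp" <<'EOF'
#include <omp.h>
#include <stdio.h>
int main() {
    printf("Max OpenMP threads: %d\n", omp_get_max_threads());
    return 0;
}
EOF
    "$compiler" "$tmpdir/openmp_test.cpp" -fopenmp -o "$tmpdir/openmp_test" >/dev/null 2>&1 \
        && "$tmpdir/openmp_test" >/dev/null 2>&1
    rc=$?
    rm -rf "$tmpdir"
    return $rc
}

# Candidate names are examples only; g++-7 is a Homebrew-style versioned name.
for cxx in g++ g++-7 clang++; do
    if has_openmp "$cxx"; then
        echo "OpenMP support found in $cxx"
        break
    fi
done
```

If the loop prints nothing, none of the candidates could build and run the program with `-fopenmp`, which points at the compiler installation rather than at ParallelAccelerator.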

lkuper commented 7 years ago

As for OpenBLAS, I still don't know why it's not working for you on Linux. Can you compile and run a toy program like

#include <cblas.h>
int main() { return 0; }

using something like g++ -lblas blas_test.c -o blas_test && ./blas_test?
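The toy program above only checks that the header is found, because it never calls a BLAS function. A slightly stronger check, sketched below, calls `cblas_dgemm` so that the link step is exercised too. The function name `links_cblas` and the library names tried in the loop are assumptions. The library flag is placed after the source file, since some linker configurations resolve symbols left to right and may drop a library that is listed first.

```shell
# links_cblas COMPILER LIBNAME: succeed if a program that calls cblas_dgemm
# compiles, links against -lLIBNAME, runs, and computes the expected product.
links_cblas() {
    compiler=$1
    libname=$2
    command -v "$compiler" >/dev/null 2>&1 || return 1
    tmpdir=$(mktemp -d) || return 1
    cat > "$tmpdir/blas_test.cpp" <<'EOF'
#include <cblas.h>
int main() {
    double a[4] = {1, 2, 3, 4};
    double b[4] = {5, 6, 7, 8};
    double c[4] = {0, 0, 0, 0};
    /* c = 1.0 * a * b + 0.0 * c for 2x2 row-major matrices */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0, a, 2, b, 2, 0.0, c, 2);
    /* a * b = [[19, 22], [43, 50]] */
    return (c[0] == 19.0 && c[3] == 50.0) ? 0 : 1;
}
EOF
    # Source file first, library last.
    "$compiler" "$tmpdir/blas_test.cpp" -l"$libname" -o "$tmpdir/blas_test" >/dev/null 2>&1 \
        && "$tmpdir/blas_test"
    rc=$?
    rm -rf "$tmpdir"
    return $rc
}

# Library names are examples; which ones exist depends on the distribution.
for lib in openblas blas; do
    if links_cblas g++ "$lib"; then
        echo "cblas_dgemm links and runs with -l$lib"
    else
        echo "cblas_dgemm check failed with -l$lib"
    fi
done
```

A failure here does not say which step broke. Rerunning the compile command by hand without the output redirection shows whether it was the header, the link, or the run.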

Gnimuc commented 7 years ago

On macOS, gcc is just an alias for clang; the real GCC binary is named gcc-N, where N is GCC's major version. I can successfully access OpenMP using this patch.

julia> versioninfo()
Julia Version 0.5.2
Commit f4c6c9d4bb (2017-05-06 16:34 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i5-6267U CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)

julia> Pkg.build("ParallelAccelerator")
INFO: Building ParallelAccelerator
ParallelAccelerator: build.jl begin.
ParallelAccelerator: Building j2c-array shared library
System installed BLAS found
Checking for OpenMP support...
OpenMP support found in g++-7
Max OpenMP threads: 4
Using g++-7 to build ParallelAccelerator array runtime.
ParallelAccelerator: build.jl done.

lkuper commented 7 years ago

@Gnimuc The way I've been thinking about it is that if the user is savvy enough to install their own OpenMP-supporting GCC (using brew install gcc --without-multilib or the like), then they're also savvy enough to create aliases. I don't want ParallelAccelerator to make assumptions about what any given user has gcc/g++ aliased to. (Edit: it is also possible that someone has installed vanilla, non-Apple LLVM/Clang on their Mac, which has supported OpenMP since LLVM 3.7 in September 2015. So if someone has gcc/g++ pointing to that, then that ought to be OK, too.)
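
To see what a given compiler name actually resolves to, one option is to look at its `--version` output: Apple's gcc/g++ shims identify themselves as clang, while a real GCC does not. The sketch below is a minimal illustration. The function name `is_clang_version` is hypothetical, and the candidate names in the loop are examples.

```shell
# is_clang_version TEXT: succeed if TEXT (the output of `<compiler> --version`)
# identifies the compiler as clang rather than GCC.
is_clang_version() {
    case $1 in
        *clang*) return 0 ;;
        *) return 1 ;;
    esac
}

# g++-7 is the Homebrew-style name that appears in the build log above.
for cxx in g++ g++-7; do
    command -v "$cxx" >/dev/null 2>&1 || continue
    if is_clang_version "$("$cxx" --version 2>&1)"; then
        echo "$cxx is clang"
    else
        echo "$cxx is not clang"
    fi
done
```

Note that a shell alias is only visible inside the interactive shell that defines it. Assuming the build looks up g++ through PATH, a symlink or wrapper script in a directory early on PATH is what a child process would pick up.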

janewliang commented 7 years ago

@lkuper Thanks again for all of your support. ParallelAccelerator detects both OpenBLAS and OpenMP and passes all tests on my Mac now. I've checked that OpenBLAS and OpenMP both work on Linux, but I'm still having issues getting ParallelAccelerator to detect OpenBLAS. I probably won't have time to investigate too deeply until tomorrow or maybe next week, but it's certainly possible that the problem is on my end.

janewliang commented 7 years ago

@lkuper

I got ParallelAccelerator to detect OpenBLAS on Linux and the build looks okay now. (There was some mismatch between older and newer compilations on my machine, so probably not something anybody else has to worry about.)

However (on Linux), I'm getting a failing test. It's not clear to me whether this has something to do with the package itself or with my installation of Julia, but it is quite similar to the error in Issue #127:

INFO: Testing ParallelAccelerator
Testing parallel library functions...
Done testing parallel library functions.
Testing parfor support via @par macro...
Done testing parfor.
Testing map and reduce...
Done testing map and reduce.
Testing abs()...
Done testing abs().
Testing constant promotion for pointwise operations...
Done testing constant promotion.
Testing rand()...
Done testing rand()...
Testing BitArrays...
Done testing BitArrays.
Testing ranges...
Done testing ranges.
Testing sequential code...
Done testing sequential code.
Testing cat...
Done testing cat.
Testing hcat...
Done testing hcat.
Testing vcat...
Done testing vcat.
Testing ranges...
Done testing ranges.
Testing miscellaneous features...
/usr/bin/julia: symbol lookup error: /tmp/tmphrwj9o/libcgen_output45.so.1.0: undefined symbol: cblas_dgemm
=====================================[ ERROR: ParallelAccelerator ]======================================

failed process: Process(`/usr/bin/julia -Cx86-64 -J/usr/lib64/julia/sys.so --compile=yes --depwarn=yes --check-bounds=yes --code-coverage=none --color=yes --compilecache=yes /home/XXXXX/.julia/v0.5/ParallelAccelerator/test/runtests.jl`, ProcessExited(127)) [127]

=========================================================================================================

lkuper commented 7 years ago

@janewliang I haven't been able to reproduce this latest Linux issue yet. Just so all the information is here, can you say what your Julia version is, what your GCC version is, and what the steps are that you took to install OpenBLAS? Thanks.

janewliang commented 7 years ago

@lkuper

Julia v0.5.2, GCC 5.1.0

I used yum to install OpenBLAS; some sleuthing suggests that the command was probably yum install openblas.x86_64, since that's the installed package I have.

lkuper commented 7 years ago

Just as a wild guess, does it help to do yum install openblas-devel.x86_64?

janewliang commented 7 years ago

@lkuper Sadly, no. But thanks for the thought!

lkuper commented 7 years ago

@janewliang Without access to the same OS/environment as you, I don't think I can fix the undefined symbol: cblas_dgemm problem. The only other thing I can think of to try is the LD_PRELOAD trick of setting LD_PRELOAD=/usr/lib/libopenblas.so or perhaps LD_PRELOAD=/usr/lib/libblas.so. No idea whether this will actually help, though.
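
Before preloading a library, it may help to check whether it exports `cblas_dgemm` at all, since the loader error means no loaded library defined that symbol. The sketch below filters the output of `nm -D --defined-only` (from binutils) for an exact symbol name. The helper name `defines_symbol` and the library paths are assumptions. On a 64-bit RHEL-derived system, libraries commonly live under /usr/lib64 (the failing command above loads /usr/lib64/julia/sys.so), so those paths are tried alongside the /usr/lib ones.

```shell
# defines_symbol SYMBOL: read `nm -D --defined-only` output on stdin and
# succeed if SYMBOL appears as a symbol name (the last field of a line).
# Any "@VERSION" suffix that nm may append is ignored.
defines_symbol() {
    awk -v sym="$1" '
        { name = $NF; sub(/@.*/, "", name); if (name == sym) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Hypothetical candidate paths; adjust them to where your libraries live.
for lib in /usr/lib64/libopenblas.so /usr/lib64/libblas.so \
           /usr/lib/libopenblas.so /usr/lib/libblas.so; do
    [ -e "$lib" ] || continue
    if nm -D --defined-only "$lib" | defines_symbol cblas_dgemm; then
        echo "$lib exports cblas_dgemm"
    else
        echo "$lib does not export cblas_dgemm"
    fi
done
```

Only a library reported as exporting the symbol is worth passing to LD_PRELOAD. If none is, the CBLAS interface may be packaged separately on that distribution.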

janewliang commented 7 years ago

It doesn't appear to work. But thanks again for all of your support! Since the original issue has been resolved, I think you can close this now. I may tinker with my Linux install over the next few weeks and I'll let you know if I figure out what's going on.

lkuper commented 7 years ago

OK, closing this for now.