IntelLabs / ParallelAccelerator.jl

The ParallelAccelerator package, part of the High Performance Scripting project at Intel Labs
BSD 2-Clause "Simplified" License

Accessing OpenMP, MKL, and OpenBLAS #61

Closed albi3ro closed 7 years ago

albi3ro commented 8 years ago

When running the tests, I get "OpenMP is not used." and "Warning: MKL and OpenBLAS not found."

I'm running Mac OS X El Capitan. I have gcc and just installed OpenBLAS, but the OpenBLAS warning is still there.
How can I make sure that OpenMP and OpenBLAS are actually being used?

ehsantn commented 8 years ago

ParallelAccelerator looks for OPENBLAS_LIB at build time to find the OpenBLAS library path. Alternatively, you can set it manually in deps/config.jl.

Unfortunately, the default Clang/GCC installation on OS X doesn't provide OpenMP. You can install full GCC manually and set the environment variable CGEN_NO_OMP=0.
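A minimal sketch of setting the two variables mentioned above before rebuilding, assuming OPENBLAS_LIB should hold the directory that contains the OpenBLAS library. The helper name find_openblas_dir and the candidate directories are hypothetical; only OPENBLAS_LIB and CGEN_NO_OMP come from this thread.

```shell
# Print the first directory (from the arguments) that contains an OpenBLAS
# library file, or return 1 if none of them does.
find_openblas_dir() {
    for dir in "$@"; do
        for lib in "$dir"/libopenblas.*; do
            if [ -e "$lib" ]; then
                printf '%s\n' "$dir"
                return 0
            fi
        done
    done
    return 1
}

# Candidate locations are assumptions; adjust them for your machine.
if dir=$(find_openblas_dir /opt/OpenBLAS/lib /usr/local/opt/openblas/lib /usr/lib); then
    export OPENBLAS_LIB="$dir"
    echo "OPENBLAS_LIB=$OPENBLAS_LIB"
else
    echo "no OpenBLAS library found in the candidate directories" >&2
fi

# Variable name and value as suggested in the comment above.
export CGEN_NO_OMP=0
```

Run this in the same shell session that later starts Julia and Pkg.build("ParallelAccelerator"), so that the build inherits both variables.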

leonardt commented 8 years ago

The ParallelAccelerator method for discovering BLAS installations is very simple (it scans LD_LIBRARY_PATH) and will miss system-installed packages (e.g. those installed with apt-get install libopenblas-dev).

We could use something like CMake, or perhaps parse the output of a command like ldconfig -p | grep blas, to make it more robust.
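The difference between the two discovery approaches can be sketched in shell. This is only an illustration, not the package's build logic: scan_path_for_blas is a hypothetical helper that mimics a plain scan of a colon-separated path list, and the ldconfig step is the fallback proposed above (it is skipped where ldconfig is unavailable, for example on OS X).

```shell
# List BLAS-like libraries found in a colon-separated directory list
# (the same shape as LD_LIBRARY_PATH). The body runs in a subshell so the
# IFS change does not leak into the caller.
scan_path_for_blas() (
    IFS=:
    for dir in $1; do
        [ -n "$dir" ] || continue
        for lib in "$dir"/lib*blas*; do
            if [ -e "$lib" ]; then
                printf '%s\n' "$lib"
            fi
        done
    done
)

# Approach 1: only what is listed in LD_LIBRARY_PATH.
scan_path_for_blas "${LD_LIBRARY_PATH:-}"

# Approach 2: ask the dynamic linker cache, which also knows system packages.
if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p 2>/dev/null | grep -i blas || true
fi
```

A library installed by the system package manager typically shows up only in the second listing, which is the gap described above.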

lkuper commented 8 years ago

Commit https://github.com/IntelLabs/ParallelAccelerator.jl/commit/d72eed5aa90dbfcf25af70cd980bf391743f9ff9 addresses an issue where DYLD_LIBRARY_PATH wasn't being seen on OS X 10.11 specifically. @albi3ro, can you run Pkg.checkout("ParallelAccelerator"), then Pkg.build("ParallelAccelerator") and see if OpenBLAS is found this time?

If it doesn't work, can you post the output of echo $DYLD_LIBRARY_PATH? You may have to do something like export DYLD_LIBRARY_PATH=/opt/OpenBLAS/lib if you're not already doing so.
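A small sketch of the export step that keeps any existing DYLD_LIBRARY_PATH entries instead of overwriting them. The helper prepend_path is hypothetical, and /opt/OpenBLAS/lib is just the example location from the comment above.

```shell
# Print a colon-separated path value with directory $1 placed first.
# $2 is the current value and may be empty.
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;      # already present, leave unchanged
        "::")     printf '%s\n' "$1" ;;      # current value is empty
        *)        printf '%s\n' "$1:$2" ;;   # put the new directory first
    esac
}

DYLD_LIBRARY_PATH=$(prepend_path /opt/OpenBLAS/lib "${DYLD_LIBRARY_PATH:-}")
export DYLD_LIBRARY_PATH
echo "DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH"
```

The variable has to be exported in the shell that launches Julia; setting it in a different terminal window has no effect on the build.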

ehsantn commented 8 years ago

CGen now looks for system installed BLAS (i.e. -lblas works). On Ubuntu, users can install OpenBLAS using sudo apt-get install libopenblas-dev libblas-dev. If OpenBLAS is built manually, the C++ compiler should be able to find both the header file and the library file.
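One way to check the "header and library are both findable" condition by hand is to compile a tiny probe. This is a sketch under assumptions: probe_header_and_lib is a hypothetical helper, cblas.h is an assumed header name, and the flags tried here may differ from what CGen actually passes.

```shell
# Return 0 if compiler $1 can compile and link a trivial C++ program that
# includes header $2, with any remaining arguments passed as extra flags.
probe_header_and_lib() {
    cxx=$1
    header=$2
    shift 2
    workdir=$(mktemp -d) || return 1
    printf '#include <%s>\nint main() { return 0; }\n' "$header" > "$workdir/probe.cpp"
    if "$cxx" "$workdir/probe.cpp" -o "$workdir/probe" "$@" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -rf "$workdir"
    return "$status"
}

# Try the generic and the OpenBLAS-specific link flags with the default compiler.
for flag in -lblas -lopenblas; do
    if probe_header_and_lib "${CXX:-g++}" cblas.h "$flag"; then
        echo "cblas.h with $flag: found"
    else
        echo "cblas.h with $flag: not found by ${CXX:-g++}"
    fi
done
```

If the probe fails for a manually built OpenBLAS, adding the matching -I and -L flags as extra arguments shows whether the problem is the search path rather than the installation itself.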

lkuper commented 8 years ago

@albi3ro Did the previous suggestion help? If not, feel free to reopen this issue.

Ken-B commented 8 years ago

I'm on OSX 10.11. I successfully installed OpenMP with brew install gcc --without-multilib and OpenBLAS with brew install homebrew/science/openblas.

However, I can't get Pkg.build (on latest master) to find either OpenMP or OpenBLAS. I also tried adding the different paths above, but still no success. (I guess you probably have good reasons for not using Julia's OpenBLAS.)

Could you help me out? Let me know what I can do to better identify the issue. Thanks!

lkuper commented 8 years ago

@Ken-B What does echo $DYLD_LIBRARY_PATH say? Also, make sure you have the most recent version of the package by running Pkg.checkout("ParallelAccelerator").

ulneva commented 8 years ago

I am also on El Capitan and just installed OpenBLAS and OpenMP via Homebrew. I used Pkg.checkout("ParallelAccelerator") to install and then build the package, but the package test says that it cannot find OpenBLAS and that OpenMP is not used. In this situation, the Black-Scholes example does not show any improvement when used with @acc. Running echo $DYLD_LIBRARY_PATH prints only an empty line. Can you please help?

Ken-B commented 8 years ago

After checking out latest master of ParallelAccelerator I get:

shell> echo $DYLD_LIBRARY_PATH
ERROR: UndefVarError: DYLD_LIBRARY_PATH not defined

@lkuper Could you reopen the issue? Thank you for your effort on looking into this!
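A note on the UndefVarError above: Julia's shell> mode treats $NAME as interpolation of a Julia variable, not as an environment lookup, so that error does not by itself show whether the environment variable is set. A minimal sketch for checking from a regular terminal follows; describe_var is a hypothetical helper.

```shell
# Report whether an environment variable is unset, set but empty, or set.
describe_var() {
    name=$1
    if ! printenv "$name" >/dev/null 2>&1; then
        echo "$name is unset"
    elif [ -z "$(printenv "$name")" ]; then
        echo "$name is set but empty"
    else
        echo "$name=$(printenv "$name")"
    fi
}

describe_var DYLD_LIBRARY_PATH
```

From inside Julia, looking the name up in ENV (for example get(ENV, "DYLD_LIBRARY_PATH", "")) answers the same question without going through shell mode.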

lkuper commented 8 years ago

It looks like a lot of people are having problems getting ParallelAccelerator to detect OpenBLAS and OpenMP on Mac. @IntelLabs/team-hps What can we do to make this easiest for users? Could we have build.jl detect if the platform is Mac, install them via Homebrew.jl, and set the necessary env vars? @leonardt @ehsantn Thoughts?
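The platform-detection idea can be illustrated in shell rather than in build.jl. This sketch only prints the install commands already mentioned in this thread and installs nothing; suggest_install is a hypothetical helper, and it is not how build.jl or Homebrew.jl would do it.

```shell
# Print suggested install commands for the platform name given as $1
# (the value printed by `uname -s`). Nothing is installed.
suggest_install() {
    case "$1" in
        Darwin)
            echo "brew install gcc --without-multilib"
            echo "brew install homebrew/science/openblas"
            ;;
        Linux)
            echo "sudo apt-get install libopenblas-dev libblas-dev"
            ;;
        *)
            echo "no suggestion for platform: $1" >&2
            return 1
            ;;
    esac
}

suggest_install "$(uname -s)" || true
```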

ulneva commented 8 years ago

Thanks for reopening the issue. Is there something you would recommend that I can do right now to make ParallelAccelerator work, before the changes are made to the package? I'd really appreciate it, since it might let me move my project forward sooner!

Regards, Yulia


lkuper commented 8 years ago

@ulneva Can you post a log of exactly what happens when you run the blackscholes example from our repo, including the SELFPRIMED and SELFTIMED numbers and any warnings that you see?

ulneva commented 8 years ago

Here are the logs. Now I actually do see a slight difference in performance. Thank you!

julia> using ParallelAccelerator

julia> @acc f(x) = x.+x
f (generic function with 1 method)

julia> function cndf2(in::Array{Float64,1})
           out = 0.5 .+ 0.5 .* erf(0.707106781 .* in)
           return out
       end
cndf2 (generic function with 1 method)

julia> function blackscholes(sptprice::Array{Float64,1},
                             strike::Array{Float64,1},
                             rate::Array{Float64,1},
                             volatility::Array{Float64,1},
                             time::Array{Float64,1})
           logterm = log10(sptprice ./ strike)
           powterm = .5 .* volatility .* volatility
           den = volatility .* sqrt(time)
           d1 = (((rate .+ powterm) .* time) .+ logterm) ./ den
           d2 = d1 .- den
           NofXd1 = cndf2(d1)
           NofXd2 = cndf2(d2)
           futureValue = strike .* exp(- rate .* time)
           c1 = futureValue .* NofXd2
           call = sptprice .* NofXd1 .- c1
           put  = call .- futureValue .+ sptprice
       end
blackscholes (generic function with 1 method)

julia> function run(iterations)
           sptprice   = Float64[ 42.0 for i = 1:iterations ]
           initStrike = Float64[ 40.0 + (i / iterations) for i = 1:iterations ]
           rate       = Float64[ 0.5 for i = 1:iterations ]
           volatility = Float64[ 0.2 for i = 1:iterations ]
           time       = Float64[ 0.5 for i = 1:iterations ]
           tic()
           put = blackscholes(sptprice, initStrike, rate, volatility, time)
           t = toq()
           println("checksum: ", sum(put))
           return t
       end
run (generic function with 1 method)

julia> @time run(40_000_000)
checksum: 8.381928525856283e8
17.829420 seconds (162.04 k allocations: 9.842 GB, 6.44% gc time)
16.736084271

julia> @acc begin
       function cndf2(in::Array{Float64,1})
           out = 0.5 .+ 0.5 .* erf(0.707106781 .* in)
           return out
       end
       function blackscholes(sptprice::Array{Float64,1},
                             strike::Array{Float64,1},
                             rate::Array{Float64,1},
                             volatility::Array{Float64,1},
                             time::Array{Float64,1})
           logterm = log10(sptprice ./ strike)
           powterm = .5 .* volatility .* volatility
           den = volatility .* sqrt(time)
           d1 = (((rate .+ powterm) .* time) .+ logterm) ./ den
           d2 = d1 .- den
           NofXd1 = cndf2(d1)
           NofXd2 = cndf2(d2)
           futureValue = strike .* exp(- rate .* time)
           c1 = futureValue .* NofXd2
           call = sptprice .* NofXd1 .- c1
           put  = call .- futureValue .+ sptprice
       end
       end
blackscholes (generic function with 2 methods)

julia> @time run(40_000_000)
checksum: 8.381928525856283e8
12.835870 seconds (236 allocations: 9.835 GB, 11.96% gc time)
11.252472269

Regards, Yulia


Ken-B commented 8 years ago

I get:

julia> import Base.run

julia> include(Pkg.dir("ParallelAccelerator","examples","black-scholes","black-scholes.jl"))
iterations = 10000000
OpenMP is not used.
SELFPRIMED 1.415843548
checksum: 2.0954821257116845e8
rate = 9.753668231481798e6 opts/sec
SELFTIMED 1.025255295

I keep wondering why ParallelAccelerator doesn't use the OpenBLAS that ships with Julia, but that's probably because I don't know enough about this package. You probably have good reasons, but maybe there could be an option to just use Julia's OpenBLAS? Let me know what I can do to assist. Thanks again.
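For reading the numbers in these logs, here is a small worked calculation. It assumes that the reported rate is simply the iteration count divided by the SELFTIMED value, which the figures above are consistent with; the helper names rate and speedup are hypothetical.

```shell
# Throughput: number of options priced divided by elapsed seconds.
rate() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.0f\n", n / t }'
}

# Speedup: baseline time divided by accelerated time.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# iterations and SELFTIMED from the black-scholes.jl log above
rate 10000000 1.025255295            # prints 9753668 (opts/sec)

# toq() values from the earlier REPL log, without and with @acc
speedup 16.736084271 11.252472269    # prints 1.49
```

The two logs are not directly comparable: they use different iteration counts and were produced on different machines.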

ehsantn commented 8 years ago

We currently generate C code, which cannot use Julia's libraries. The right solution is probably to use CMake to find or install additional libraries (similar to packages like HDF5). OpenBLAS is not needed for most codes, though.

The Clang compiler on Mac OS doesn't support OpenMP. Again, we need to detect or install GCC with OpenMP and use it. Contributions that address these installation issues would be highly appreciated.

lkuper commented 8 years ago

@ulneva Can you post the result of running include(Pkg.dir("ParallelAccelerator","examples","black-scholes","black-scholes.jl")) as @Ken-B did? I'm looking for the SELFTIMED and SELFPRIMED numbers specifically. If you're getting a time like 11 or 12 seconds, that's probably including the ParallelAccelerator package load time.

ulneva commented 8 years ago

Sorry for being absent for a while. When I try to run the include(Pkg.dir("ParallelAccelerator","examples","black-scholes","black-scholes.jl")) line I get:

ERROR: LoadError: ArgumentError: DocOpt not found in path
 in require at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include_from_node1 at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
while loading /Users/nevskaya/.julia/v0.4/ParallelAccelerator/examples/black-scholes/black-scholes.jl, in expression starting on line 27

DrTodd13 commented 8 years ago

Pkg.add("DocOpt") should fix it. @lkuper can confirm but I think we made the decision to only list packages in REQUIRE that were necessary for the package to operate but not automatically install packages that may only be needed by programs in the examples directory.

lkuper commented 8 years ago

Yes, that's right. Packages that the examples depend on are now listed in test/REQUIRE (but not in REQUIRE).

CorySimon commented 7 years ago

I am on Mac OS and am experiencing the same problems. I ran brew install gcc --without-multilib and brew install homebrew/science/openblas. Upon Pkg.build("ParallelAccelerator"), I get:

ParallelAccelerator: build.jl begin.
ParallelAccelerator: Building j2c-array shared library
No BLAS installation detected (optional)
Using g++ to build ParallelAccelerator array runtime.
ParallelAccelerator: build.jl done.

and I get the message "OpenMP is not used." for a simple example using @acc.

echo "$DYLD_LIBRARY_PATH" prints nothing.

[Julia Version 0.5.1] [Pkg.checkout("ParallelAccelerator") bails due to dirty package?]

ehsantn commented 7 years ago

We turn OpenMP off for Macs with GCC here, since GCC doesn't generally support OpenMP on Macs. You can install ICC if you want to use OpenMP.

pnvolkmar commented 7 years ago

How could we get OpenMP to stay on for mac for those of us who don't want to purchase ICC (it's only available with a 30-day trial)?

ehsantn commented 7 years ago

You can manually set USE_OMP = 1 here. This might fail if the backend C++ compiler doesn't support OpenMP. A useful feature would be to make the OpenMP check automatic for different Mac compilers.

You might qualify for free licenses for Intel compiler tools here: https://software.intel.com/en-us/qualify-for-free-software
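The automatic OpenMP check suggested above could look roughly like the following sketch, which tries to build a tiny OpenMP program with each candidate compiler. It is not what the package does: supports_openmp is a hypothetical helper, and the candidate names (including the versioned g++-6) are assumptions about what might be installed.

```shell
# Return 0 if compiler $1 can build a small OpenMP program with -fopenmp.
supports_openmp() {
    cxx=$1
    workdir=$(mktemp -d) || return 1
    cat > "$workdir/probe.cpp" <<'EOF'
#include <omp.h>
int main() { return omp_get_max_threads() > 0 ? 0 : 1; }
EOF
    if "$cxx" -fopenmp "$workdir/probe.cpp" -o "$workdir/probe" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -rf "$workdir"
    return "$status"
}

# Report the first candidate that passes the probe, if any.
for cxx in g++-6 g++ clang++; do
    if supports_openmp "$cxx"; then
        echo "$cxx supports OpenMP"
        break
    fi
done
```

A compiler that passes this probe is a reasonable candidate before setting USE_OMP = 1 by hand; one that fails would likely hit the build failure mentioned above.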

pmpeter1 commented 7 years ago

Have you looked at: https://software.intel.com/en-us/qualify-for-free-software

This augments the trial license for qualified users to have a free full license.

-Paul


CorySimon commented 7 years ago

Getting a compiler on Mac that supports OpenMP

brew install gcc --without-multilib installs gcc-6, which supports OpenMP.

To test, make toy C code, dude.c:

#include <omp.h>
#include <stdio.h>

int main(void) {
    /* Each thread in the parallel region prints its own ID. */
    #pragma omp parallel
    printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
    return 0;
}

Compiling: gcc-6 -fopenmp dude.c -o dude, then running ./dude:

Hello from thread 0, nthreads 4
Hello from thread 1, nthreads 4
Hello from thread 2, nthreads 4
Hello from thread 3, nthreads 4

confirms that OpenMP is supported.

Configure ParallelAccelerator.jl to use this compiler

Change this line to global USE_OMP = 1.

I am now stuck. I changed this line to CC=gcc-6 and Pkg.build("ParallelAccelerator") fails.

To mimic g++, I tried CC="gcc-6 -xc++ -lstdc++ -shared-libgcc", then the build worked. (Motivated by this discussion about difference between g++ and gcc.) Then when I try to use ParallelAccelerator, I get OptFramework failed to optimize function because it is still using Clang for some reason. How can I get it to use gcc-6?

Thank you.

ehsantn commented 7 years ago

I think you need to change this line to use your new compile command.

Gnimuc commented 7 years ago

@CorySimon I think you could simply change g++ to g++-6. I got an OptFramework failed to optimize function error when using gcc-6 -xc++ -lstdc++ -shared-libgcc.

lkuper commented 7 years ago

I've made some recent changes (see #146) that should improve the OpenBLAS and OpenMP situation for ParallelAccelerator users on Mac. I'm going to close this issue; if people continue to have problems, let's discuss in #146 (or file a new issue if that one doesn't seem relevant). Thanks!