Closed albi3ro closed 7 years ago
ParallelAccelerator looks for OPENBLAS_LIB at build time to find the OpenBLAS library path. Alternatively, you can set it manually in deps/config.jl.
Unfortunately, the default Clang/GCC installation on OS X doesn't provide OpenMP. You can install full GCC manually and set the environment variable CGEN_NO_OMP=0.
The ParallelAccelerator method for discovering BLAS installations is very simple (scanning LD_LIBRARY_PATH) and will miss system-installed packages (e.g. those installed with apt-get install libopenblas-dev).
We could use something like CMake, or perhaps parse the output of a command like ldconfig -p | grep blas, to make it more robust.
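As a rough illustration of the ldconfig idea above, here is a minimal sketch in POSIX shell. The function name blas_dirs_from_ldconfig is hypothetical and is not part of ParallelAccelerator; it only assumes the usual "name (flags) => /path" layout of ldconfig -p output on Linux.

```shell
#!/bin/sh
# Hypothetical helper (not part of ParallelAccelerator): read
# `ldconfig -p`-style lines on stdin and print the unique directories
# holding a library whose name mentions "blas".
blas_dirs_from_ldconfig() {
    # Split each line at " => "; field 1 is the library name and flags,
    # field 2 is the full path. Lines without " => " are skipped.
    awk -F' => ' 'NF == 2 && tolower($1) ~ /blas/ { print $2 }' |
        while IFS= read -r path; do
            dirname "$path"
        done |
        sort -u
}

# Usage on a Linux system where ldconfig is available:
#   ldconfig -p | blas_dirs_from_ldconfig
```

Reading from stdin keeps the parsing separate from the ldconfig call, so the same function can be tried on saved output from another machine.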
Commit https://github.com/IntelLabs/ParallelAccelerator.jl/commit/d72eed5aa90dbfcf25af70cd980bf391743f9ff9 addresses an issue where DYLD_LIBRARY_PATH wasn't being seen on OS X 10.11 specifically. @albi3ro, can you run Pkg.checkout("ParallelAccelerator"), then Pkg.build("ParallelAccelerator"), and see if OpenBLAS is found this time?
If it doesn't work, can you post the output of echo $DYLD_LIBRARY_PATH? You may have to do something like export DYLD_LIBRARY_PATH=/opt/OpenBLAS/lib if you're not already doing so.
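To check by hand whether a search path such as DYLD_LIBRARY_PATH actually contains an OpenBLAS library, something like the following sketch may help. It is written for POSIX sh or bash; find_lib_dir is a hypothetical helper, not the package's own discovery code, and matching on the file-name prefix libopenblas is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of ParallelAccelerator): print the first
# directory in a colon-separated search path that holds a file whose name
# starts with the given prefix, e.g. "libopenblas".
find_lib_dir() {
    search_path=$1
    prefix=$2
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        # Skip empty entries such as those produced by "::".
        [ -n "$dir" ] || continue
        for candidate in "$dir/$prefix"*; do
            if [ -e "$candidate" ]; then
                IFS=$old_ifs
                printf '%s\n' "$dir"
                return 0
            fi
        done
    done
    IFS=$old_ifs
    return 1
}

# Usage:
#   find_lib_dir "$DYLD_LIBRARY_PATH" libopenblas || echo "no OpenBLAS on the path"
```

An empty or unset variable simply makes the function return a non-zero status, which matches the situation where echo $DYLD_LIBRARY_PATH prints nothing.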
CGen now looks for a system-installed BLAS (i.e. -lblas works). On Ubuntu, users can install OpenBLAS using sudo apt-get install libopenblas-dev libblas-dev. If OpenBLAS is built manually, the C++ compiler should be able to find both the header file and the library file.
@albi3ro Did the previous suggestion help? If not, feel free to reopen this issue.
I'm on OSX 10.11. I successfully installed OpenMP with brew install gcc --without-multilib and OpenBLAS with brew install homebrew/science/openblas.
However, I can't get Pkg.build (on latest master) to find either OpenMP or OpenBLAS. I also tried adding the different paths above, but still no success. (I guess you probably have good reasons for not using Julia's OpenBLAS.)
Could you help me out? Let me know what I can do to better identify the issue. Thanks!
@Ken-B What does echo $DYLD_LIBRARY_PATH say? Also, make sure you have the most recent version of the package by running Pkg.checkout("ParallelAccelerator").
I am also on El Capitan and just installed openBLAS and openMP via Homebrew. I used Pkg.checkout("ParallelAccelerator") to install and then build the package, but the package test says it cannot find openBLAS and openMP is not used. The Black-Scholes example in this situation does not show any improvement when used with @acc. The echo $DYLD_LIBRARY_PATH gives back an empty line, just white space. Can you please help?
After checking out latest master of ParallelAccelerator I get:
shell> echo $DYLD_LIBRARY_PATH
ERROR: UndefVarError: DYLD_LIBRARY_PATH not defined
@lkuper Could you reopen the issue? Thank you for your effort on looking into this!
It looks like a lot of people are having problems getting ParallelAccelerator to detect OpenBLAS and OpenMP on Mac. @IntelLabs/team-hps What can we do to make this easiest for users? Could we have build.jl detect if the platform is Mac, install them via Homebrew.jl, and set the necessary env vars? @leonardt @ehsantn Thoughts?
Thanks for reopening the issue. Would you recommend something that I can do right now to make ParallelAccelerator work, before the changes are made to the package? I'd really appreciate it since it might allow me to move my project earlier!
Regards, Yulia
@ulneva Can you post a log of exactly what happens when you run the blackscholes example from our repo, including the SELFPRIMED and SELFTIMED numbers and any warnings that you see?
Here are the logs. Now I actually do see a slight difference in performance. Thank you!
julia> using ParallelAccelerator
julia> @acc f(x) = x.+x
f (generic function with 1 method)
julia> function cndf2(in::Array{Float64,1})
           out = 0.5 .+ 0.5 .* erf(0.707106781 .* in)
           return out
       end
cndf2 (generic function with 1 method)
julia> function blackscholes(sptprice::Array{Float64,1},
                             strike::Array{Float64,1},
                             rate::Array{Float64,1},
                             volatility::Array{Float64,1},
                             time::Array{Float64,1})
           logterm = log10(sptprice ./ strike)
           powterm = .5 .* volatility .* volatility
           den = volatility .* sqrt(time)
           d1 = (((rate .+ powterm) .* time) .+ logterm) ./ den
           d2 = d1 .- den
           NofXd1 = cndf2(d1)
           NofXd2 = cndf2(d2)
           futureValue = strike .* exp(- rate .* time)
           c1 = futureValue .* NofXd2
           call = sptprice .* NofXd1 .- c1
           put = call .- futureValue .+ sptprice
       end
blackscholes (generic function with 1 method)
julia> function run(iterations)
           sptprice = Float64[ 42.0 for i = 1:iterations ]
           initStrike = Float64[ 40.0 + (i / iterations) for i = 1:iterations ]
           rate = Float64[ 0.5 for i = 1:iterations ]
           volatility = Float64[ 0.2 for i = 1:iterations ]
           time = Float64[ 0.5 for i = 1:iterations ]
           tic()
           put = blackscholes(sptprice, initStrike, rate, volatility, time)
           t = toq()
           println("checksum: ", sum(put))
           return t
       end
run (generic function with 1 method)
julia> @time run(40_000_000)
checksum: 8.381928525856283e8
 17.829420 seconds (162.04 k allocations: 9.842 GB, 6.44% gc time)
16.736084271
julia> @acc begin
       function cndf2(in::Array{Float64,1})
           out = 0.5 .+ 0.5 .* erf(0.707106781 .* in)
           return out
       end
       function blackscholes(sptprice::Array{Float64,1},
                             strike::Array{Float64,1},
                             rate::Array{Float64,1},
                             volatility::Array{Float64,1},
                             time::Array{Float64,1})
           logterm = log10(sptprice ./ strike)
           powterm = .5 .* volatility .* volatility
           den = volatility .* sqrt(time)
           d1 = (((rate .+ powterm) .* time) .+ logterm) ./ den
           d2 = d1 .- den
           NofXd1 = cndf2(d1)
           NofXd2 = cndf2(d2)
           futureValue = strike .* exp(- rate .* time)
           c1 = futureValue .* NofXd2
           call = sptprice .* NofXd1 .- c1
           put = call .- futureValue .+ sptprice
       end
       end
blackscholes (generic function with 2 methods)
julia> @time run(40_000_000)
checksum: 8.381928525856283e8
 12.835870 seconds (236 allocations: 9.835 GB, 11.96% gc time)
11.252472269
Regards, Yulia
I get:
julia> import Base.run
julia> include(Pkg.dir("ParallelAccelerator","examples","black-scholes","black-scholes.jl"))
iterations = 10000000
OpenMP is not used.
SELFPRIMED 1.415843548
checksum: 2.0954821257116845e8
rate = 9.753668231481798e6 opts/sec
SELFTIMED 1.025255295
I keep wondering why ParallelAccelerator doesn't use openblas as it's shipped with Julia, but that's because I don't know enough of this package. Although you probably have good reasons, maybe have an option to just use Julia's openblas? Let me know what I can do to assist. Thanks again.
We currently generate C code, which cannot use Julia's libraries. The right solution is probably using CMake to find or install additional libraries (similar to packages like HDF5). OpenBLAS is not needed for most codes though.
The Clang compiler on Mac OS doesn't support OpenMP. Again, we need to detect or install GCC with OpenMP and use it. Contributions that address these installation issues would be highly appreciated.
@ulneva Can you post the result of running include(Pkg.dir("ParallelAccelerator","examples","black-scholes","black-scholes.jl")) as @Ken-B did? I'm looking for the SELFTIMED and SELFPRIMED numbers specifically. If you're getting a time like 11 or 12 seconds, that's probably including the ParallelAccelerator package load time.
Sorry for being absent for a while. When I try to run the include(Pkg.dir("ParallelAccelerator","examples","black-scholes","black-scholes.jl")) line I get:
ERROR: LoadError: ArgumentError: DocOpt not found in path
 in require at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include_from_node1 at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib
while loading /Users/nevskaya/.julia/v0.4/ParallelAccelerator/examples/black-scholes/black-scholes.jl, in expression starting on line 27
Pkg.add("DocOpt") should fix it. @lkuper can confirm but I think we made the decision to only list packages in REQUIRE that were necessary for the package to operate but not automatically install packages that may only be needed by programs in the examples directory.
Yes, that's right. Packages that the examples depend on are now listed in test/REQUIRE (but not in REQUIRE).
I am on Mac OS and experience the same problems. I ran brew install gcc --without-multilib and brew install homebrew/science/openblas. Upon Pkg.build("ParallelAccelerator"):
ParallelAccelerator: build.jl begin.
ParallelAccelerator: Building j2c-array shared library
No BLAS installation detected (optional)
Using g++ to build ParallelAccelerator array runtime.
ParallelAccelerator: build.jl done.
and I get the message "OpenMP is not used" for a simple example using @acc.
echo "$DYLD_LIBRARY_PATH" yields nothing.
[Julia Version 0.5.1]
[Pkg.checkout("ParallelAccelerator") bails due to dirty package?]
We turn OpenMP off for Macs with GCC here, since GCC doesn't generally support OpenMP on Macs. You can install ICC if you want to use OpenMP.
How could we get OpenMP to stay on for mac for those of us who don't want to purchase ICC (it's only available with a 30-day trial)?
You can manually set USE_OMP = 1 here. This might fail if the backend C++ compiler doesn't support OpenMP. A useful feature would be to make the OpenMP check automatic for different Mac compilers.
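One way such an automatic check could work is to try compiling a tiny OpenMP program with the candidate compiler and see whether it succeeds. The sketch below is only an illustration of that idea in POSIX shell; supports_openmp is a hypothetical name, and this is not ParallelAccelerator's build logic. It assumes the compiler accepts the -fopenmp flag used elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical check (not ParallelAccelerator's build logic): succeed only
# if the given compiler command can compile and link a tiny OpenMP program.
supports_openmp() {
    compiler=$1
    workdir=$(mktemp -d) || return 1
    # Minimal probe: it only needs to include omp.h and link the runtime.
    cat > "$workdir/probe.c" <<'EOF'
#include <omp.h>
int main(void) {
    return omp_get_max_threads() > 0 ? 0 : 1;
}
EOF
    "$compiler" -fopenmp "$workdir/probe.c" -o "$workdir/probe" >/dev/null 2>&1
    status=$?
    rm -rf "$workdir"
    return "$status"
}

# Usage:
#   supports_openmp gcc-6 && echo "gcc-6 can build OpenMP code"
```

The probe is compiled but never run, so the result says only that the headers and runtime library are available to that compiler.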
You might qualify for free licenses for Intel compiler tools here: https://software.intel.com/en-us/qualify-for-free-software
Have you looked at: https://software.intel.com/en-us/qualify-for-free-software
This augments the trial license for qualified users to have a free full license.
-Paul
brew install gcc --without-multilib --> gcc-6 is installed, which supports OpenMP.
To test, make toy C code, dude.c:
#include <omp.h>
#include <stdio.h>
int main() {
    /* Each thread in the parallel region prints its own id. */
    #pragma omp parallel
    printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
    return 0;
}
Compiling with gcc-6 -fopenmp dude.c -o dude, then running ./dude:
Hello from thread 0, nthreads 4
Hello from thread 1, nthreads 4
Hello from thread 2, nthreads 4
Hello from thread 3, nthreads 4
confirms that OpenMP is supported.
Getting ParallelAccelerator.jl to use this compiler
Change this line to global USE_OMP = 1.
I am now stuck. I changed this line to CC=gcc-6 and Pkg.build("ParallelAccelerator") fails.
To mimic g++, I tried CC="gcc-6 -xc++ -lstdc++ -shared-libgcc", and then the build worked. (Motivated by this discussion about the difference between g++ and gcc.)
Then when I try to use ParallelAccelerator, I get "OptFramework failed to optimize function" because it is still using Clang for some reason. How can I get it to use gcc-6?
Thank you.
@CorySimon I think you could simply change g++ to g++-6. I got an "OptFramework failed to optimize function" error when using gcc-6 -xc++ -lstdc++ -shared-libgcc.
I've made some recent changes (see #146) that should improve the OpenBLAS and OpenMP situation for ParallelAccelerator users on Mac. I'm going to close this issue; if people continue to have problems, let's discuss in #146 (or file a new issue if that one doesn't seem relevant). Thanks!
When running the tests, I get "OpenMP is not used." and "Warning: MKL and OpenBLAS not found."
I'm running on Mac OSX El Capitan. I have gcc, and I just installed OpenBLAS, and the warnings are still there about OpenBLAS.
How can I make sure that those aspects are being used?