IntelLabs / ParallelAccelerator.jl

The ParallelAccelerator package, part of the High Performance Scripting project at Intel Labs
BSD 2-Clause "Simplified" License

Parallel Accelerator not picking up BLAS? (Ubuntu 16.04) #126

Closed: JMurph2015 closed this issue 7 years ago

JMurph2015 commented 7 years ago

I'm not sure what its issue is, but it seems to stem from a cblas file, so I assume it has something to do with the BLAS-finding problem. Here's the output of Pkg.test("ParallelAccelerator"):

INFO: Computing test dependencies for ParallelAccelerator...
INFO: Installing ColorVectorSpace v0.1.11
INFO: Installing Dates v0.4.4
INFO: Installing DocOpt v0.2.0
INFO: Installing ImageMagick v0.1.8
INFO: Installing Images v0.5.14
INFO: Installing IniFile v0.2.5
INFO: Installing SIUnits v0.1.0
INFO: Installing StatsBase v0.11.1
INFO: Installing TexExtensions v0.0.3
INFO: Installing Tk v0.4.0
INFO: Installing Winston v0.12.1
INFO: Installing Zlib v0.1.12
INFO: Building ImageMagick
INFO: Building Cairo
INFO: Building Tk
INFO: Testing ParallelAccelerator
Testing parallel library functions...
WARNING: scale(A::AbstractMatrix,x::AbstractVector) is deprecated, use A * Diagonal(x) instead.
 in depwarn(::String, ::Symbol) at ./deprecated.jl:64
 in scale(::Array{Float64,2}, ::Array{Float64,1}) at ./deprecated.jl:50
 in ##normalizeW#282(::Array{Float64,2}, ::Int64) at /home/murphyj/.julia/v0.5/ParallelAccelerator/test/lib.jl:63
 in test7() at /home/murphyj/.julia/v0.5/ParallelAccelerator/test/lib.jl:111
 in include_from_node1(::String) at ./loading.jl:488 (repeats 2 times)
 in process_options(::Base.JLOptions) at ./client.jl:262
 in _start() at ./client.jl:318
while loading /home/murphyj/.julia/v0.5/ParallelAccelerator/test/lib.jl, in expression starting on line 130
Done testing parallel library functions.
Testing parfor support via @par macro...
Done testing parfor.
Testing map and reduce...
Done testing map and reduce.
Testing abs()...
Done testing abs().
Testing constant promotion for pointwise operations...
Done testing constant promotion.
Testing rand()...
Done testing rand()...
Testing BitArrays...
Done testing BitArrays.
Testing ranges...
Done testing ranges.
Testing sequential code...
Done testing sequential code.
Testing cat...
Done testing cat.
Testing ranges...
Done testing ranges.
Testing miscellaneous features...
Done testing miscellaneous features...
Done testing aug_assign.
Testing complex number support...
 test1 returns: 3.0 + 3.0im
Done testing complex number support.
Testing println()...
Done testing println().
Testing strings...
/tmp/tmpBrYNuN/cgen_output64.cpp:18:254: warning: integer constant is so large that it is unsigned
 static uint64_t _Base_powers_of_ten_[20] = {1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000,10000000000,100000000000,1000000000000,10000000000000,100000000000000,1000000000000000,10000000000000000,100000000000000000,1000000000000000000,10000000000000000000};
                                                                                                                                                                                                                                                              ^
Done testing strings...
Testing logistic regression...
Done testing logistic regression...
Testing kmeans...
OptFramework failed to optimize function TestKmeans.##kmeans#1113 in optimization pass ParallelAccelerator.Driver.toCGen with error Could determine type for arg 2 to call .Base.isa with name Base.SizeUnknown
Done testing kmeans...
testing gemv...
/tmp/tmpBrYNuN/cgen_output67.cpp: In function ‘void ppgemv_t2p1148(j2c_array<double>&, j2c_array<double>&, j2c_array<double>*)’:
/tmp/tmpBrYNuN/cgen_output67.cpp:28:68: error: ‘cblas_domatcopy’ was not declared in this scope
              A.data, A.ARRAYSIZE(1), SSAValue0.data, A.ARRAYSIZE(2)), SSAValue0);
                                                                    ^
/tmp/tmpBrYNuN/cgen_output67.cpp: In function ‘void ppgemv_t2p1148_unaliased(j2c_array<double>&, j2c_array<double>&, j2c_array<double>*)’:
/tmp/tmpBrYNuN/cgen_output67.cpp:46:68: error: ‘cblas_domatcopy’ was not declared in this scope
              A.data, A.ARRAYSIZE(1), SSAValue0.data, A.ARRAYSIZE(2)), SSAValue0);
                                                                    ^
OptFramework failed to optimize function TestGemv.##gemv_t2#1148 in optimization pass ParallelAccelerator.Driver.toCGen with error ErrorException("failed process: Process(`g++ -O3 -fopenmp -std=c++11 -g -fpic -c -o /tmp/tmpBrYNuN/cgen_output67.o /tmp/tmpBrYNuN/cgen_output67.cpp`, ProcessExited(1)) [1]")
Done testing gemv.
testing transpose...
/tmp/tmpBrYNuN/cgen_output68.cpp: In function ‘void pptranspose_tp1160(j2c_array<double>&, j2c_array<double>*)’:
/tmp/tmpBrYNuN/cgen_output68.cpp:25:68: error: ‘cblas_domatcopy’ was not declared in this scope
              A.data, A.ARRAYSIZE(1), SSAValue0.data, A.ARRAYSIZE(2)), SSAValue0);
                                                                    ^
/tmp/tmpBrYNuN/cgen_output68.cpp: In function ‘void pptranspose_tp1160_unaliased(j2c_array<double>&, j2c_array<double>*)’:
/tmp/tmpBrYNuN/cgen_output68.cpp:35:68: error: ‘cblas_domatcopy’ was not declared in this scope
              A.data, A.ARRAYSIZE(1), SSAValue0.data, A.ARRAYSIZE(2)), SSAValue0);
                                                                    ^
OptFramework failed to optimize function TestTranspose.##transpose_t#1160 in optimization pass ParallelAccelerator.Driver.toCGen with error ErrorException("failed process: Process(`g++ -O3 -fopenmp -std=c++11 -g -fpic -c -o /tmp/tmpBrYNuN/cgen_output68.o /tmp/tmpBrYNuN/cgen_output68.cpp`, ProcessExited(1)) [1]")
Done testing transpose.
testing vecnorm...
Done testing vecnorm.
Testing broadcast...
test3 C = 
[1 2 3; 8 10 12]
Done testing broadcast.
iterations = 10000000
SELFPRIMED 1.976333224
checksum: 2.0954821257116845e8
rate = 2.5385804121759485e7 opts/sec
SELFTIMED 0.393920947
points= 10000000
SELFPRIMED 1.189373116
pi = 3.1412388
SELFTIMED 0.14997435
nframes = 2
filenames = 
String["small_001.dat","small_002.dat"]
checksums = 
Float32
[80751.4,80818.0]
Image size: 584x388
SELFPRIMED 8.895185334
checksum: -286095.5 -210069.86
SELFTIMED 13.745833804
iterations = 30
centers= 5
number of points = 50000
SELFPRIMED 2.251612612
result = 
[0.276528 0.353684 0.682667 0.623608 0.560988; 0.503455 0.47413 0.511227 0.48242 0.53094; 0.526752 0.512084 0.515092 0.462968 0.47521; 0.513815 0.563066 0.459114 0.522883 0.447855; 0.492263 0.523317 0.494208 0.516432 0.474825; 0.452301 0.529621 0.511337 0.549255 0.477511; 0.51081 0.494879 0.524365 0.439498 0.523772; 0.640843 0.41478 0.496349 0.28888 0.657364; 0.591959 0.253946 0.294331 0.728377 0.636177; 0.522098 0.511876 0.448276 0.493945 0.516784; 0.5239 0.461113 0.53245 0.549088 0.435481; 0.447019 0.532317 0.560837 0.591498 0.37088; 0.428364 0.404779 0.628586 0.398629 0.637814; 0.509621 0.560057 0.47417 0.483586 0.476626; 0.500642 0.501699 0.491443 0.511089 0.493128; 0.733622 0.306472 0.699768 0.47905 0.263976; 0.470839 0.472082 0.541894 0.486239 0.528017; 0.548557 0.462906 0.448819 0.592823 0.458249; 0.460154 0.537286 0.498311 0.517852 0.483539; 0.483073 0.546975 0.498439 0.493434 0.485849]
rate = 30.153927204509767 iterations/sec
SELFTIMED 0.994895285
iterations = 50
SELFPRIMED 1.877318882
result = [1079.97 1087.71 1138.98 1098.88 1107.83 1071.37 1054.47 1072.7 1160.0 1151.16]
rate = 3078.4736804353456 iterations/sec
SELFTIMED 0.016241815
in -3
Hello world!
INFO: ParallelAccelerator tests passed
INFO: Removing ColorVectorSpace v0.1.11
INFO: Removing Dates v0.4.4
INFO: Removing DocOpt v0.2.0
INFO: Removing ImageMagick v0.1.8
INFO: Removing Images v0.5.14
INFO: Removing IniFile v0.2.5
INFO: Removing SIUnits v0.1.0
INFO: Removing StatsBase v0.11.1
INFO: Removing TexExtensions v0.0.3
INFO: Removing Tk v0.4.0
INFO: Removing Winston v0.12.1
INFO: Removing Zlib v0.1.12

config.jl

backend_compiler = USE_GCC
mkl_lib = ""
openblas_lib = ""
sys_blas = 1

I actually have both MKL and OpenBLAS installed, and at least MKL (intel/mkl/lib/intel64?) is in my ld search path. Does anyone have ideas here? It pretty much cripples the package (it prevents even basic functions from working), so I'd really like to figure it out. (I can provide more output if necessary.) Thanks!

lkuper commented 7 years ago

@JMurph2015 Hmm. What do you see when you run Pkg.build("ParallelAccelerator")?

I also want to know why ParallelAccelerator isn't picking up your MKL. What does echo $LD_LIBRARY_PATH show?

lkuper commented 7 years ago

BTW, the error with transpose_t was also reported in issue #111, but I've never seen the kmeans or gemv_t2 issues before.

ehsantn commented 7 years ago

ParallelAccelerator isn't picking up MKL since it assumes MKL is used with ICC and not GCC. We could add support for GCC+MKL. Do you have ICC installed?

$LD_LIBRARY_PATH is irrelevant here, since the failure is a compilation error, not a linking error.

I think the problem is that the system-installed BLAS doesn't include the BLAS extensions, which include cblas_domatcopy. We should have a configuration check for BLAS extensions and generate code accordingly.
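A minimal sketch of what "generate code accordingly" could look like, assuming a configure-time check that defines a hypothetical HAVE_CBLAS_DOMATCOPY macro only when the BLAS provides the extension. cblas_domatcopy is an OpenBLAS extension (MKL has a similar routine named mkl_domatcopy); it is not part of the reference CBLAS that Ubuntu's default BLAS packages ship. The function name transpose_scaled is made up for illustration and is not ParallelAccelerator's generated code. The leading dimensions follow the trailing arguments visible in the error lines above (lda = rows of A, ldb = columns of A).

```cpp
#include <cstddef>

#ifdef HAVE_CBLAS_DOMATCOPY   // hypothetical macro, set only by a configure-time check
#include <cblas.h>            // OpenBLAS header that declares the extension
#endif

// B = alpha * transpose(A), where A is a column-major rows x cols matrix with
// leading dimension lda, and B is cols x rows with leading dimension ldb.
// This is the operation requested by cblas_domatcopy(CblasColMajor, CblasTrans, ...).
void transpose_scaled(std::size_t rows, std::size_t cols, double alpha,
                      const double* A, std::size_t lda,
                      double* B, std::size_t ldb) {
#ifdef HAVE_CBLAS_DOMATCOPY
    cblas_domatcopy(CblasColMajor, CblasTrans,
                    static_cast<int>(rows), static_cast<int>(cols), alpha,
                    A, static_cast<int>(lda), B, static_cast<int>(ldb));
#else
    // Portable fallback when the BLAS lacks the extension:
    // element (i, j) of A is written to element (j, i) of B.
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < rows; ++i) {
            B[j + i * ldb] = alpha * A[i + j * lda];
        }
    }
#endif
}
```

The configure step itself could simply try to compile and link a tiny program that calls cblas_domatcopy against the selected BLAS and define the macro only if that succeeds, so a plain system BLAS falls back to the loop instead of failing to compile.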

I noticed another issue here: transpose is not fused with the gemv call in the second test. This can hurt performance in ParallelAccelerator and break HPAT.
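To illustrate the fusion point: y = A' * x does not need A' to be materialized first. With a BLAS this is a single cblas_dgemv call with CblasTrans; the plain C++ sketch below shows the same idea for a column-major matrix. The name gemv_transposed is hypothetical, and this is only an illustration of the technique, not how ParallelAccelerator generates code.

```cpp
#include <cstddef>
#include <vector>

// y = transpose(A) * x for a column-major rows x cols matrix A,
// computed without building transpose(A). Column j of A is contiguous,
// so y[j] is the dot product of column j with x.
std::vector<double> gemv_transposed(const std::vector<double>& A,
                                    std::size_t rows, std::size_t cols,
                                    const std::vector<double>& x) {
    std::vector<double> y(cols, 0.0);
    for (std::size_t j = 0; j < cols; ++j) {
        const double* col = A.data() + j * rows;
        double acc = 0.0;
        for (std::size_t i = 0; i < rows; ++i) {
            acc += col[i] * x[i];
        }
        y[j] = acc;
    }
    return y;
}
```

Skipping the explicit transpose avoids allocating and writing a second rows x cols array, which is where the unfused version loses time.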

JMurph2015 commented 7 years ago

So I don't have ICC installed (as far as I can tell, it's only available under an expensive license, but maybe I'm wrong?). Currently I have GCC and MKL (but also ATLAS). I'm not sure why it doesn't like cblas_domatcopy. Here's what happens when I build it:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia> Pkg.build("ParallelAccelerator")
INFO: Building ParallelAccelerator
ParallelAccelerator: build.jl begin.
ParallelAccelerator: Building j2c-array shared library
System installed BLAS found
Using g++ to build ParallelAccelerator array runtime.
ParallelAccelerator: build.jl done.

JMurph2015 commented 7 years ago

I went ahead and built OpenBLAS from source, installed it to /usr/local, and then hand-edited config.jl to point at the new location, which quieted things down a lot.

julia> Pkg.test("ParallelAccelerator")
INFO: Computing test dependencies for ParallelAccelerator...
INFO: Installing ColorVectorSpace v0.1.11
INFO: Installing Dates v0.4.4
INFO: Installing DocOpt v0.2.0
INFO: Installing ImageMagick v0.1.8
INFO: Installing Images v0.5.14
INFO: Installing IniFile v0.2.5
INFO: Installing SIUnits v0.1.0
INFO: Installing StatsBase v0.11.1
INFO: Installing TexExtensions v0.0.3
INFO: Installing Tk v0.4.0
INFO: Installing Winston v0.12.1
INFO: Installing Zlib v0.1.12
INFO: Building ImageMagick
INFO: Building Cairo
INFO: Building Tk
INFO: Testing ParallelAccelerator
Testing parallel library functions...
WARNING: scale(A::AbstractMatrix,x::AbstractVector) is deprecated, use A * Diagonal(x) instead.
 in depwarn(::String, ::Symbol) at ./deprecated.jl:64
 in scale(::Array{Float64,2}, ::Array{Float64,1}) at ./deprecated.jl:50
 in ##normalizeW#282(::Array{Float64,2}, ::Int64) at /home/murphyj/.julia/v0.5/ParallelAccelerator/test/lib.jl:63
 in test7() at /home/murphyj/.julia/v0.5/ParallelAccelerator/test/lib.jl:111
 in include_from_node1(::String) at ./loading.jl:488 (repeats 2 times)
 in process_options(::Base.JLOptions) at ./client.jl:262
 in _start() at ./client.jl:318
while loading /home/murphyj/.julia/v0.5/ParallelAccelerator/test/lib.jl, in expression starting on line 130
Done testing parallel library functions.
Testing parfor support via @par macro...
Done testing parfor.
Testing map and reduce...
Done testing map and reduce.
Testing abs()...
Done testing abs().
Testing constant promotion for pointwise operations...
Done testing constant promotion.
Testing rand()...
Done testing rand()...
Testing BitArrays...
Done testing BitArrays.
Testing ranges...
Done testing ranges.
Testing sequential code...
Done testing sequential code.
Testing cat...
Done testing cat.
Testing ranges...
Done testing ranges.
Testing miscellaneous features...
Done testing miscellaneous features...
Done testing aug_assign.
Testing complex number support...
 test1 returns: 3.0 + 3.0im
Done testing complex number support.
Testing println()...
in -3
Done testing println().
Testing strings...
Hello world!
/tmp/tmpdWI9F7/cgen_output64.cpp:18:254: warning: integer constant is so large that it is unsigned
 0,1000000000000,10000000000000,100000000000000,1000000000000000,10000000000000000,100000000000000000,1000000000000000000,10000000000
                                                                                                                          ^
Done testing strings...
Testing logistic regression...
Done testing logistic regression...
Testing kmeans...
OptFramework failed to optimize function TestKmeans.##kmeans#1113 in optimization pass ParallelAccelerator.Driver.toCGen with error Could determine type for arg 2 to call .Base.isa with name Base.SizeUnknown
Done testing kmeans...
testing gemv...
Done testing gemv.
testing transpose...
Done testing transpose.
testing vecnorm...
Done testing vecnorm.
Testing broadcast...
test3 C = [1 2 3; 8 10 12]
Done testing broadcast.
iterations = 10000000
SELFPRIMED 2.049170615
checksum: 2.0954821257116845e8
rate = 2.62439837930698e7 opts/sec
SELFTIMED 0.381039711
points= 10000000
SELFPRIMED 1.253555047
pi = 3.1419876
SELFTIMED 0.151256381
nframes = 2
filenames = String["small_001.dat","small_002.dat"]
checksums = Float32[80751.4,80818.0]
Image size: 584x388
SELFPRIMED 9.274791455
checksum: -286095.5 -210069.86
SELFTIMED 13.233145752
iterations = 30
centers= 5
number of points = 50000
SELFPRIMED 2.379851148
result = [0.425475 0.577452 0.436136 0.482608 0.571198; 0.557001 0.399582 0.539415 0.422502 0.586738; 0.455567 0.652462 0.299701 0.595573 0.485844; 0.563115 0.492276 0.512112 0.476745 0.461907; 0.444491 0.48951 0.60748 0.643702 0.312247; 0.467446 0.579576 0.572745 0.487176 0.412578; 0.449568 0.467789 0.56685 0.53296 0.478751; 0.457571 0.438916 0.60669 0.517769 0.486037; 0.479889 0.507662 0.497689 0.489309 0.520009; 0.530496 0.593008 0.474302 0.444818 0.45079; 0.558323 0.54588 0.485714 0.455248 0.456557; 0.596519 0.408404 0.61486 0.537348 0.341988; 0.545649 0.470891 0.556785 0.405261 0.525715; 0.70319 0.310286 0.285966 0.721125 0.472729; 0.522865 0.422656 0.542794 0.539848 0.46837; 0.620381 0.437393 0.490801 0.401205 0.541853; 0.456 0.564907 0.532804 0.56755 0.376528; 0.547282 0.536663 0.419683 0.524719 0.477716; 0.71925 0.662726 0.469541 0.335353 0.310584; 0.486873 0.544916 0.506794 0.532486 0.435958]
rate = 39.2764879721706 iterations/sec
SELFTIMED 0.763815747
iterations = 50
SELFPRIMED 1.758080073
result = [960.762 928.721 1020.67 965.137 999.146 1028.5 972.938 919.218 930.331 918.58]
rate = 2053.7482342899557 iterations/sec
SELFTIMED 0.02434573
INFO: ParallelAccelerator tests passed
INFO: Removing ColorVectorSpace v0.1.11
INFO: Removing Dates v0.4.4
INFO: Removing DocOpt v0.2.0
INFO: Removing ImageMagick v0.1.8
INFO: Removing Images v0.5.14
INFO: Removing IniFile v0.2.5
INFO: Removing SIUnits v0.1.0
INFO: Removing StatsBase v0.11.1
INFO: Removing TexExtensions v0.0.3
INFO: Removing Tk v0.4.0
INFO: Removing Winston v0.12.1
INFO: Removing Zlib v0.1.12

julia> Pkg.build("ParallelAccelerator")
INFO: Building ParallelAccelerator
ParallelAccelerator: build.jl begin.
ParallelAccelerator: Building j2c-array shared library
System installed BLAS found
Using g++ to build ParallelAccelerator array runtime.
ParallelAccelerator: build.jl done.

julia> Pkg.test("ParallelAccelerator")
INFO: Computing test dependencies for ParallelAccelerator...
INFO: Installing ColorVectorSpace v0.1.11
INFO: Installing Dates v0.4.4
INFO: Installing DocOpt v0.2.0
INFO: Installing ImageMagick v0.1.8
INFO: Installing Images v0.5.14
INFO: Installing IniFile v0.2.5
INFO: Installing SIUnits v0.1.0
INFO: Installing StatsBase v0.11.1
INFO: Installing TexExtensions v0.0.3
INFO: Installing Tk v0.4.0
INFO: Installing Winston v0.12.1
INFO: Installing Zlib v0.1.12
INFO: Building ImageMagick
INFO: Building Cairo
INFO: Building Tk
INFO: Testing ParallelAccelerator
Testing parallel library functions...
WARNING: scale(A::AbstractMatrix,x::AbstractVector) is deprecated, use A * Diagonal(x) instead.
 in depwarn(::String, ::Symbol) at ./deprecated.jl:64
 in scale(::Array{Float64,2}, ::Array{Float64,1}) at ./deprecated.jl:50
 in ##normalizeW#282(::Array{Float64,2}, ::Int64) at /home/murphyj/.julia/v0.5/ParallelAccelerator/test/lib.jl:63
 in test7() at /home/murphyj/.julia/v0.5/ParallelAccelerator/test/lib.jl:111
 in include_from_node1(::String) at ./loading.jl:488 (repeats 2 times)
 in process_options(::Base.JLOptions) at ./client.jl:262
 in _start() at ./client.jl:318
while loading /home/murphyj/.julia/v0.5/ParallelAccelerator/test/lib.jl, in expression starting on line 130
Done testing parallel library functions.
Testing parfor support via @par macro...
Done testing parfor.
Testing map and reduce...
Done testing map and reduce.
Testing abs()...
Done testing abs().
Testing constant promotion for pointwise operations...
Done testing constant promotion.
Testing rand()...
Done testing rand()...
Testing BitArrays...
Done testing BitArrays.
Testing ranges...
Done testing ranges.
Testing sequential code...
Done testing sequential code.
Testing cat...
Done testing cat.
Testing ranges...
Done testing ranges.
Testing miscellaneous features...
Done testing miscellaneous features...
Done testing aug_assign.
Testing complex number support...
 test1 returns: 3.0 + 3.0im
Done testing complex number support.
Testing println()...
in -3
Done testing println().
Testing strings...
Hello world!
/tmp/tmpK1c7iK/cgen_output64.cpp:18:254: warning: integer constant is so large that it is unsigned
 0,1000000000000,10000000000000,100000000000000,1000000000000000,10000000000000000,100000000000000000,1000000000000000000,10000000000
                                                                                                                          ^
Done testing strings...
Testing logistic regression...
Done testing logistic regression...
Testing kmeans...
OptFramework failed to optimize function TestKmeans.##kmeans#1113 in optimization pass ParallelAccelerator.Driver.toCGen with error Could determine type for arg 2 to call .Base.isa with name Base.SizeUnknown
Done testing kmeans...
testing gemv...
/home/murphyj/julia/v0.5/bin/julia: symbol lookup error: /tmp/tmpK1c7iK/libcgen_output67.so.1.0: undefined symbol: cblas_domatcopy
===================================================[ ERROR: ParallelAccelerator ]====================================================

failed process: Process(`/home/murphyj/julia/v0.5/bin/julia -Cx86-64 -J/home/murphyj/julia/v0.5/lib/julia/sys.so --compile=yes --depwarn=yes --check-bounds=yes --code-coverage=none --color=yes --compilecache=yes /home/murphyj/.julia/v0.5/ParallelAccelerator/test/runtests.jl`, ProcessExited(127)) [127]

=====================================================================================================================================
INFO: Removing ColorVectorSpace v0.1.11
INFO: Removing Dates v0.4.4
INFO: Removing DocOpt v0.2.0
INFO: Removing ImageMagick v0.1.8
INFO: Removing Images v0.5.14
INFO: Removing IniFile v0.2.5
INFO: Removing SIUnits v0.1.0
INFO: Removing StatsBase v0.11.1
INFO: Removing TexExtensions v0.0.3
INFO: Removing Tk v0.4.0
INFO: Removing Winston v0.12.1
INFO: Removing Zlib v0.1.12
ERROR: ParallelAccelerator had test errors
 in #test#61(::Bool, ::Function, ::Array{AbstractString,1}) at ./pkg/entry.jl:740
 in (::Base.Pkg.Entry.#kw##test)(::Array{Any,1}, ::Base.Pkg.Entry.#test, ::Array{AbstractString,1}) at ./<missing>:0
 in (::Base.Pkg.Dir.##2#3{Array{Any,1},Base.Pkg.Entry.#test,Tuple{Array{AbstractString,1}}})() at ./pkg/dir.jl:31
 in cd(::Base.Pkg.Dir.##2#3{Array{Any,1},Base.Pkg.Entry.#test,Tuple{Array{AbstractString,1}}}, ::String) at ./file.jl:59
 in #cd#1(::Array{Any,1}, ::Function, ::Function, ::Array{AbstractString,1}, ::Vararg{Array{AbstractString,1},N}) at ./pkg/dir.jl:31
 in (::Base.Pkg.Dir.#kw##cd)(::Array{Any,1}, ::Base.Pkg.Dir.#cd, ::Function, ::Array{AbstractString,1}, ::Vararg{Array{AbstractString,1},N}) at ./<missing>:0
 in #test#3(::Bool, ::Function, ::String, ::Vararg{String,N}) at ./pkg/pkg.jl:258
 in test(::String, ::Vararg{String,N}) at ./pkg/pkg.jl:258

Interestingly, when I rebuilt, config.jl went back to the default BLAS configuration shown below, and the result landed somewhere in the middle. There are a lot of moving parts here, so I'll leave it at that.

backend_compiler = USE_GCC
mkl_lib = ""
openblas_lib = ""
sys_blas = 1

This still leaves open the problem that a system-installed BLAS should be handled a bit differently if possible, so that we can use package-manager versions. I'm not sure what's causing the middle level of error, but it's worth considering, since Pkg.build() didn't populate the BLAS library variable automatically, which made the first run somewhat hacky. Also, the kmeans issue is better but still present, which implies that something else is going on (perhaps Julia v0.5 is causing it).
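One way a build step could handle system-installed BLAS libraries is to probe each candidate for the extension symbol before selecting it, which would catch the "undefined symbol: cblas_domatcopy" failure at build time instead of at test time. The sketch below uses the POSIX dlopen/dlsym API; the helper names library_exports and pick_blas_with_omatcopy are hypothetical, and candidate sonames such as "libopenblas.so.0" or "libblas.so.3" are assumptions about a typical Ubuntu install, not values taken from ParallelAccelerator's build script. On glibc older than 2.34 this needs to be linked with -ldl.

```cpp
#include <dlfcn.h>
#include <string>
#include <vector>

// Returns true if the shared library `lib` can be loaded and exports `symbol`.
// Passing nullptr for `lib` probes the running program and the libraries
// it has already loaded.
bool library_exports(const char* lib, const char* symbol) {
    void* handle = dlopen(lib, RTLD_LAZY | RTLD_LOCAL);
    if (handle == nullptr) {
        return false;  // library not found, or it failed to load
    }
    bool found = dlsym(handle, symbol) != nullptr;
    dlclose(handle);
    return found;
}

// Returns the first candidate library that exports cblas_domatcopy,
// or an empty string if none of them do (caller should use a fallback).
std::string pick_blas_with_omatcopy(const std::vector<std::string>& candidates) {
    for (const auto& lib : candidates) {
        if (library_exports(lib.c_str(), "cblas_domatcopy")) {
            return lib;
        }
    }
    return "";
}
```

For example, pick_blas_with_omatcopy({"libopenblas.so.0", "libblas.so.3"}) would return the OpenBLAS soname only if that library is installed and exports the extension; an empty result signals that code generation should avoid cblas_domatcopy.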

(Also, a GCC+MKL option would be fantastic.)

lkuper commented 7 years ago

Update: I filed a separate issue for the domatcopy problem (#147). Also, our build script no longer assumes that you're using ICC and MKL together (this was handled in the fix for #145), so GCC+MKL should be OK now.

So I'm going to go ahead and close this old issue, but @JMurph2015, feel free to reopen (or file a new issue) if you're still having problems that aren't covered in the above issues. Thanks.

JMurph2015 commented 7 years ago

Sorry I haven't been on my public GitHub for a while. Thanks for the help, and thanks for enabling GCC+MKL!