JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

32bit OpenBLAS build with 64bit Julia by default on OSX 10.6.8 #3838

Closed cmcbride closed 10 years ago

cmcbride commented 11 years ago

On recent master branch:

[nibbler julia]% cat .git/refs/heads/master
c575520aa51331b771af8c3a077cc06013a8505f

Julia builds in 64-bit, but OpenBLAS appears to build in 32-bit:

[nibbler julia]% ./julia
ERROR: OpenBLAS was not built with 64bit integer support.
You're seeing this error because Julia was built with USE_BLAS64=1
Please rebuild Julia with USE_BLAS64=0
Quitting.

@ViralBShah thinks this might be due to gcc. For reference:

[nibbler julia]% gcc --version
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

vtjnash commented 11 years ago

I'm not certain whether or not that fixes it, or how much you will need to delete to get into a consistent state (maybe just openblas), but I was able to ascertain that openblas expected the compiler name ($CC) to contain either -m32 or -m64 on OS X so this perhaps gets closer to the goal.
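As a rough illustration of the point about $CC (this is not the change made in Julia's build system, and the helper name `cc_with_bits` is made up for this sketch), a shell function can append -m64 only when the compiler command carries neither flag:

```shell
# Hypothetical helper: make sure a compiler command names its word size.
# Prints the command unchanged if it already contains -m32 or -m64,
# otherwise prints it with -m64 appended.
cc_with_bits() {
    case " $1 " in
        *" -m32 "*|*" -m64 "*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "$1 -m64" ;;
    esac
}
```

The result could then be handed to the build as the CC value, though whether that alone is enough is exactly what is uncertain here.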

staticfloat commented 11 years ago

OpenBLAS has problems with gcc on OSX. If @vtjnash's solution doesn't fix it, what is your version of clang?

ViralBShah commented 11 years ago

Perhaps we switch to using system blas by default on 10.6 as we used to.

vtjnash commented 11 years ago

sigh, upstream bug opened: https://github.com/xianyi/OpenBLAS/issues/265 so that we can revert cbd9c3d1920aa0881a0559e8327fa7a5d67afea0 once that is closed

cmcbride commented 11 years ago

I think this can be solved by the suggestion of @ViralBShah: revert to using system BLAS in 10.6. It's a minority platform anyhow that will eventually be phased out.

vtjnash commented 11 years ago

Can you post the contents of deps/openblas-v0.2.6/Makefile.conf and deps/openblas-v0.2.6/config.h?

cmcbride commented 11 years ago

OK, I just rechecked (pulled, recompiled without Make.user) and it seemed to work.

julia> versioninfo()
Julia Version 0.2.0-prerelease+2873
Commit 3d28781* 2013-07-29 08:33:53 UTC
Platform Info:
  System: Darwin (x86_64-apple-darwin10)
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

I doubt you want to see the openblas config now, but let me know if you do.

vtjnash commented 11 years ago

oh, i guess not then. it seems it was working all along then?

cmcbride commented 11 years ago

Ugh, sorry for the noise. Seems it was a problem with the environment that I missed ruling out.

[nibbler julia]% LD_LIBRARY_PATH="/opt/local/lib" ./julia
ERROR: OpenBLAS was not built with 64bit integer support.
You're seeing this error because Julia was built with USE_BLAS64=1
Please rebuild Julia with USE_BLAS64=0
Quitting.
[nibbler julia]% LD_LIBRARY_PATH="" ./julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.2.0-prerelease+2873
 _/ |\__'_|_|_|\__'_|  |  Commit 3d28781* 2013-07-29 08:33:53 UTC
|__/                   |  x86_64-apple-darwin10

The culprit appears to be something in /opt/local/lib (MacPorts) but there is no OpenBLAS there (I checked for other libraries using locate, find, etc initially). The only BLAS in there that I see is libgslcblas.

I'm trying to find the OSX equivalent of strace / ltrace to find out which library julia is trying to load that causes this problem.

cmcbride commented 11 years ago

OK, my OSX dev-foo is weak. If anyone knows how to find which dynamic libraries are being opened (e.g. strace PROG on linux for opened files), I'd be interested to find out.

In any case, I'm a little confused by the following:

diff --git a/base/util.jl b/base/util.jl
index f66e7a0..010570f 100644
--- a/base/util.jl
+++ b/base/util.jl
@@ -237,6 +237,8 @@ function blas_set_num_threads(n::Integer)
 end

 function check_blas()
+    println(blas_vendor())
+    println(openblas_get_config())
     if blas_vendor() == :openblas
         openblas_config = openblas_get_config()
         openblas64 = ismatch(r".*USE64BITINT.*", openblas_config)

Which then leads to this

% ./julia
openblas
USE64BITINT NO_AFFINITY
ERROR: OpenBLAS was not built with 64bit integer support.
You're seeing this error because Julia was built with USE_BLAS64=1
Please rebuild Julia with USE_BLAS64=0
Quitting.

This seems at odds with the logic of the code above, and I am not sure why this works in one case but not another (it works with the MacPorts library path removed). Could a problem in the regexp library that ismatch() uses be causing this test to fail?
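For comparison, the test in check_blas() only asks whether the config string contains USE64BITINT. A minimal sketch of the same check as a plain substring match, written in shell rather than Julia and with a made-up function name (`has_int64`), shows that the answer for the string printed above should not depend on any regex engine:

```shell
# Substring check on an OpenBLAS config string; no regex engine involved.
# Returns 0 (true) if the string mentions USE64BITINT, 1 otherwise.
has_int64() {
    case "$1" in
        *USE64BITINT*) return 0 ;;
        *) return 1 ;;
    esac
}

# The config string printed by the patched check_blas() above.
if has_int64 "USE64BITINT NO_AFFINITY"; then
    echo "64-bit integer interface"
else
    echo "32-bit integer interface"
fi
```

If a check this simple passes while the ismatch() version fails on the same string, the regex layer is the more likely suspect than OpenBLAS itself.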

cmcbride commented 11 years ago

Yup, looks like OpenBLAS was a red herring. A PCRE library in MacPorts (pcre 8.3.3) was causing the regex to fail via ismatch() and falsely thinking OpenBLAS was not 64bit capable even when it was.

I'm not sure if this is related to any of the PCRE issues.

staticfloat commented 11 years ago

Nice job hunting this all down, @cmcbride! I'm submitting a pull request to ensure that you've linked against a relatively recent version of libpcre right now.

nolta commented 11 years ago

@cmcbride Did you mean "pcre 8.33" instead of "8.3.3"?

vtjnash commented 11 years ago

MacPorts pcre is 8.33, so @staticfloat 's patch probably won't help

staticfloat commented 11 years ago

@vtjnash do you build against MacPorts pcre, or do you build with Julia's? I build with Homebrew's pcre (8.33 as well) and it works just fine.

vtjnash commented 11 years ago

I build against Julia's pcre. The significant difference is probably their unicode support: homebrew:

    system "./configure", "--disable-dependency-tracking",
                          "--prefix=#{prefix}",
                          "--enable-utf8",
                          "--enable-unicode-properties",
                          "--enable-pcregrep-libz",
                          "--enable-pcregrep-libbz2",
                          "--enable-jit"

macports:

configure.args      --docdir=${prefix}/share/doc/${name} \
                    --disable-silent-rules \
                    --enable-pcre16 \
                    --enable-pcre32 \
                    --enable-jit \
                    --enable-unicode-properties \
                    --enable-pcregrep-libz \
                    --enable-pcregrep-libbz2 \
                    --enable-pcretest-libedit

staticfloat commented 11 years ago

Wow, we are on the same wavelength. That's great. I'd like to confirm that it is the unicode stuff, if possible, because we can test for that, and I can add it in to my pull request.

vtjnash commented 11 years ago

Looking closer into this, the --enable-utf8 option only exists for historical reasons and is actually the default and only option. --enable-pcre16 builds a different library named libpcre16 which doesn't affect libpcre. I'm back to being stumped.

nolta commented 11 years ago

This is a strange bug. If i install the pcre macport, and set LD_LIBRARY_PATH=/opt/local/lib,

julia> ismatch(r".*USE64BITINT.*", Base.openblas_get_config())
true

julia> Base.USE_BLAS64
true

julia> Base.check_blas()
ERROR: OpenBLAS was not built with 64bit integer support.
You're seeing this error because Julia was built with USE_BLAS64=1
Please rebuild Julia with USE_BLAS64=0
Quitting.

vtjnash commented 11 years ago

In 1a17b6425e4727c074ea0dd7e18f122341444ad3, I've prioritized libraries found in Julia's lib directory over anything in the user's environment.

cmcbride commented 11 years ago

yes, I meant 8.33.

I agree with @nolta and get the same strangeness with ismatch() and Base.check_blas()

It has to be something with PCRE, but something fishy is happening as applying the same logic in the REPL differs from that in Base.check_blas()

I just compiled with 1a17b6425e4727c074ea0dd7e18f122341444ad3 that @vtjnash pushed. It did not solve the issue (LD_LIBRARY_PATH was searched first, it appears).

cmcbride commented 11 years ago

p.s. Should this now be a new issue?

vtjnash commented 11 years ago

I'm a little surprised that recent commit did not solve the issue for you. It worked on my machine. Can you confirm that your DL_LOAD_PATH looks like mine:

julia> DL_LOAD_PATH
2-element Union(UTF8String,ASCIIString) Array:
 "@executable_path/../lib"      
 "@executable_path/../lib/julia"

cmcbride commented 11 years ago

Sure. Seems to be the same

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.2.0-prerelease+2876
 _/ |\__'_|_|_|\__'_|  |  Commit 1a17b64* 2013-07-29 20:37:41 UTC
|__/                   |  x86_64-apple-darwin10

julia> DL_LOAD_PATH
2-element Union(ASCIIString,UTF8String) Array:
 "@executable_path/../lib"
 "@executable_path/../lib/julia"
julia> Base.check_blas()
ERROR: OpenBLAS was not built with 64bit integer support.
You're seeing this error because Julia was built with USE_BLAS64=1
Please rebuild Julia with USE_BLAS64=0

It's probably related to the PCRE strangeness, but versioninfo() also breaks with this config. Whatever it is probably breaks a lot of things!

julia> versioninfo()
ERROR: invalid build identifier: ""
 in VersionNumber at version.jl:30
 in print at version.jl:42
 in print_to_string at string.jl:23
 in versioninfo at util.jl:262
 in versioninfo at no file

staticfloat commented 11 years ago

The fact that this only happens in code in Base and not in code in the REPL almost makes me think that something is screwed up in the Base namespace but not in the Main namespace. I'm not sure how that could happen, but that's what it feels like.

@vtjnash; while writing the check_pcre() stuff, I thought it would be neat if we could turn the symbols passed to ccall() into paths. That way, we could print out the result from something such as library_path( :libpcre ) and it would tell us what file on disk is actually being loaded. This could short-cut a lot of work, and would be nice so users don't have to know how to do stuff like lsof -p $(pgrep julia), etc....

This could be a good excuse to write something like this. I was going to myself, but I'm a little busy, and I'm not sure where the best place is to put something like this. (E.g. keeping track of another std::map in dlload.c, or put it into ccall with the other layers about sonames, etc....)
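Until something like library_path() exists, the lsof route can be wrapped in a small filter. This is only a sketch: `libs_matching` is a made-up name, and it assumes the usual lsof layout of one header row followed by rows whose last column is the file path.

```shell
# Hypothetical filter: read `lsof -p PID` output on stdin and print the
# NAME column (last field) of every row whose path contains the given name.
# The header row is skipped; paths containing spaces are not handled.
libs_matching() {
    awk -v lib="$1" 'NR > 1 && index($NF, lib) > 0 { print $NF }'
}
```

With a running session, `lsof -p $(pgrep julia) | libs_matching libpcre` would then print only the libpcre files that process has open.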

nolta commented 11 years ago

@cmcbride Can you try again w/ the latest master?

staticfloat commented 11 years ago

@nolta's solution notwithstanding, I changed my homebrew compilation options to be identical to Macports (modulo the --prefix option, and also forgoing superenv via --env=std) and I can't seem to get this behavior, so I can't figure out what causes it.

cmmp commented 11 years ago

Hi. I just tried compiling julia from master and I'm getting

ERROR: OpenBLAS was not built with 64bit integer support.
You're seeing this error because Julia was built with USE_BLAS64=1
Please rebuild Julia with USE_BLAS64=0
Quitting.

on OS X 10.8.4

if I run LD_LIBRARY_PATH="" ./julia it loads just fine:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.2.0-prerelease+2909
 _/ |\__'_|_|_|\__'_|  |  Commit master/6b0ccb4* 2013-07-31 01:49:22 UTC
|__/                   |  x86_64-apple-darwin12.4.0

Is it a bug or do I have to do something?

make is telling me this:

OpenBLAS build complete.

  OS               ... Darwin
  Architecture     ... x86_64
  BINARY           ... 64bit
  Use 64 bits int    (equivalent to "-i8" in Fortran)
  C compiler       ... CLANG  (command line : clang -mmacosx-version-min=10.6)
  Fortran compiler ... GFORTRAN  (command line : gfortran)
-n   Library Name     ... libopenblas_sandybridgep-r0.2.7.a
 (Multi threaded; Max num-threads is 128)
WARNING: If you plan to use the dynamic library libopenblas_sandybridgep-r0.2.7.dylib, you must run:

"make PREFIX=/your_installation_path/ install".

(or set PREFIX in Makefile.rule and run make install.
If you want to move the .dylib to a new location later, make sure you change
the internal name of the dylib with:

install_name_tool -id /new/absolute/path/to/libopenblas_sandybridgep-r0.2.7.dylib libopenblas_sandybridgep-r0.2.7.dylib

To install the library, you can run "make PREFIX=/path/to/your/installation install".

but I'm afraid installing to /usr/local is going to mess up homebrew's openblas.

cmcbride commented 11 years ago

@nolta works great on 6b0ccb446469dfa19c65096a7dfa291f8887b6c4 which includes your fix at 27451c83052a9e0831ab45c998e2602b05e72625 (I confirmed LD_LIBRARY_PATH was set to the MacPorts as before when it caused a problem)

@staticfloat I have no idea why the PCRE is causing things to fail, nor how. Is the homebrew version the same (8.33)? And as @nolta and I both saw, the behavior was erratic!

cmcbride commented 11 years ago

@cmmp is there another OpenBLAS library that is in the LD_LIBRARY_PATH directory that might be loading (and isn't 64bit)? This could be what you're preventing by setting the LD_LIBRARY_PATH to be empty.

Alternatively, you might be seeing a false positive due to the PCRE library, like what I saw.

You can try the same trick I did to diagnose: comment out the "quit()" around line 254 in base/util.jl, re-"make", and then run julia the way it "failed" before. Look at versioninfo() and see if the OpenBLAS lib really is 32bit.
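To check the first possibility, a stray copy of a library sitting in one of the LD_LIBRARY_PATH directories, a short scan can list every matching file. This is a sketch for a POSIX shell; the function name `scan_lib_path` is made up, and the directory list is passed in as a plain colon-separated string.

```shell
# Hypothetical helper: list files whose names start with NAME in each
# directory of a colon-separated path list (e.g. "$LD_LIBRARY_PATH").
# Usage: scan_lib_path NAME PATH_LIST
scan_lib_path() {
    name=$1
    path_list=$2
    old_ifs=$IFS
    IFS=:
    for dir in $path_list; do
        IFS=$old_ifs
        # Skip empty entries such as a leading or doubled colon.
        [ -n "$dir" ] || continue
        for f in "$dir"/"$name"*; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
    IFS=$old_ifs
}
```

Running `scan_lib_path libopenblas "$LD_LIBRARY_PATH"` and `scan_lib_path libpcre "$LD_LIBRARY_PATH"` covers both suspects raised in this thread; empty output means no copy was found in those directories.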

cmmp commented 11 years ago

Cameron,

I figured out a way. I was symlinking the julia executable to /usr/local/bin. After I removed the link and added ~/julia/ to my path, julia loads fine.

Strange thing is it worked before in /usr/local.

Thanks, Cássio

stevengj commented 11 years ago

I just got the same error with a fresh make clean && make build of git master on MacOS X 10.8.4.

staticfloat commented 11 years ago

Can you list details of the error? It might even be better to open a new issue, this one is a little convoluted.

stevengj commented 11 years ago

It looks like the same error, so it probably belongs in the same issue? Just tried again with make cleanall then make on git master of Julia, and running julia gives:

$ julia
ERROR: OpenBLAS was not built with 64bit integer support.
You're seeing this error because Julia was built with USE_BLAS64=1
Please rebuild Julia with USE_BLAS64=0
Quitting.

I'm using OSX 10.8.4 on a 2013 Mac Pro, and gcc --version gives i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1. Julia is building deps/openblas-v0.2.8.

pao commented 11 years ago

It looks like the same error, so it probably belongs in the same issue?

But different circumstances, and this issue has been closed for some time including a committed patch. You can always crosslink to this issue if you think it is relevant.

stevengj commented 11 years ago

It was only closed a few days ago, and apparently the patch that closed it is broken because exactly the same build-system failure now occurs on a different OSX version. This is an obvious case for a "reopen" rather than "new issue", because the people that were concerned with the original issue should be involved in the new fix.

staticfloat commented 11 years ago

Well, at least we have a hint as to what might be causing the problem. Where is your libpcre coming from? Are you linking against a system-provided libpcre, or Julia's?

Can you compile this small test file against OpenBLAS and see what openblas_get_config() is actually returning?

$ cat openblas_test.c
#include <stdio.h>

extern const char * openblas_get_config( void );

int main( void ) {
        printf("%s\n", openblas_get_config() );
        return 0;
}
$ gcc -o openblas_test openblas_test.c deps/openblas-v0.2.8/libopenblas.a
$ ./openblas_test
USE64BITINT NO_AFFINITY

stevengj commented 11 years ago

I have a libpcre in /usr/local/lib; this is pcre version 8.33 installed by brew (from some other dependency; I wasn't intending to replace Julia's).

Your test program prints USE64BITINT DYNAMIC_ARCH NO_AFFINITY, but that was when I built with make USE_BLAS64=1 in a (futile) attempt to get things working; let me rebuild with the default options....

stevengj commented 11 years ago

make cleanall && rm -rf deps/openblas* && make gives the same runtime error, and @staticfloat's test program prints USE64BITINT DYNAMIC_ARCH NO_AFFINITY.

stevengj commented 11 years ago

Looks like PCRE is the source of the error, just as above, but that the fix isn't working.

staticfloat commented 11 years ago

If you strace ./julia | grep pcre, what libpcre does it end up loading in? Do you have a USE_SYSTEM_PCRE=1 anywhere? (Like in Make.user, for instance)

EDIT: OSX doesn't have strace, use dtruss: dtruss -f ./julia should work

vtjnash commented 11 years ago

I strongly believe that homebrew should not be installed in /usr/local, although I'm not sure why the build of libpcre seems so sensitive.

The eventual patch should have forced the usage of the libpcre in julia/usr/lib (if it could be loaded)

stevengj commented 11 years ago

grepping the dtruss output for pcre gives:

59708/0x9342e:  stat64("/Users/stevenj/Code/julia/./../lib/julia/libpcre\0", 0x7FFF5FBF8D10, 0x7FFF5FBF9C70)         = -1 Err#2
59708/0x9342e:  stat64("/Users/stevenj/Code/julia/usr/bin/../lib/julia/libpcre\0", 0x7FFF5FBF8CB0, 0x7FFF5FBF9C70)       = -1 Err#2
59708/0x9342e:  stat64("@executable_path/../lib/julia/libpcre\0", 0x7FFF5FBF8D60, 0x7FFF5FBF9C70)        = -1 Err#2
59708/0x9342e:  stat64("/Users/stevenj/Code/julia/./../lib/julia/libpcre.dylib\0", 0x7FFF5FBF8D10, 0x7FFF5FBF9C70)       = -1 Err#2
59708/0x9342e:  stat64("/Users/stevenj/Code/julia/usr/bin/../lib/julia/libpcre.dylib\0", 0x7FFF5FBF8CA0, 0x7FFF5FBF9C70)         = -1 Err#2
59708/0x9342e:  stat64("@executable_path/../lib/julia/libpcre.dylib\0", 0x7FFF5FBF8D60, 0x7FFF5FBF9C70)      = -1 Err#2
59708/0x9342e:  stat64("/Users/stevenj/Code/julia/./../lib/libpcre\0", 0x7FFF5FBF8D20, 0x7FFF5FBF9C70)       = -1 Err#2
59708/0x9342e:  stat64("/Users/stevenj/Code/julia/usr/bin/../lib/libpcre\0", 0x7FFF5FBF8CC0, 0x7FFF5FBF9C70)         = -1 Err#2
59708/0x9342e:  stat64("@executable_path/../lib/libpcre\0", 0x7FFF5FBF8D60, 0x7FFF5FBF9C70)      = -1 Err#2
59708/0x9342e:  stat64("/Users/stevenj/Code/julia/./../lib/libpcre.dylib\0", 0x7FFF5FBF8D10, 0x7FFF5FBF9C70)         = -1 Err#2
59708/0x9342e:  stat64("/Users/stevenj/Code/julia/usr/bin/../lib/libpcre.dylib\0", 0x7FFF5FBF8CB0, 0x7FFF5FBF9C70)       = 0 0
59708/0x9342e:  open("/Users/stevenj/Code/julia/usr/bin/../lib/libpcre.dylib\0", 0x0, 0x0)

I have USE_SYSTEM_PCRE=0 in my Make.inc, and don't override this anywhere else as far as I can tell.
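In a trace like the one above, only the stat64 call that returns 0 names a file that actually exists. A small filter can pull that path out of the noise. This is a sketch with a made-up function name (`stat_hits`); it assumes the line layout shown above, where each path ends in a literal \0 and the return value follows an equals sign.

```shell
# Hypothetical filter: read dtruss lines on stdin and print the path of
# every stat64 call that succeeded (return value 0). Failed lookups,
# which end in "= -1 Err#2", are dropped.
stat_hits() {
    sed -n 's/.*stat64("\([^"\\]*\)\\0".*= 0 .*/\1/p'
}
```

Since dtruss typically needs root and writes its trace to stderr, a pipeline along the lines of `sudo dtruss -f ./julia 2>&1 | grep pcre | stat_hits` would be one way to feed it.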

stevengj commented 11 years ago

From the above, it sure looks like I am linking Julia's PCRE. Perhaps whatever problem is afflicting the homebrew PCRE is also affecting Julia's own build?

vtjnash commented 11 years ago

That seems reasonable to suggest. Do we know what libpcre could be linking against, or perhaps if there is a symbol conflict in the library?

stevengj commented 11 years ago

otool -L libpcre.dylib (the OSX analogue of ldd) gives:

    @rpath/libpcre.dylib (compatibility version 2.0.0, current version 2.1.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)

stevengj commented 11 years ago

Nope, it's somehow the libpcre in /usr/local/lib that is causing trouble after all. I manually moved /usr/local/lib/libpcre* somewhere else, and Julia works. brew uninstall pcre also does the trick.

How can we keep Julia from getting confused by this?

cmcbride commented 11 years ago

The previous fix was supposed to correct Julia's confusion. Granted, I know little about Julia internals, but I do not understand how your dtruss output is consistent with your fix of removing the /usr/local/lib PCRE version.

In any case, I just pulled the latest to see if the library priority still works for my config.

BTW, did anyone confirm that this is a PCRE 8.33 issue? Seems suspicious that @stevengj and I both had problems with the same version of the library (MacPorts / homebrew), and Julia remains bundled with 8.31.

staticfloat commented 11 years ago

I use 8.33 homebrew pcre with no problems.

cmcbride commented 11 years ago

I use 8.33 homebrew pcre with no problems.

What OSX version, @staticfloat ?