JuliaCI / julia-buildbot

Buildbot configuration for build.julialang.org
MIT License
18 stars 14 forks source link

Potentially work around https://github.com/JuliaLang/julia/issues/9185 #10

Closed tkelman closed 9 years ago

tkelman commented 9 years ago

This might be a x87-vs-SSE problem, since setting march to i686 for building the deps but JULIA_CPU_TARGET to pentium4 for JIT code appears to work properly. Need to verify that this doesn't bring back any of the i686 issues on 32 bit linux though.

staticfloat commented 9 years ago

Interesting idea. I have the feeling this is going to break open{libm,specfun}, but it's worth a shot. I'm going to build this manually on my 32-bit linux worker and see what's what.

tkelman commented 9 years ago

We don't absolutely need to apply the same fix for both Windows and Linux since I think Linux 32 has not been seeing these suitesparse segfaults, but for sanity's sake it might be cleaner to keep things consistent if it works everywhere.

I'd need to look into this in more detail, but I had thought GCC, at least in MinGW configuration, defaults to a fairly conservative instruction set if you don't tell it otherwise. So -march=pentium4 may actually be using more modern instructions than it would if you didn't set the option. It's been rare for me to see "illegal instruction" type errors when moving binaries around between machines unless there's a VM involved, or we're talking low-level libraries written in assembly like openblas.

staticfloat commented 9 years ago

I'm not sure about gcc's default (non-optimized) behavior, but dependencies definitely pass flags like -O3 to gcc which causes the latest and greatest instructions to be used. Without -march passed to GCC, we will get up to SSE 4.2 included from buildbots.

nalimilan commented 9 years ago

@staticfloat Are you sure? The man page doesn't state that -O3 changes anything related to -march. It seems to me that by default gcc creates code that will work on all i386 CPUs, though tuned to be more efficient on newer CPUs if possible. And SuiteSparse does not appear to set any specific MARCH in its Makefiles.

staticfloat commented 9 years ago

Looks like I'm wrong here.

gcc -dumpmachine gives you the target triplet, the first triplet of which is the target march. (At least, for gcc >= 4.5) This means that we should be, by default, emitting only up to SSE2 on 64-bit, and no SSE at all on 32-bit.

In other good news, passing MARCH=i686 JULIA_CPU_TARGET=pentium4 works splendidly on the buildbot I tested it on, for both master and release-0.3. I'm merging this.

tkelman commented 9 years ago

Thanks, sounds good. I was under the assumption that -O3 mostly has to do with internal optimization passes and transformations at the compiler IR level, rather than which instruction sets it's allowed to emit. But I also don't know that much about compiler internals.

Anyway, I think we should rebuild the 0.3.3 binaries for win32 with these modified flags to at least fix the segfault. I still am not sure why changes at the Julia code level would trigger those segfaults, but at least this works around it.

staticfloat commented 9 years ago

I'll rebuild everything to ensure that we get our nice banners and such on all platforms. This will also be a good test to see if the "don't update status.julialang.org" logic is working properly!

tkelman commented 9 years ago

Oh boy. Just don't tell any normal people the things we find exciting.

staticfloat commented 9 years ago

Literally laughed out loud at that one.

tkelman commented 9 years ago

How do we specify that the dependencies (at least suitesparse, but probably others too?) need to be cleaned and rebuilt for this kind of change?

staticfloat commented 9 years ago

So far, I've been SSH'ing in and cleaning them myself. The alternative is one of:

staticfloat commented 9 years ago

Alright, you wanna test http://julianightlies.s3.amazonaws.com/bin/winnt/x64/0.3/julia-0.3.3-b24213b893-win64.exe and see if it works?

tkelman commented 9 years ago

sure though the issue this is solving was win32-only

staticfloat commented 9 years ago

I THINK I cleaned out suite-sparse recently enough (I did it just after I merged this PR) so this should work.

staticfloat commented 9 years ago

sure though the issue this is solving was win32-only

Yes, but we changed a few other things, such as MARCH/JULIA_CPU_TARGET which impacts the 32-bit linux tarball, as well as the banner, which impacts everything except OSX. :P Call me OCD, but I like to have everything self-consistent as much as possible.

tkelman commented 9 years ago

And i think i prefer the latter option, making extra builders for the sake of cleaning seems too complicated and I think you'd need something like number of deps times number of platforms different options.

tkelman commented 9 years ago

the win32 equivalent of that download did not pass the sparse test, so maybe suitesparse was not completely cleaned out? if you used make -C deps distclean-suitesparse when you were on master, then I don't think it'll delete the different version of the dependency that release-0.3 uses

staticfloat commented 9 years ago

Alright. I'll rebuild and try again.

staticfloat commented 9 years ago

Build up: http://julianightlies.s3.amazonaws.com/bin/winnt/x86/0.3/julia-0.3.3-b24213b893-win32.exe

tkelman commented 9 years ago

Lovely.

  | | |_| | | | (_| |  |  Version 0.3.3 (2014-11-23 20:19 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  i686-w64-mingw32

julia> Base.runtests("sparse")
     * sparse
    SUCCESS

What's best to do about the stable links and the binaries at s3.amazonaws.com/julialang ? Make a backup of the original 0.3.3 binaries and put the new ones in their place?

staticfloat commented 9 years ago

Fantastic. I've backed up all the old .exe and .tar.gz files by adding the suffix .premarchfix and uploaded the new binaries in their place. Everything should be good to go at this point.

tkelman commented 9 years ago

Looks like we can actually revert this change now. I haven't benchmarked, but presumably the dependency libraries should run a little faster if allowed to use SSE2 now that Jameson fixed the alignment problem that was causing the original issue.