JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

numbers test fails on 32-bit with JULIA_CPU_TARGET=i386 #7185

Closed nalimilan closed 10 years ago

nalimilan commented 10 years ago

I'm seeing a failure on 32-bit (64-bit works fine) when building the Julia RPM package based on git master from June 6th. This is a regression. Any ideas about which commits could have broken this, so I can try a bisection? Or commands that I should run?

Note this is with LLVM 3.3.

+ make all
Warning: git information unavailable; versioning information limited
    JULIA test/all
    From worker 2:       * linalg1
    From worker 3:       * linalg2
    From worker 4:       * linalg3
    From worker 4:       * linalg4
    From worker 4:       * core
    From worker 4:       * keywordargs
    From worker 4:       * numbers
exception on 4: ERROR: test failed: (Complex(1,2) / Complex(2.5,3.0)) * Complex(2.5,3.0) == Complex(1,2)
 in error at error.jl:21
 in default_handler at test.jl:19
 in do_test at test.jl:39
 in runtests at /builddir/build/BUILD/julia-master/test/testdefs.jl:5
 in anonymous at multi.jl:847
 in run_work_thunk at multi.jl:613
 in anonymous at task.jl:847
while loading numbers.jl, in expression starting on line 798
ERROR: test failed: (Complex(1,2) / Complex(2.5,3.0)) * Complex(2.5,3.0) == Complex(1,2)
 in anonymous at task.jl:1350
while loading numbers.jl, in expression starting on line 798
while loading /builddir/build/BUILD/julia-master/test/runtests.jl, in expression starting on line 46
make: *** [all] Error 1

rickhg12hs commented 10 years ago

Don't know if this is helpful, but on my 32-bit platform, test/numbers.jl passes.

$ ./julia -e 'versioninfo(true)'
Julia Version 0.3.0-prerelease+3590
Commit c71c57a* (2014-06-09 22:47 UTC)
Platform Info:
  System: Linux (i686-redhat-linux)
  CPU: Genuine Intel(R) CPU           T2250  @ 1.73GHz
  WORD_SIZE: 32
           "Fedora release 19 (Schrödinger’s Cat)"
  uname: Linux 3.14.4-100.fc19.i686.PAE #1 SMP Tue May 13 15:18:40 UTC 2014 i686 i686
Memory: 1.9599456787109375 GB (87.79296875 MB free)
Uptime: 174474.0 sec
Load Avg:  1.76953125  1.64794921875  1.61572265625
Genuine Intel(R) CPU           T2250  @ 1.73GHz: 
       speed         user         nice          sys         idle          irq
#1  1733 MHz    1568677 s     603019 s     539046 s   13823841 s        329 s
#2  1733 MHz    1535241 s     297080 s     494382 s     212487 s          0 s

  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm
Environment:
  MANPATH = /usr/local/src/texlive/texlive/2013/texmf/doc/man:/usr/local/src/texlive/texlive/2013/texmf/doc/man::/usr/lib/alliance/man:/usr/local/share/man:/usr/share/man:/usr/lib/alliance/man:/usr/lib/erlang/man:/usr/lib/alliance/man:/usr/lib/erlang/man
  TERM = xterm-256color
  LD_LIBRARY_PATH = /usr/lib/alliance/lib:/usr/lib/alliance/lib
  PATH = /usr/local/src/texlive/texlive/2013/bin/i386-linux:/usr/local/src/texlive/texlive/2013/bin/i386-linux:/usr/lib/qt-3.3/bin:/usr/lib/qtchooser:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/usr/lib/alliance/bin:/usr/kerberos/sbin:/usr/local/sbin:/sbin:/usr/sbin:/home/rick/bin:/usr/lib/alliance/bin:/usr/kerberos/sbin:/usr/local/sbin:/sbin:/usr/sbin
  MODULEPATH = /usr/share/Modules/modulefiles:/etc/modulefiles
  HOME = /home/rick
  PYTHONPATH = /usr/local/lib/python2.7/site-packages
  MODULESHOME = /usr/share/Modules
  INFOPATH = /usr/local/src/texlive/texlive/2013/texmf/doc/info:/usr/share/info:/usr/local/src/texlive/texlive/2013/texmf/doc/info:/usr/share/info:
  WINDOWPATH = 1
  QT_PLUGIN_PATH = /usr/lib64/kde4/plugins:/usr/lib/kde4/plugins

Package Directory: /home/rick/.julia/v0.3
23 required packages:
 - ASCIIPlots                    0.0.2
 - Blocks                        0.0.4
 - Calendar                      0.4.1
 - Clp                           0.0.7
 - DSP                           0.0.2
 - DataFrames                    0.5.4+             master
 - DataStructures                0.2.14
 - Distributions                 0.4.7
 - GLPK                          0.2.11
 - GLPKMathProgInterface         0.1.4
 - Gadfly                        0.2.9
 - Gaston                        0.0.0
 - IJulia                        0.1.11
 - ImmutableArrays               0.0.4
 - Ipopt                         0.1.1+             master
 - JuMP                          0.5.1+             master
 - Optim                         0.2.0
 - ProfileView                   0.0.2
 - PyPlot                        1.2.7
 - RDatasets                     0.1.1
 - SortingAlgorithms             0.0.1
 - TestImages                    0.0.5
 - Winston                       0.11.0
40 additional packages:
 - ArrayViews                    0.4.4
 - BinDeps                       0.2.12
 - Cairo                         0.2.13
 - Calculus                      0.1.3
 - Cartesian                     0.1.5
 - Cbc                           0.0.7
 - Codecs                        0.1.0
 - Color                         0.2.10+            master
 - Compose                       0.1.29
 - DataArrays                    0.1.10
 - Datetime                      0.1.6
 - Distance                      0.4.0
 - DualNumbers                   0.1.0
 - GZip                          0.2.12
 - Graphs                        0.4.2
 - Hexagons                      0.0.1
 - ICU                           0.4.1
 - ImageView                     0.0.17
 - Images                        0.2.36
 - IniFile                       0.2.2
 - Iterators                     0.1.2
 - JSON                          0.3.5
 - Loess                         0.0.2
 - MathProgBase                  0.1.6
 - Nettle                        0.1.4
 - Options                       0.2.2
 - PDMats                        0.2.0
 - Polynomial                    0.1.1
 - PyCall                        0.4.6
 - REPLCompletions               0.0.1
 - Reexport                      0.0.1
 - ReverseDiffSparse             0.1.1
 - SIUnits                       0.0.1
 - StatsBase                     0.4.0
 - TexExtensions                 0.0.1
 - Tk                            0.2.12
 - URIParser                     0.0.2
 - ZMQ                           0.1.11
 - ZipFile                       0.2.1
 - Zlib                          0.1.7
$ gcc --version
gcc (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ make test-numbers
    JULIA test/numbers
     * numbers
    SUCCESS

nalimilan commented 10 years ago

Bump.

ViralBShah commented 10 years ago

Would it be possible for you to do a bisect? Also, what is your 32-bit environment? I have set up Ubuntu 12.04 32-bit, which I will try to build on in a couple of days.

nalimilan commented 10 years ago

Yeah, I can do a bisect, but it's relatively slow so a few hints about e.g. where to start would be useful. Tell me if you don't manage to reproduce the problem on your setup and I'll try bisecting this.

The build environment is Fedora 19, 20 and Rawhide in a VM, with gcc 4.8.2 and 4.9.0.

rickhg12hs commented 10 years ago

@nalimilan My build on 32-bit dual core Fedora 19 passes all the tests so I'm wondering what would be different about your build environment?

nalimilan commented 10 years ago

@rickhg12hs Interesting. Maybe the fact that I set USE_SYSTEM_OPENLIBM=1? It shouldn't make a difference, but... Or maybe that I'm setting JULIA_CPU_TARGET=i386? Could you try that on your machine?

rickhg12hs commented 10 years ago

Setting JULIA_CPU_TARGET=i386 in Make.user was enough to reproduce your error. The setting of USE_SYSTEM_OPENLIBM didn't seem to make a difference.

N.B.: After editing Make.user, all I did was make clean && make && make test-numbers. I didn't remake dependencies, etc., nor do I know if they should be.

ViralBShah commented 10 years ago

I guess this is a codegen issue then. Cc @Keno @vtjnash @JeffBezanson

vtjnash commented 10 years ago

As far as I can tell, this is due to a loss of precision on i387, since the result is very slightly off:

julia> (Complex(1,2) / Complex(2.5,3.0)) * Complex(2.5,3.0)
0.9999999999999999 + 1.9999999999999998im

Keno commented 10 years ago

Quite possible.

nalimilan commented 10 years ago

Confirmed that without JULIA_CPU_TARGET=i386 it works. If it changes anything, it would be possible to support i686 instead, since apparently (at least for Fedora) that's the build target which is used.

nalimilan commented 10 years ago

I've tried using an i686 target instead of i386 by adapting the patch at #7103, but the bug remains.

vtjnash commented 10 years ago

AFAICT, i386, i486, and i686 all map to the same processor configuration in llvm

nalimilan commented 10 years ago

@vtjnash OK. I've tried with pentium4, and then it works. Not sure this gives much information for debugging, but that would make a reasonable base requirement instead of i386.

ViralBShah commented 10 years ago

We can just make that an approximate test for now. I am seeing if all the other tests pass before committing.
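
For illustration, an "approximate test" replaces exact equality with a bound on the rounding error. The following is a minimal C++ sketch of that idea, not the change that was made to Julia's test suite; the helper names `approx_equal` and `round_trip` and the default tolerance of 16 units of machine epsilon are assumptions chosen for the example.

```cpp
#include <cmath>
#include <complex>
#include <limits>

// Hypothetical helper (not part of Julia's test suite): true when `got` lies
// within `ulps` units of double-precision machine epsilon of `want`,
// measured relative to the magnitude of `want`.
bool approx_equal(std::complex<double> got, std::complex<double> want,
                  double ulps = 16.0) {
    const double eps = std::numeric_limits<double>::epsilon();
    return std::abs(got - want) <= ulps * eps * std::abs(want);
}

// The round trip from the failing test: (a / b) * b should recover a,
// but only up to rounding error in the division and the multiplication.
std::complex<double> round_trip() {
    const std::complex<double> a(1.0, 2.0);
    const std::complex<double> b(2.5, 3.0);
    return (a / b) * b;
}
```

Calling `approx_equal(round_trip(), std::complex<double>(1.0, 2.0))` accepts a result that is off in the last bit or two, whereas `==` would reject it on a target where the intermediate rounding differs.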

ViralBShah commented 10 years ago

Next one after fixing the tolerance for this one:

     * arrayops
     * reduce
exception on 1: ERROR: test failed: sum(sin,z) == sum(sin,fz) == sum(sin(fz))
 in error at error.jl:21
 in default_handler at test.jl:19
 in do_test at test.jl:39
 in runtests at /home/vagrant/julia/test/testdefs.jl:5
 in anonymous at multi.jl:652
 in run_work_thunk at multi.jl:613
 in remotecall_fetch at multi.jl:686
 in remotecall_fetch at multi.jl:701
 in anonymous at task.jl:1348
while loading reduce.jl, in expression starting on line 45
ERROR: test failed: sum(sin,z) == sum(sin,fz) == sum(sin(fz))
 in error at error.jl:21
 in default_handler at test.jl:19
 in do_test at test.jl:39
 in runtests at /home/vagrant/julia/test/testdefs.jl:5
 in anonymous at multi.jl:652
 in run_work_thunk at multi.jl:613
 in remotecall_fetch at multi.jl:686
 in remotecall_fetch at multi.jl:701
 in anonymous at task.jl:1348
while loading reduce.jl, in expression starting on line 45
while loading /home/vagrant/julia/test/runtests.jl, in expression starting on line 46

nalimilan commented 10 years ago

> We can just make that an approximate test for now. I am seeing if all the other tests pass before committing.

Not sure that's a good idea: even if the difference is small, it can be slightly confusing, and doesn't it even mean that some equality tests may fail while they work elsewhere? All of that just to support i386 machines nobody will use. :-/

nalimilan commented 10 years ago

@sebastien-villemot Do you need to support real i486s in your package? The Debian docs say they are still supported, but I'm not sure this is really required of all packages, even in cases (like Julia) where it makes little sense. I think for Fedora requiring Pentium 4 should be accepted.

svillemot commented 10 years ago

@nalimilan I think Debian still supports i486, but I am not sure I understand what the implications are for Julia. In particular, the Debian package of julia uses the LLVM provided by Debian, so I do not have to deal with LLVM configuration options. Maybe there are some issues with openlibm since it contains asm, but I did not check.

ViralBShah commented 10 years ago

Pentium 4 is certainly an acceptable minimum.

Keno commented 10 years ago

@ViralBShah can you just submit the patch with the tolerance?

nalimilan commented 10 years ago

@sebastien-villemot LLVM is completely unrelated to this issue. So is openlibm. The setting only affects Julia, i.e. what instruction set is used when compiling Julia code.

It would be interesting to know whether Debian would grant exceptions to 486 support or not.

ViralBShah commented 10 years ago

I did not submit this patch because the tests still fail elsewhere. Will check what all needs fixing. I may be able to do this realistically only in Chicago.

svillemot commented 10 years ago

@nalimilan My guess is that if Julia produces Pentium 4 code, then nobody will notice that it does not run on i486 or i586.

To go back to the initial topic of this issue, I have also experienced precision problems on some 32-bit machines. AFAIK these are due to the fact that true 32-bit machines have a coprocessor (x87) that gives results different from those obtained on a 64-bit processor with SSE2. The core reason is that the x87 has 80-bit internal registers for extra precision.
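
To make the precision difference concrete: an IEEE double carries a 53-bit significand, while the x87 extended format carries 64 bits, so intermediates held in x87 registers are rounded differently from ones computed in SSE2 registers. A C or C++ compiler reports how it evaluates intermediates through the standard `FLT_EVAL_METHOD` macro. The sketch below is only a way to inspect a given toolchain, assuming a C++11 compiler; the function and constant names are made up for the example.

```cpp
#include <cfloat>
#include <limits>
#include <string>

// Describes how this compiler evaluates floating-point intermediates,
// based on the standard FLT_EVAL_METHOD macro from <cfloat>.
std::string describe_eval_method() {
    switch (FLT_EVAL_METHOD) {
        case 0:  return "each operation is evaluated in its own type";
        case 1:  return "float and double are evaluated as double";
        case 2:  return "everything is evaluated as long double";
        default: return "indeterminate or implementation-defined";
    }
}

// Significand widths in bits, including the implicit leading bit.
// The long double width is platform-dependent.
constexpr int double_bits = std::numeric_limits<double>::digits;
constexpr int long_double_bits = std::numeric_limits<long double>::digits;
```

On a 32-bit x86 build that uses x87 arithmetic one would typically expect the third case, and on an SSE2 build the first, but that depends on compiler flags and should be checked rather than assumed.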

nalimilan commented 10 years ago

@sebastien-villemot OK, I'm fine with P4, we could replace i386 with pentium4 then if it makes things simpler and code more efficient thanks to SSE2. But when you say you've seen problems on 32-bit machines, was it with JULIA_CPU_TARGET=i386?

svillemot commented 10 years ago

@nalimilan The problem was with Julia 0.2, which apparently does not provide this selection mechanism.

Also, if you switch the default on 32-bit to Pentium 4, please keep the i386 target available, so that I can use it on Debian i386 arch (which, despite its name, requires at least i486).

nalimilan commented 10 years ago

> Also, if you switch the default on 32-bit to Pentium 4, please keep the i386 target available, so that I can use it on Debian i386 arch (which, despite its name, requires at least i486).

That was precisely what I was asking. So you really need to support i486, which means this bug needs to be fixed. :-/

ViralBShah commented 10 years ago

There is the other nasty openlibm bug on 486 where rem_pio2 is buggy - so for now it would be really nice to avoid these old machines.

nalimilan commented 10 years ago

Regarding the JULIA_CPU_TARGET issue, I just realized something: as its name indicates, dSFMT makes use of SIMD and in particular of SSE2 instructions. analyze-x86 reports this:

instructions:
 cpuid: 0    nop: 3  call: 0     count: 1447
 i686:   1
 mmx:    168
 sse2:   213

So it's pointless to try making Julia run on anything older than a Pentium4, unless we find a fallback for these machines. For now, I would say that JULIA_CPU_TARGET=i386 should be replaced with JULIA_CPU_TARGET=pentium4

vtjnash commented 10 years ago

instead of setting JULIA_CPU_TARGET=i486 directly, you can now set ARCH=i486, which also tries to configure all of the C compilers to generate code for that processor. (https://github.com/JuliaLang/julia#architecture-customization)

nalimilan commented 10 years ago

@vtjnash That's really nice, but AFAICT dSFMT will still fail on CPUs older than Pentium4. Of course, the probability of somebody actually trying Julia on such a machine is relatively low...

svillemot commented 10 years ago

@nalimilan My understanding is that dSFMT has support for non-SSE2 CPUs. There is fallback code for that case (HAVE_SSE2 not defined).

nalimilan commented 10 years ago

@sebastien-villemot Looks like you're right. Actually, I must be seeing SSE2 instructions just because I'm on 64-bit (cf. [1]), and because the new Fedora dSFMT package hardcodes HAVE_SSE2. Need to fix that.

1: https://github.com/JuliaLang/julia/blob/master/deps/Makefile#L655

StefanKarpinski commented 10 years ago

I think the upshot here needs to be that we don't support running Julia in legacy 387 80-bit FPU mode.

nalimilan commented 10 years ago

@StefanKarpinski Yeah, I think we need to decide whether the minimum requirement will remain SSE 2/Pentium4 or not, and if so change JULIA_CPU_TARGET=i386 to pentium4.

vtjnash commented 10 years ago

one option is to simply disable extended precision:

diff --git a/deps/openlibm b/deps/openlibm
--- a/deps/openlibm
+++ b/deps/openlibm
@@ -1 +1 @@
-Subproject commit 0b9d67e54a5b07e32a27b68cf0a01c33b515fc9b
+Subproject commit 0b9d67e54a5b07e32a27b68cf0a01c33b515fc9b-dirty
diff --git a/src/codegen.cpp b/src/codegen.cpp
index fecf976..2cee5a3 100644
--- a/src/codegen.cpp
+++ b/src/codegen.cpp
@@ -4457,8 +4457,24 @@ static void init_julia_llvm_env(Module *m)
     FPM->doInitialization();
 }

+#define FP387_NEAREST   0x0000
+#define FP387_ZERO      0x0C00
+#define FP387_UP        0x0800
+#define FP387_DOWN      0x0400
+
+#define FP387_SINGLE    0x0000
+#define FP387_DOUBLE    0x0200
+#define FP387_EXTENDED  0x0300
+
+static inline void fp387(const unsigned short control)
+{
+        unsigned short cw = (control & 0x0F00) | 0x007f;
+            __asm__ volatile ("fldcw %0" : : "m" (*&cw));
+}
+
 extern "C" void jl_init_codegen(void)
 {
+    fp387(FP387_DOUBLE | FP387_NEAREST);
 #ifdef JL_DEBUG_BUILD
     cl::ParseEnvironmentOptions("Julia", "JULIA_LLVM_ARGS");
 #endif

although it is unclear how this would affect the system libm, or openlibm for that matter

source: http://stackoverflow.com/questions/17663780/is-there-a-document-describing-how-clang-handles-excess-floating-point-precision
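
The arithmetic in `fp387()` can be checked without touching the FPU. In the x87 control word, bits 8-9 select the precision and bits 10-11 the rounding mode, and the low bits mask the floating-point exceptions. The sketch below reproduces only the bit manipulation from the patch, leaving out the `fldcw` instruction; the `k*` constants and function names are assumptions introduced for the example.

```cpp
#include <cstdint>

// Field masks of the x87 FPU control word.
constexpr std::uint16_t kPrecisionMask = 0x0300;  // bits 8-9: precision control
constexpr std::uint16_t kRoundingMask  = 0x0C00;  // bits 10-11: rounding control

// Values mirroring the FP387_* defines in the patch.
constexpr std::uint16_t kDouble   = 0x0200;  // 53-bit significand
constexpr std::uint16_t kExtended = 0x0300;  // 64-bit significand
constexpr std::uint16_t kNearest  = 0x0000;  // round to nearest

// Same expression as in fp387(): keep the precision and rounding fields,
// then set the low seven bits so that all six exception types stay masked.
constexpr std::uint16_t make_control_word(std::uint16_t control) {
    return static_cast<std::uint16_t>((control & 0x0F00) | 0x007F);
}

// Extract the individual fields again, e.g. when debugging a control word.
constexpr std::uint16_t precision_field(std::uint16_t cw) {
    return static_cast<std::uint16_t>(cw & kPrecisionMask);
}
constexpr std::uint16_t rounding_field(std::uint16_t cw) {
    return static_cast<std::uint16_t>(cw & kRoundingMask);
}
```

With these definitions, `make_control_word(kDouble | kNearest)` evaluates to `0x027F`, and `make_control_word(kExtended | kNearest)` to `0x037F`, which is the value the x87 holds after initialization.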

staticfloat commented 10 years ago

I don't think we can expect to run Julia on anything older than a Pentium 4. That gives us a good 14 years of hardware to run on, and anyone seriously interested in technical computing is going to have at least a P4 or greater. I think setting MARCH to pentium4 or higher is the right way to go here.

nalimilan commented 10 years ago

@staticfloat Probably. In principle, distros want to support any i386, but in practice I guess nobody ever tries a P3 anyway.

svillemot commented 10 years ago

FWIW, I came to the conclusion that the 32-bit Debian package has to be compiled for i486. This is a theoretical requirement set by Debian. But it is also a very real constraint, because some Debian autobuilders actually emulate i486 hardware (using qemu), so the package would not build with ARCH=pentium4.

svillemot commented 10 years ago

I have been trying to make Julia work on non-SSE2 systems, and it turns out to be not so easy. There are a few tests that fail due to rounding issues, but this is not a big deal. A much bigger problem is FloatRange, which makes strong assumptions about the rounding methods. For example, on a non-SSE2 processor, 0.1:0.1:0.3 has length 2, which is quite problematic.

Then I tried to implement the workaround suggested by @vtjnash , which is to use double precision (instead of extended precision). It works well for Float64 ranges, and fixes the problem above. There are still some issues for Float32 ranges, probably due to the fact that the Julia code makes the assumption that single precision rounding is used. Another problem is the fact that in the middle of the testsuite, some code resets the precision to extended; I could not figure out where this is done.

In the end, I think this demonstrates that I cannot ship a package for non-SSE2 systems, at least as long as you guys don't want to support this configuration. The conclusion is that I am going to drop the 32-bit Debian package. The only people who will be affected by this move are those whose CPU supports SSE2 but not x86-64, which is a rather small range of hardware (the first Pentium 4). Later CPUs with x86-64 support can simply use the 64-bit package.
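
As an illustration of why the length of a floating-point range such as 0.1:0.1:0.3 depends on rounding: the quotient (stop - start) / step lands on or just below an integer, so truncating it can come out one short depending on how the intermediates are rounded. The C++ sketch below is not Julia's FloatRange algorithm; both functions are hypothetical, assume a positive step, and the tolerance of 8 machine epsilons is an arbitrary choice for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Naive length: truncate the quotient. Sensitive to the last bit of
// (stop - start) / step, so the result can vary with the rounding
// behaviour of the target.
long naive_length(double start, double step, double stop) {
    return static_cast<long>(std::floor((stop - start) / step)) + 1;
}

// More forgiving length: if the quotient is within a few epsilons of an
// integer, treat it as that integer; otherwise truncate as before.
long tolerant_length(double start, double step, double stop) {
    const double q = (stop - start) / step;
    const double nearest = std::round(q);
    const double tol = 8.0 * std::numeric_limits<double>::epsilon() *
                       std::max(1.0, std::fabs(q));
    const double steps = (std::fabs(q - nearest) <= tol) ? nearest : std::floor(q);
    return static_cast<long>(steps) + 1;
}
```

`tolerant_length(0.1, 0.1, 0.3)` returns 3 whether the quotient rounds to exactly 2 or to the nearest double below it, while `naive_length` gives 2 in the latter case.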

nalimilan commented 10 years ago

People who for some reason have installed a 32-bit OS on an x86-64 machine are also screwed (though few people should be concerned). In the end the Debian policies are counter-productive: to support machines older than P4, you end up dropping support for more recent machines which may be used in real life. :-/

svillemot commented 10 years ago

@nalimilan People who installed Debian 32-bit can still install 64-bit packages thanks to multiarch (provided they switch to a 64-bit kernel). In the precise case of Julia, however, this is not yet possible, at least because the BLAS/LAPACK packages are not multiarch-ready. But this is something that is fixable.

vtjnash commented 10 years ago

The best solution is probably just to loosen the necessary tests slightly. We aren't learning anything particularly new here, just rehashing an old observation that compilers may rearrange your math operations to do them more efficiently. And thus we are just showing that the system gcc compiler sometimes loses a bit of precision in doing so. It's rather unfortunate that llvm seems to have simply copied gcc behavior in this regard, although perhaps this is just an unfortunate limitation of the i387 hardware.

I can reproduce the range issues with my i486 cross-compile. analyze-x86 (find julia/usr -name \*.so | xargs -n 1 -t -I{} ~/analyze-x86/analyze-x86 {}) seems to confirm the cross-compile was successful and that the remaining sse2/avx instructions are likely to be from either dead-code or mis-detection, since there are so few.

svillemot commented 10 years ago

@vtjnash As I have written in a previous comment, it's not just about loosening the testsuite. It's also about modifying FloatRange which is completely broken on x87.

Also note that I raised the i486/SSE2 issue on Debian lists, and the consensus seems to be that it's ok to have Julia require SSE2 on 32-bit, as long as it gives an explicit error message when SSE2 is not present.

So I'm finally going to ship a 32-bit package, and add a patch that tests at runtime for the presence of SSE2, and gracefully exit if it is not there.
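
A runtime check of this kind could look roughly like the following. This is a sketch only, not the patch shipped in the Debian package: it assumes GCC or Clang on x86, where the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins are available, and the function names and error message are invented for the example.

```cpp
#include <cstdio>
#include <cstdlib>

// Returns true when the running CPU reports SSE2 support.
// On compilers or architectures without the x86 builtins this sketch
// conservatively returns false.
bool cpu_has_sse2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();  // make sure the CPU feature data is initialised
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Exit with an explicit message instead of crashing later on an
// illegal-instruction fault.
void require_sse2_or_exit() {
    if (!cpu_has_sse2()) {
        std::fprintf(stderr, "error: this build requires a CPU with SSE2 support\n");
        std::exit(EXIT_FAILURE);
    }
}
```

The check has to run before any code compiled with SSE2 instructions executes, so it would be called at the very start of the program's entry point.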

tkelman commented 10 years ago

@staticfloat is this superseded by #8731? I think the failures mentioned in this issue have been resolved by loosening tolerances. At least on the PPA there was a case of a user who had a pre-pentium4 i686. Our tests might not be catching every single case of old 32-bit machines giving slightly different floating-point results relative to newer processors, but I think we might have to live with that?

staticfloat commented 10 years ago

Yes, I agree.