Closed by nalimilan 10 years ago.
Don't know if this is helpful, but on my 32-bit platform, test/numbers.jl passes.
$ ./julia -e 'versioninfo(true)'
Julia Version 0.3.0-prerelease+3590
Commit c71c57a* (2014-06-09 22:47 UTC)
Platform Info:
System: Linux (i686-redhat-linux)
CPU: Genuine Intel(R) CPU T2250 @ 1.73GHz
WORD_SIZE: 32
"Fedora release 19 (Schrödinger’s Cat)"
uname: Linux 3.14.4-100.fc19.i686.PAE #1 SMP Tue May 13 15:18:40 UTC 2014 i686 i686
Memory: 1.9599456787109375 GB (87.79296875 MB free)
Uptime: 174474.0 sec
Load Avg: 1.76953125 1.64794921875 1.61572265625
Genuine Intel(R) CPU T2250 @ 1.73GHz:
speed user nice sys idle irq
#1 1733 MHz 1568677 s 603019 s 539046 s 13823841 s 329 s
#2 1733 MHz 1535241 s 297080 s 494382 s 212487 s 0 s
BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY)
LAPACK: libopenblas
LIBM: libopenlibm
Environment:
MANPATH = /usr/local/src/texlive/texlive/2013/texmf/doc/man:/usr/local/src/texlive/texlive/2013/texmf/doc/man::/usr/lib/alliance/man:/usr/local/share/man:/usr/share/man:/usr/lib/alliance/man:/usr/lib/erlang/man:/usr/lib/alliance/man:/usr/lib/erlang/man
TERM = xterm-256color
LD_LIBRARY_PATH = /usr/lib/alliance/lib:/usr/lib/alliance/lib
PATH = /usr/local/src/texlive/texlive/2013/bin/i386-linux:/usr/local/src/texlive/texlive/2013/bin/i386-linux:/usr/lib/qt-3.3/bin:/usr/lib/qtchooser:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/usr/lib/alliance/bin:/usr/kerberos/sbin:/usr/local/sbin:/sbin:/usr/sbin:/home/rick/bin:/usr/lib/alliance/bin:/usr/kerberos/sbin:/usr/local/sbin:/sbin:/usr/sbin
MODULEPATH = /usr/share/Modules/modulefiles:/etc/modulefiles
HOME = /home/rick
PYTHONPATH = /usr/local/lib/python2.7/site-packages
MODULESHOME = /usr/share/Modules
INFOPATH = /usr/local/src/texlive/texlive/2013/texmf/doc/info:/usr/share/info:/usr/local/src/texlive/texlive/2013/texmf/doc/info:/usr/share/info:
WINDOWPATH = 1
QT_PLUGIN_PATH = /usr/lib64/kde4/plugins:/usr/lib/kde4/plugins
Package Directory: /home/rick/.julia/v0.3
23 required packages:
- ASCIIPlots 0.0.2
- Blocks 0.0.4
- Calendar 0.4.1
- Clp 0.0.7
- DSP 0.0.2
- DataFrames 0.5.4+ master
- DataStructures 0.2.14
- Distributions 0.4.7
- GLPK 0.2.11
- GLPKMathProgInterface 0.1.4
- Gadfly 0.2.9
- Gaston 0.0.0
- IJulia 0.1.11
- ImmutableArrays 0.0.4
- Ipopt 0.1.1+ master
- JuMP 0.5.1+ master
- Optim 0.2.0
- ProfileView 0.0.2
- PyPlot 1.2.7
- RDatasets 0.1.1
- SortingAlgorithms 0.0.1
- TestImages 0.0.5
- Winston 0.11.0
40 additional packages:
- ArrayViews 0.4.4
- BinDeps 0.2.12
- Cairo 0.2.13
- Calculus 0.1.3
- Cartesian 0.1.5
- Cbc 0.0.7
- Codecs 0.1.0
- Color 0.2.10+ master
- Compose 0.1.29
- DataArrays 0.1.10
- Datetime 0.1.6
- Distance 0.4.0
- DualNumbers 0.1.0
- GZip 0.2.12
- Graphs 0.4.2
- Hexagons 0.0.1
- ICU 0.4.1
- ImageView 0.0.17
- Images 0.2.36
- IniFile 0.2.2
- Iterators 0.1.2
- JSON 0.3.5
- Loess 0.0.2
- MathProgBase 0.1.6
- Nettle 0.1.4
- Options 0.2.2
- PDMats 0.2.0
- Polynomial 0.1.1
- PyCall 0.4.6
- REPLCompletions 0.0.1
- Reexport 0.0.1
- ReverseDiffSparse 0.1.1
- SIUnits 0.0.1
- StatsBase 0.4.0
- TexExtensions 0.0.1
- Tk 0.2.12
- URIParser 0.0.2
- ZMQ 0.1.11
- ZipFile 0.2.1
- Zlib 0.1.7
$ gcc --version
gcc (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ make test-numbers
JULIA test/numbers
* numbers
SUCCESS
Bump.
Would it be possible for you to do a bisect? Also, what is your 32-bit environment? I have setup ubuntu 12.04 32-bit, which I will try to build on in a couple of days.
Yeah, I can do a bisect, but it's relatively slow so a few hints about e.g. where to start would be useful. Tell me if you don't manage to reproduce the problem on your setup and I'll try bisecting this.
The build environment is Fedora 19, 20 and Rawhide in a VM, with gcc 4.8.2 and 4.9.0.
@nalimilan My build on 32-bit dual core Fedora 19 passes all the tests so I'm wondering what would be different about your build environment?
@rickhg12hs Interesting. Maybe it's because I set USE_SYSTEM_OPENLIBM=1? It shouldn't make a difference, but... Or maybe it's because I'm setting JULIA_CPU_TARGET=i386? Could you try that on your machine?
Setting JULIA_CPU_TARGET=i386 in Make.user was enough to reproduce your error. The setting of USE_SYSTEM_OPENLIBM didn't seem to make a difference.
N.B.: After editing Make.user, all I did was make clean && make && make test-numbers. I didn't remake dependencies, etc., nor do I know if they should be.
I guess this is a codegen issue then. Cc @Keno @vtjnash @JeffBezanson
As far as I can tell, this is due to a loss of precision on i387, since the result is very slightly off:
julia> (Complex(1,2) / Complex(2.5,3.0)) * Complex(2.5,3.0)
0.9999999999999999 + 1.9999999999999998im
Quite possible.
Confirmed that without JULIA_CPU_TARGET=i386 it works. If it changes anything, it would be possible to support i686 instead, since apparently (at least for Fedora) that's the build target used.
I've tried using an i686 target instead of i386 by adapting the patch at #7103, but the bug remains.
AFAICT, i386, i486, and i686 all map to the same processor configuration in LLVM.
@vtjnash OK. I've tried with pentium4, and then it works. I'm not sure this gives much information for debugging, but it would make a reasonable base requirement instead of i386.
We can just make that an approximate test for now. I am seeing if all the other tests pass before committing.
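Turning an exact test into an approximate one means comparing against a tolerance instead of using `==`. The C++ sketch below shows a relative-tolerance check applied to the i387 result quoted above; the name `approx_equal` and the default tolerance of 1e-12 are illustrative assumptions, not Julia's `isapprox` or its test macros.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>

// Relative-tolerance comparison: |a - b| <= rtol * max(|a|, |b|).
// The default rtol is an arbitrary illustrative choice.
bool approx_equal(std::complex<double> a, std::complex<double> b,
                  double rtol = 1e-12) {
    assert(rtol >= 0);
    return std::abs(a - b) <= rtol * std::max(std::abs(a), std::abs(b));
}
```

With this check, `0.9999999999999999 + 1.9999999999999998i` compares equal to `1 + 2i`, while `1 + 2.001i` does not. A relative tolerance scales with the magnitude of the operands; comparisons against an exact zero would need an absolute tolerance as well.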
Next one after fixing the tolerance for this one:
* arrayops
* reduce
exception on 1: ERROR: test failed: sum(sin,z) == sum(sin,fz) == sum(sin(fz))
in error at error.jl:21
in default_handler at test.jl:19
in do_test at test.jl:39
in runtests at /home/vagrant/julia/test/testdefs.jl:5
in anonymous at multi.jl:652
in run_work_thunk at multi.jl:613
in remotecall_fetch at multi.jl:686
in remotecall_fetch at multi.jl:701
in anonymous at task.jl:1348
while loading reduce.jl, in expression starting on line 45
ERROR: test failed: sum(sin,z) == sum(sin,fz) == sum(sin(fz))
in error at error.jl:21
in default_handler at test.jl:19
in do_test at test.jl:39
in runtests at /home/vagrant/julia/test/testdefs.jl:5
in anonymous at multi.jl:652
in run_work_thunk at multi.jl:613
in remotecall_fetch at multi.jl:686
in remotecall_fetch at multi.jl:701
in anonymous at task.jl:1348
while loading reduce.jl, in expression starting on line 45
while loading /home/vagrant/julia/test/runtests.jl, in expression starting on line 46
> We can just make that an approximate test for now. I am seeing if all the other tests pass before committing.
Not sure that's a good idea: even if the difference is small, it can be slightly confusing, and doesn't it even mean that some equality tests may fail while they work elsewhere? All of that just to support i386 machines nobody will use. :-/
@sebastien-villemot Do you need to support real i486s in your package? The Debian docs say they are still supported, but I'm not sure this is really required of all packages, even in cases (like Julia) where it makes little sense. I think for Fedora requiring Pentium 4 should be accepted.
@nalimilan I think Debian still supports i486, but I am not sure I understand what the implications are for Julia. In particular, the Debian package of julia uses the LLVM provided by Debian, so I do not have to deal with LLVM configuration options. Maybe there are some issues with openlibm since it contains asm, but I did not check.
Pentium 4 is certainly an acceptable minimum.
@ViralBShah can you just submit the patch with the tolerance.
@sebastien-villemot LLVM is completely unrelated to this issue. So is openlibm. The setting only affects Julia, i.e. what instruction set is used when compiling Julia code.
It would be interesting to know whether Debian would grant exceptions to 486 support or not.
I did not submit this patch because the tests still fail elsewhere. I will check what else needs fixing. Realistically, I may only be able to do this in Chicago.
@nalimilan My guess is that if Julia produces Pentium 4 code, then nobody will notice that it does not run on i486 or i586.
To go back to the initial topic of this issue, I have also experienced precision problems on some 32-bit machines. AFAIK these are due to the fact that on 32-bit machines floating-point math may be done by the x87 coprocessor, which gives results different from those obtained on a 64-bit processor with SSE2. The core reason is that the x87 has 80-bit internal registers for extra precision.
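The 80-bit registers change results because an x87 operation can be rounded twice: first to the 64-bit register significand, then again to a 53-bit `double` when the value is stored. A minimal C++ sketch of this double-rounding effect, modeling significands in [1, 2) as integers in units of 2^-64 (the helper names and the operands 1 and 2^-53 + 2^-64 are illustrative choices, not taken from the thread):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Fractional part of a significand in [1, 2), counted in units of 2^-64.
// One double ulp (2^-52) is 4096 units; one x87 extended ulp (2^-63) is 2 units.
constexpr std::uint64_t kDoubleUlp = std::uint64_t{1} << 12;
constexpr std::uint64_t kExtendedUlp = 2;

// Exact fraction of 1 + 2^-53 + 2^-64, i.e. 1.0 + b with b = 2^-53 + 2^-64.
constexpr std::uint64_t kExactFraction = (std::uint64_t{1} << 11) + 1;

// Round n to the nearest multiple of ulp (a power of two), ties to even.
std::uint64_t round_half_even(std::uint64_t n, std::uint64_t ulp) {
    assert(ulp != 0 && (ulp & (ulp - 1)) == 0);
    if (ulp == 1) return n;  // already representable
    std::uint64_t q = n / ulp, r = n % ulp, half = ulp / 2;
    if (r > half || (r == half && (q & 1) != 0)) ++q;
    return q * ulp;
}

// One rounding, straight to double (what an SSE2 addition does).
std::uint64_t direct_to_double() {
    return round_half_even(kExactFraction, kDoubleUlp);
}

// Two roundings: to the 64-bit x87 significand, then to double on store.
std::uint64_t via_extended() {
    return round_half_even(round_half_even(kExactFraction, kExtendedUlp),
                           kDoubleUlp);
}

// The same sum in ordinary double arithmetic; on a unit that rounds each
// double operation once (e.g. SSE2) it matches direct_to_double().
double hardware_sum() {
    volatile double b = std::ldexp(1.0, -53) + std::ldexp(1.0, -64);
    return 1.0 + b;
}
```

Rounded once, 1 + 2^-53 + 2^-64 lies just above the midpoint between 1 and 1 + 2^-52 and rounds up (`direct_to_double()` gives 4096 units, i.e. 1 + 2^-52). Rounded to 64 bits first, it lands exactly on that midpoint, and ties-to-even then takes it down to 1.0 (`via_extended()` gives 0).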
@sebastien-villemot OK, I'm fine with P4; we could replace i386 with pentium4 then, if it makes things simpler and the code more efficient thanks to SSE2. But when you say you've seen problems on 32-bit machines, was it with JULIA_CPU_TARGET=i386?
@nalimilan The problem was with Julia 0.2, which apparently does not provide this selection mechanism.
Also, if you switch the default on 32-bit to Pentium 4, please keep the i386 target available, so that I can use it on Debian i386 arch (which, despite its name, requires at least i486).
> Also, if you switch the default on 32-bit to Pentium 4, please keep the i386 target available, so that I can use it on Debian i386 arch (which, despite its name, requires at least i486).
That was precisely what I was asking. So you really need to support i486, which means this bug needs to be fixed. :-/
There is the other nasty openlibm bug on 486 where rem_pio2 is buggy - so for now it would be really nice to avoid these old machines.
Regarding the JULIA_CPU_TARGET issue, I just realized something: as its name indicates, dSFMT makes use of SIMD and in particular of SSE2 instructions. analyze-x86 reports this:
instructions:
cpuid: 0 nop: 3 call: 0 count: 1447
i686: 1
mmx: 168
sse2: 213
So it's pointless to try making Julia run on anything older than a Pentium 4, unless we find a fallback for these machines. For now, I would say that JULIA_CPU_TARGET=i386 should be replaced with JULIA_CPU_TARGET=pentium4.
Instead of setting JULIA_CPU_TARGET=i486 directly, you can now set ARCH=i486, which also tries to configure all of the C compilers to generate code for that processor. (https://github.com/JuliaLang/julia#architecture-customization)
@vtjnash That's really nice, but AFAICT dSFMT will still fail on CPUs older than Pentium4. Of course, the probability of somebody actually trying Julia on such a machine is relatively low...
@nalimilan My understanding is that dSFMT has support for non-SSE2 CPUs. There is fallback code for that case (HAVE_SSE2 not defined).
@sebastien-villemot Looks like you're right. Actually, I must be seeing SSE2 instructions just because I'm on 64-bit (cf. [1]), and because the new Fedora dSFMT package hardcodes HAVE_SSE2. Need to fix that.
1: https://github.com/JuliaLang/julia/blob/master/deps/Makefile#L655
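The HAVE_SSE2 fallback follows a common pattern: pick an SSE2 code path at compile time and keep a portable scalar loop otherwise. A minimal C++ sketch of that pattern; this is not dSFMT's code, it keys off the compiler-defined `__SSE2__` macro rather than dSFMT's own `HAVE_SSE2`, and `add_arrays` is a hypothetical example function.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Element-wise out[i] = a[i] + b[i].
void add_arrays(const double* a, const double* b, double* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    std::size_t i = 0;
#if defined(__SSE2__)
    // SSE2 path: two doubles per 128-bit register.
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    // Scalar loop: handles the tail, or everything when SSE2 is not enabled.
    for (; i < n; ++i) out[i] = a[i] + b[i];
}

// Reports which path this translation unit was compiled with.
bool built_with_sse2() {
#if defined(__SSE2__)
    return true;
#else
    return false;
#endif
}
```

GCC and Clang define `__SSE2__` only when the target enables SSE2 (for example `-march=pentium4`, or any x86-64 target), so an `-march=i486` build compiles only the scalar loop. Hardcoding the equivalent macro in a package forces the SSE2 path regardless of the target.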
I think the upshot here needs to be that we don't support running Julia in legacy 387 80-bit FPU mode.
@StefanKarpinski Yeah, I think we need to decide whether the minimum requirement will remain SSE2/Pentium 4, and if so, change JULIA_CPU_TARGET=i386 to pentium4.
One option is to simply disable extended precision:
diff --git a/deps/openlibm b/deps/openlibm
--- a/deps/openlibm
+++ b/deps/openlibm
@@ -1 +1 @@
-Subproject commit 0b9d67e54a5b07e32a27b68cf0a01c33b515fc9b
+Subproject commit 0b9d67e54a5b07e32a27b68cf0a01c33b515fc9b-dirty
diff --git a/src/codegen.cpp b/src/codegen.cpp
index fecf976..2cee5a3 100644
--- a/src/codegen.cpp
+++ b/src/codegen.cpp
@@ -4457,8 +4457,24 @@ static void init_julia_llvm_env(Module *m)
FPM->doInitialization();
}
+#define FP387_NEAREST 0x0000
+#define FP387_ZERO 0x0C00
+#define FP387_UP 0x0800
+#define FP387_DOWN 0x0400
+
+#define FP387_SINGLE 0x0000
+#define FP387_DOUBLE 0x0200
+#define FP387_EXTENDED 0x0300
+
+static inline void fp387(const unsigned short control)
+{
+ unsigned short cw = (control & 0x0F00) | 0x007f;
+ __asm__ volatile ("fldcw %0" : : "m" (*&cw));
+}
+
extern "C" void jl_init_codegen(void)
{
+ fp387(FP387_DOUBLE | FP387_NEAREST);
#ifdef JL_DEBUG_BUILD
cl::ParseEnvironmentOptions("Julia", "JULIA_LLVM_ARGS");
#endif
It is unclear, though, how this would affect the system libm, or openlibm for that matter.
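A minimal sketch of the same idea using glibc's `<fpu_control.h>` macros instead of hand-written `fldcw` (assumes glibc on x86; the function names are hypothetical). Unlike the patch, it changes only the precision-control bits and returns the previous control word, so a caller could restore it, for example around calls into code that expects extended precision. The precision-control field shortens only the significand; the x87 registers keep their wider exponent range, and on x86-64 ordinary `double` math uses SSE2 and is unaffected.

```cpp
#include <cassert>
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GLIBC__)
#include <fpu_control.h>  // glibc-specific x87 control-word macros

// Switch the x87 precision-control field (bits 8-9) to a 53-bit significand,
// leaving the rounding mode and exception masks untouched.
// Returns the previous control word so it can be restored later.
fpu_control_t set_x87_double_precision() {
    fpu_control_t old_cw;
    _FPU_GETCW(old_cw);
    fpu_control_t cw =
        static_cast<fpu_control_t>((old_cw & ~_FPU_EXTENDED) | _FPU_DOUBLE);
    _FPU_SETCW(cw);

    fpu_control_t check;
    _FPU_GETCW(check);  // read back to confirm the new setting took effect
    assert((check & _FPU_EXTENDED) == _FPU_DOUBLE);
    return old_cw;
}

// Put back a control word saved by set_x87_double_precision().
void restore_x87_control(fpu_control_t cw) { _FPU_SETCW(cw); }
#endif
```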
I don't think we can expect to run Julia on anything older than a Pentium 4. That gives us a good 14 years of hardware to run on, and anyone seriously interested in technical computing is going to have at least a P4. I think setting MARCH to pentium4 or higher is the right way to go here.
@staticfloat Probably. In principle, distros want to support any i386, but in practice I guess nobody ever tries a P3 anyway.
FWIW, I came to the conclusion that the 32-bit Debian package has to be compiled for i486. This is a theoretical requirement set by Debian. But it is also a very real constraint, because some Debian autobuilders actually emulate i486 hardware (using qemu), so the package would not build with ARCH=pentium4.
I have been trying to make Julia work on non-SSE2 systems, and it turns out not to be so easy. There are a few tests that fail due to rounding issues, but this is not a big deal. A much bigger problem is FloatRange, which makes strong assumptions about the rounding method. For example, on a non-SSE2 processor, 0.1:0.1:0.3 has length 2, which is quite problematic.
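Range lengths are fragile because the number of points comes from a floating-point quotient that often lands within an ulp of an integer. A minimal C++ sketch (hypothetical helper names, assuming `step > 0` and `start <= stop`; this is not Julia's FloatRange algorithm) shows that even with correctly rounded double arithmetic, the naive floor formula gives 2 points for 0.1:0.1:0.3, while rounding the count and then checking the endpoint gives 3. A correction of this kind is tuned to one rounding behavior, so it can break when intermediates are kept in x87 extended precision.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Naive length of start:step:stop: floor of the quotient, plus one.
long naive_length(double start, double step, double stop) {
    assert(step > 0 && start <= stop);
    return static_cast<long>(std::floor((stop - start) / step)) + 1;
}

// Round the step count instead, but accept it only if start + n*step lands
// within a few ulps of stop; otherwise fall back to the floor.
// The factor 4 is an arbitrary illustrative tolerance.
long rounded_length(double start, double step, double stop) {
    assert(step > 0 && start <= stop);
    double r = (stop - start) / step;
    double n = std::round(r);
    double tol = 4 * std::numeric_limits<double>::epsilon() * std::fabs(stop);
    if (std::fabs(start + n * step - stop) <= tol)
        return static_cast<long>(n) + 1;
    return static_cast<long>(std::floor(r)) + 1;
}
```

Here `(0.3 - 0.1) / 0.1` evaluates to 1.9999999999999998 in double, so the floor drops a point; `0.1 + 2 * 0.1` is 0.30000000000000004, one ulp from 0.3, so the rounded count of 2 steps is accepted.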
Then I tried to implement the workaround suggested by @vtjnash, which is to use double precision (instead of extended precision). It works well for Float64 ranges, and fixes the problem above. There are still some issues with Float32 ranges, probably because the Julia code assumes that single-precision rounding is used. Another problem is that in the middle of the test suite, some code resets the precision to extended; I could not figure out where this is done.
In the end, I think this demonstrates that I cannot ship a package for non-SSE2 systems, at least as long as you guys don't want to support this configuration. The conclusion is that I am going to drop the 32-bit Debian package. The only people who will be affected by this move are those whose CPU supports SSE2 but not x86-64, which is a rather small range of hardware (the first Pentium 4). Later CPUs with x86-64 support can simply use the 64-bit package.
People who for some reason have installed a 32-bit OS on a x86-64 are also screwed (though few people should be concerned). In the end the Debian policies are counter-productive: to support machines older than P4, you end up dropping support for more recent machines which may be used in real life. :-/
@nalimilan People who installed Debian 32-bit can still install 64-bit packages thanks to multiarch (provided they switch to a 64-bit kernel). In the precise case of Julia, however, this is not yet possible, at least because the BLAS/LAPACK packages are not multiarch-ready. But this is something that is fixable.
The best solution is probably just to loosen the necessary tests slightly. We aren't learning anything particularly new here, just rehashing an old observation that compilers may rearrange your math operations to do them more efficiently. And thus we are just showing that the system gcc compiler sometimes loses a bit of precision in doing so. It's rather unfortunate that llvm seems to have simply copied gcc behavior in this regard, although perhaps this is just an unfortunate limitation of the i387 hardware.
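The underlying point is that floating-point addition is not associative: changing the evaluation order, or the precision of intermediates, can change the last bit, which is why exact `==` tests like the `sum(sin, z)` one above are brittle. A two-function C++ illustration (the function names are arbitrary):

```cpp
#include <cassert>

// The same mathematical sum a + b + c, evaluated in two orders.
// Each + is rounded to double separately, so the results can differ.
double sum_left(double a, double b, double c) { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

With a = 0.1, b = 0.2, c = 0.3, `sum_left` gives 0.6000000000000001 and `sum_right` gives 0.6. GCC and Clang keep the written order unless reassociation is explicitly allowed (for example with `-ffast-math`).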
I can reproduce the range issues with my i486 cross-compile. analyze-x86 (find julia/usr -name \*.so | xargs -n 1 -t -I{} ~/analyze-x86/analyze-x86 {}) seems to confirm that the cross-compile was successful and that the remaining sse2/avx instructions are likely either dead code or misdetections, since there are so few.
@vtjnash As I have written in a previous comment, it's not just about loosening the testsuite. It's also about modifying FloatRange, which is completely broken on x87.
Also note that I raised the i486/SSE2 issue on Debian lists, and the consensus seems to be that it's ok to have Julia require SSE2 on 32-bit, as long as it gives an explicit error message when SSE2 is not present.
So I'm finally going to ship a 32-bit package, and add a patch that tests at runtime for the presence of SSE2, and gracefully exit if it is not there.
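A minimal sketch of such a startup check, assuming GCC or Clang on x86 (`<cpuid.h>`, `__get_cpuid` and `bit_SSE2` are compiler-provided; the function names and the error message are hypothetical, not the actual Debian patch). CPUID leaf 1 reports SSE2 in bit 26 of EDX:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>  // GCC/Clang helper for the CPUID instruction
#endif

// True if CPUID leaf 1 reports SSE2 (EDX bit 26).
bool cpu_has_sse2() {
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
        return false;  // leaf 1 unavailable: treat as no SSE2
    return (edx & bit_SSE2) != 0;
#else
    return false;  // not x86: the check does not apply
#endif
}

// Hypothetical startup guard: print an explicit error and exit instead of
// running with different floating-point behavior on an x87-only CPU.
void require_sse2_or_exit() {
    if (!cpu_has_sse2()) {
        std::fprintf(stderr,
                     "ERROR: this build requires a CPU with SSE2 support "
                     "(e.g. Pentium 4 or later).\n");
        std::exit(EXIT_FAILURE);
    }
}
```

On x86-64, SSE2 is part of the base architecture, so the check only matters for 32-bit builds.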
@staticfloat is this superseded by #8731? I think the failures mentioned in this issue have been resolved by loosening tolerances. At least on the PPA there was a case of a user who had a pre-pentium4 i686. Our tests might not be catching every single case of old 32-bit machines giving slightly different floating-point results relative to newer processors, but I think we might have to live with that?
Yes, I agree.
I'm seeing a failure on 32-bit (64-bit works fine) when building the Julia RPM package based on git master from June 6th. This is a regression. Any ideas about commits which could have broken this, so that I can try a bisection? Or commands that I should run?
Note this is with LLVM 3.3.