JuliaMath / DoubleDouble.jl

Extended precision arithmetic for Julia (deprecated)

Moving things forward #34

Open saschatimme opened 6 years ago

saschatimme commented 6 years ago

Following the lively discussion on discourse, what do you think is now the best way going forward? I think we all have the same goal: Getting a solid and fast double-double and quad-double precision library. But the question is: How do we want to achieve this?

Here are some more or less unstructured thoughts from my side:

cc @simonbyrne @JeffreySarnoff @dpsanders

JeffreySarnoff commented 6 years ago

Go with FMA. As @simonbyrne has noted, today's Julia will do the right thing (computationally), and where there is hardware FMA (many, many platforms nowadays) it will be as quick as it is accurate. For the systems running Julia that have no hardware FMA, software will step in and calculate -- not nearly as quick, but much more maintainable and upgrade-ready. Julia binaries will arrive with any local FMA support recognized and engaged; that is part of what is happening in advance of v1. The algorithms that use FMA are much more amenable to rigorous characterization; a decade of academic papers agrees.
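
For reference, the FMA-based error-free product at the heart of this is tiny (a sketch only; the package will organize these routines its own way):

@inline function two_prod(a::Float64, b::Float64)
    p = a * b            # rounded product
    e = fma(a, b, -p)    # fma recovers the rounding error exactly, so a*b == p + e
    return p, e
end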

JeffreySarnoff commented 6 years ago

People seem to like the package name DoubleDouble. So, it makes sense to continue using that name. You would be added to that repository with write privileges and serve as a steward for the project. I will remain an active advisor (always feel at ease with questions). @saschatimme, @simonbyrne, @dpsanders, @timholy and @JeffreySarnoff have different sorts of expertise with overlap on DoubleDoubles. At some point, you will find that TripleDoubles are needed behind some of the transcendental computations to obtain the reliability and stability people seek when using DoubleDoubles. I suggest implementing the DoubleDoubles first without TripleDoubles, and then seeing for yourself which parts of which functions need the extra precision. Doing it that way will lead to a cleaner implementation. And, to be really "on point", we should use Sollya once we have a working package with pretty code. I have a decent relationship with the person behind the Herbie project, and their work is valuable too.

JeffreySarnoff commented 6 years ago

In addition to the need for a floating point type formed of two Float64 values and using FMA, there is a real use for a floating point type formed of two Float32 values and using FMA --- this is the best way to use most of the mass-market GPUs for parallel maths and machine learning processes that benefit from an internal, computable representation larger than 32 bits. The algorithms are identical .. the chips that support FMA generally do so for 32-bit floats, and for 64-bit floats on many CPUs and some GPUs. Reviewing the discussion about parameterization, this is a compelling (and work-saving) reason to parameterize over Float64 and Float32. Your inclination to provide a parameterization that selects differently balanced kinds of doubled float (speed/precision/accuracy) is reasonable and would give users who want to choose along those joint axes an easy approach.

Each of the alternative balances is backed by distinct algorithms or precisions. DoubleDoubles that do not use TripleDoubles at all and DoubleDoubles that do are one example. @timholy will have some valuable insight on the +s & -s of doing that through a (say, second) parameter or using his trait coding for that refinement. Actually, there is a third way too: a smart admixture of short abstract type specializations with judicious use of small Union types does this its own way. We can make small mock-ups with each approach (2 bitwidths and, for each, 2 implementations: faster vs more accurate; the algorithms are unchanged) without actually doing more than one arithmetic op. Probably one approach will rule by virtue of clarity, ease of getting it right, and staying out of the way of implementation.

@StefanKarpinski has a good approach to generating floating point values that sample IEEE-754 space.

StefanKarpinski commented 6 years ago

> @StefanKarpinski has a good approach to generating floating point values that sample IEEE-754 space.

Just reinterpret random UInt64 values as Float64 – this gives you every possible value. If you want to only get finite values, then you need to limit the range of UInt64 values, e.g.:

function rand_float64()
    m = reinterpret(UInt64, realmax(Float64))   # bit pattern of the largest finite Float64 (floatmax in Julia 1.0+)
    x = reinterpret(Float64, rand(0:m))         # uniform over the non-negative finite bit patterns
    return ifelse(rand(Bool), -x, x)            # random sign
end

There's probably a clever way to avoid the ifelse part at the end by sampling the whole desired range of UInt64 values in the first place.
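
One possibility (a sketch only, not necessarily the clever trick alluded to above): rejection-sample the full UInt64 space and discard the non-finite bit patterns.

function rand_finite_float64()
    while true
        x = reinterpret(Float64, rand(UInt64))
        isfinite(x) && return x   # both signs, including -0.0 and subnormals
    end
end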

JeffreySarnoff commented 6 years ago

If you have two working precisions, Float32x2 and Float64x2 (as hardware support allows), and the significand of the larger component (53 bits, one implicit) exceeds twice that of the smaller (24 bits, one implicit) by a few bits (53 - 48 = 5), then you have a third option: more accurate than Float32x2 and less precise than Float64x2. It works like this: accept input and present output as Float32x2, do all computation by converting the input to Float64x2 values and calculating with those; when a result is obtained, round it back into Float32x2 values and report (or store) those.

I know this to work. I implemented the undoubled version in Julia. The insight is William Kahan's.
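
For concreteness, the undoubled version of that idea is just this (a sketch with a made-up name):

# accept Float32 input, compute with the wider Float64, round the result back to Float32
widened(f, x::Float32) = Float32(f(Float64(x)))

widened(sqrt, 2.0f0)   # sqrt computed in Float64, then rounded to Float32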

The essential work on DoubleDouble should not be overburdened with multiple tracks of better and worse algorithms. It is hard to manage, difficult to document, and unlikely to provide more solutions to userspace than the use of Float32s or of Float64s or their shared space does. Carrying an additional parameter is not difficult, and having one present without being specialized during the development of an initial release could be ok. The best strategy on this is to follow the advice of @timholy on carrying another param or working with trait-specialized dispatch for algorithmic groups.

saschatimme commented 6 years ago

Thank you for your awesome input @JeffreySarnoff !!!

Another thing to consider would be the support of rounding modes, since we could then support IntervalArithmetic.jl. But I am not sure how one would implement it. Do you know of any papers that consider double-double precision and rounding modes?

To recapitulate:

The first steps would then be:

1. Define the types and conversions
2. Support the basic arithmetic operations using only DoubleDoubles
3. Add first versions of transcendental functions (relying only on DoubleDoubles)

Additionally it would be good to set up a test environment where we can uniformly sample Float32s and Float64s to test the functions on all possible inputs, or at least on a uniformly distributed fraction of them.

JeffreySarnoff commented 6 years ago

With the counsel of @dpsanders, we will get the rounding to be proper.

simonbyrne commented 6 years ago

@saschatimme The development location is up to you. You are more than welcome to take over DoubleDouble.jl, and I have given you commit access, do with it as you see fit. Feel free to wipe the slate clean (as long as you remember to change the version numbers appropriately). Of course, if you would prefer your own repository, I have no problem with that.

Unfortunately, I do not have time to contribute meaningfully for the foreseeable future, but would be happy to answer any questions you might have. Good luck!

dpsanders commented 6 years ago

@JeffreySarnoff has shown in ErrorfreeArithmetic.jl how correct rounding can be achieved for the basic arithmetic operations using double-word arithmetic. I guess that TripleDouble would be enough to obtain correct rounding for DoubleDoubles?

JeffreySarnoff commented 6 years ago

oh, yes

saschatimme commented 6 years ago

I created a development branch. So far I have only defined the new parameterized Double type and added some constructors. I am a little bit constrained with my time at the moment, but I will try to find an hour here and there to slowly add things.

JeffreySarnoff commented 6 years ago

It is my opinion that this should target v0.7+ and not be concerned about v0.6, because by the time it is ready v0.7 should be out, and there are syntactic differences (deprecations) that are hard to straddle (and 0.7 is better that way). I am reviewing the dev src.

JeffreySarnoff commented 6 years ago

and of course we can use Compat

JeffreySarnoff commented 6 years ago

What is your opinion of this (v0.7)?

module DoubleDouble

export Double

const SysFloat = Union{Float16, Float32, Float64}

# Algorithmic choice is a Trait
abstract type Trait end
abstract type Emphasis <: Trait end
struct Accuracy    <: Emphasis end
struct Performance <: Emphasis end

abstract type AbstractDouble{T} <: AbstractFloat end

struct Double{T<:SysFloat, E<:Emphasis} <: AbstractDouble{T}
    hi::T
    lo::T
end

Base.promote_rule(::Type{Double{T,Accuracy}}, ::Type{Double{T,Performance}}) where T<:SysFloat = Double{T,Accuracy}
Base.convert(::Type{Double{T,Accuracy}}, x::Double{T,Performance}) where T<:SysFloat = Double(Accuracy, x.hi, x.lo)
Base.convert(::Type{Double{T,Performance}}, x::Double{T,Accuracy}) where T<:SysFloat = Double(Performance, x.hi, x.lo)

function Double(::Type{E}, hi::T, lo::T) where {T<:SysFloat, E<:Emphasis}
    s = hi + lo
    e = (hi - s) + lo
    return Double{T,E}(s, e)
end

@inline Double(hi::T, lo::T) where T<:SysFloat = Double(Accuracy, hi, lo)
@inline Double(::Type{E}, x::T) where {T<:SysFloat, E<:Emphasis} = Double{T,E}(x, zero(T))  # used by the mixed Double/T methods below
@inline Double(x::T) where T<:SysFloat = Double(x, zero(T))
@inline Double(x::T) where T<:Real = Double(Float64(x))
@inline Double(x::T1, y::T2) where {T1<:Real, T2<:Real} = Double(Float64(x), Float64(y))

"""
     quick_two_sum(a, b)

Computes `s = fl(a+b)` and `e = err(a+b)`. Assumes `|a| ≥ |b|`.
"""
@inline function quick_two_sum(a::T, b::T) where T<:SysFloat
    s = a + b
    e = b - (s - a)
    return s, e
end

"""
     two_sum(a, b)

Computes `s = fl(a+b)` and `e = err(a+b)`.
"""
@inline function two_sum(a::T, b::T) where T<:SysFloat
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e
end

#=
    This is used for both Accuracy and Performance because addition and subtraction
    must be exact to preserve the numerical logic throughout.
=#

function Base.:(+)(x::Double{T, E}, y::Double{T, E}) where {T<:SysFloat, E<:Emphasis}
    s1, s2 = two_sum(x.hi, y.hi)
    t1, t2 = two_sum(x.lo, y.lo)
    s2 += t1
    s1, s2 = quick_two_sum(s1, s2)
    s2 += t2
    s1, s2 = quick_two_sum(s1, s2)
    return Double(E, s1, s2)
end

Base.:(+)(x::Double{T, E}, y::T) where {T<:SysFloat, E<:Emphasis} = (+)(x, Double(E, y))
Base.:(+)(x::T, y::Double{T, E}) where {T<:SysFloat, E<:Emphasis} = (+)(Double(E, x), y)
Base.:(+)(x::Double{T, E1}, y::Double{T, E2}) where {T<:SysFloat, E1<:Emphasis, E2<:Emphasis} =
    (+)(promote(x, y)...)

# etc
end # module
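
If that looks right, usage would be along these lines (untested sketch of the code above):

x = Double(1.0, 1.0e-17)           # Double{Float64, Accuracy} by default
y = Double(Performance, 2.0, 0.0)  # Double{Float64, Performance}
x + y                              # mixed emphases promote to Double{Float64, Accuracy}
x + 0.5                            # mixed Double/Float64 addition via the methods above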

saschatimme commented 6 years ago

I agree with targeting v0.7+. I also like your naming (Emphasis, Accuracy, Performance) better. For the implementation of the basic arithmetic operations I want to set up a proper testing and benchmark environment. E.g. I am not sure whether it is good to introduce a branch condition in the addition of two Doubles. But I don't want to rely on my gut feeling, so some data to back up claims would be good :)

JeffreySarnoff commented 6 years ago

I had just copied from the DoubleDouble implementation, but you are right .. the corrections are made above. The two_xxx routines should be in their own file, as you had done.

JeffreySarnoff commented 6 years ago

@saschatimme please think on the testing and the functions we need to support that testing

saschatimme commented 6 years ago

I hope to get that done this weekend @JeffreySarnoff

JeffreySarnoff commented 6 years ago

Thank you. Please give me an example of use.

saschatimme commented 6 years ago

I actually wanted to write something up, but somehow it slipped through. Sorry about that!

FloatIterator is an iterator over a uniform sample of the space of Float32 or Float64 numbers.

For example

FloatIterator(Float32, targetlength=100, positive=true, constraint=:machine_eps)

will generate 100 positive Float32 numbers, uniformly distributed, with a minimum magnitude of eps(Float32). constraint can also be :normalized (only normalized numbers) or :denormalized (all numbers, including denormalized). The default is :machine_eps. positive indicates the sign (it would have been a little bit cumbersome to iterate over the positives and negatives at the same time).

Instead of targetlength you can also say that you want to use every n-th floating point number. This would be done with

n = 10_000
FloatIterator(Float32, n)

Everything works the same for Float64.

Note that the iterator generates all numbers on the fly, i.e. besides computing time it will not be a problem to test a huge amount of numbers.

JeffreySarnoff commented 6 years ago

Ok.
I have exported a function slp(x) from src/float/eps_ufp_ulp.jl which returns the integer value to use as the second argument to ldexp(significand, exponent) to generate a value that [just] does not overlap any bits of x and is of lesser magnitude than x. This helps to generate Doubles.

julia> using DoubleDouble # development branch

julia> r1, r2 = rand(), rand()
(0.15184596647331916, 0.830039203939064)

julia> r3 = ldexp(frexp(r2)[1], frexp(r1)[2]+slp(r1))
5.75955397221678e-18

julia> DoubleDouble.two_sum(r1,r3) # does not overlap
(0.15184596647331916, 5.75955397221678e-18)

Could you add a version of your float generator that generates doubles? The lo part should be randomly positive or negative, and the value used as +slp(r1) really should be slp(r1)-rand([0,0,0,0,0,0,0,0,0,1,1,1,2,2,3,4]) (or something more correct than that). So make the part that calls slp() a function where we can improve the rand() portion later (something like):

n = 4; [(i - 1, Int(2^n * 0.5^i)) for i in 1:n]

shows 8 0s, 4 1s ... to be the rand target for subtracting from slp(r1)
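
Something like this, perhaps (a sketch only; slp is the dev-branch function above, and rand_slp_offset is the placeholder whose distribution we would refine later):

rand_slp_offset() = rand([0,0,0,0,0,0,0,0,0, 1,1,1, 2,2, 3, 4])   # the rand target sketched above

function rand_lo(hi::Float64)
    offset = frexp(hi)[2] + slp(hi) - rand_slp_offset()
    lo = ldexp(frexp(rand())[1], offset)   # random significand, non-overlapping magnitude
    return ifelse(rand(Bool), -lo, lo)     # randomly signed lo part
end

# hi and the generated lo do not overlap, so two_sum leaves them unchanged:
# DoubleDouble.two_sum(hi, rand_lo(hi))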

JeffreySarnoff commented 6 years ago

The errorfree and errorbest building blocks seem ok (untested, tried some) [except three_prod, which is not ok, and I have it commented out]. The routines used by DD/QD are not well characterized with respect to error or stability. In any event, it is more of a task to get the Accuracy versions to be known to be working well than to introduce time-saving inaccuracies. Also, we need to pay attention to the interaction of less accurate routines -- avoiding cascading error thresholds. There are opportunities to introduce less accurate (but still good) versions of a few of the building blocks. Probably, it is best to wait on that as there are many ways to short-cut the trig(Double) and a few ways to short-cut arith(Double). Until we have a way to determine the general accuracy and performance of one set of functions, it is difficult to make optimal choices with respect to modified versions.

saschatimme commented 6 years ago

Sorry for my slow responses lately, I was quite busy the last couple of weeks but I hope that things get better soon.

There is now a DoubleIterator:

DoubleIterator(::Type{<:Union{Float32, Float64}}, ::Type{<:Emphasis}; nslp_rands=4, targetlength=10_000, positive=true, constraint=:machine_eps)

Create an iterator over all possible floating point numbers of the respective type.
* `targetlength` is the number of uniform samples.
* `positive` is whether positive or negative numbers should be sampled.
* `constraint` limits the number of possible floating point numbers. Possible values are
`:machine_eps` (no number smaller than machine epsilon), `:normalized` (only normalized numbers), and
`:denormalized` (all numbers, including denormalized).

The hi part is basically the corresponding FloatIterator and the low part is randomly generated for now.
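
For example (illustrative call, mirroring the FloatIterator example above):

for d in DoubleIterator(Float64, Accuracy; targetlength=1_000)
    # d.hi comes from the uniform Float64 sample, d.lo is generated randomly
end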

JeffreySarnoff commented 6 years ago

thanks -- we understand that you have commitments; that you are able to contribute is great!

saschatimme commented 6 years ago

So I added some accuracy tests for +, -, *, / and a short benchmark script for these operations.
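
(The benchmark is essentially repeated @btime measurements on fixed operands via BenchmarkTools; roughly like the following, though the actual script differs in detail.)

using BenchmarkTools
x, y = Double(sqrt(2.0)), Double(sqrt(3.0))   # illustrative operands
@btime $x + $y
@btime $x * $y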

The performance for Float64 looks really nice

Double Float32: Accuracy
+
  4.342 ns (0 allocations: 0 bytes)
-
  4.348 ns (0 allocations: 0 bytes)
*
  3.195 ns (0 allocations: 0 bytes)
/
  28.091 ns (0 allocations: 0 bytes)
Double Float32: Performance
+
  4.347 ns (0 allocations: 0 bytes)
-
  4.345 ns (0 allocations: 0 bytes)
*
  3.104 ns (0 allocations: 0 bytes)
/
  7.857 ns (0 allocations: 0 bytes)
Double Float64: Accuracy
+
  7.673 ns (0 allocations: 0 bytes)
-
  7.673 ns (0 allocations: 0 bytes)
*
  6.878 ns (0 allocations: 0 bytes)
/
  35.259 ns (0 allocations: 0 bytes)
Double Float64: Performance
+
  7.673 ns (0 allocations: 0 bytes)
-
  7.673 ns (0 allocations: 0 bytes)
*
  6.875 ns (0 allocations: 0 bytes)
/
  11.330 ns (0 allocations: 0 bytes)
Float64 baseline
+
  1.382 ns (0 allocations: 0 bytes)
-
  1.346 ns (0 allocations: 0 bytes)
*
  1.382 ns (0 allocations: 0 bytes)
/
  1.651 ns (0 allocations: 0 bytes)

It seems that for Float32 the compiler does not produce good SIMD instructions, but I have not looked into it in detail. I'm also not sure whether it is worthwhile to test the Float32 performance on a 64-bit CPU. I made a subtle mistake in my script which led to the test results being swapped.

JeffreySarnoff commented 6 years ago

I do not know the answer to the first question (Float32 and SIMD instructions) -- it bears investigation.

It is worthwhile to test Float32 performance on 64-bit CPUs for the same reason that it is worthwhile to test the performance of sum_kbn on vectors of Int64s and on vectors of Int32s. We want to know where a substantive advantage exists and where it does not appear. This information helps us guide users of the facility.

The numbers for Accurate and for Performant Double{Float64}s are encouraging. It is good to have a characterization of the relative time that Accuracy spends and Performance saves. The best next step would be to characterize the relative precision that Performance drops and Accuracy preserves.

saschatimme commented 6 years ago

I had a subtle mistake in the benchmark script. The Float64 tests were actually the Float32 tests and the other way around. I updated the table with the correct values.

JeffreySarnoff commented 6 years ago

There is much to be said for being able to spot one's own mixups. When calculating the effective precision of a function of Doubles, given one of many sets of input, we have the actual inputs (and know those values to be completely accurate and entirely precise for the purpose at hand) and we have the obtained result, (often) another Double value. Where the (e.g.) two input Double values are treated as correct (each is exact and complete in the value it stores), the Double result would be within 1 ulp of the theoretical exact result -- unless the calculation itself were imperfect.

Unsurprisingly, some calculations are imperfect. We want to (transportably) quantify the extent to which some result_Double = mathfunction(Double, Double) falls away from the theoretically true result, rounded to the precision-in-significand-bits of the result Double. Performing the calculation using BigFloat conversion of the input Doubles and then converting the BigFloat result to a Double does get one something with which the Double-only result could be compared.

HOWEVER, tread with care: it is quite easy to be comparing something other than what you intend. I will provide an example -- long story short, BigFloat(hi(Double)) + BigFloat(lo(Double)) can be not ok. BigFloat(string(hi(Double))) + BigFloat(string(lo(Double))) will better encode the values [when the values to encode are taken as exact; otherwise a third approach is needed].
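
Roughly, the pitfall is this (illustrative only, plain Julia, no package code involved):

x = 0.1
BigFloat(x)                          # the exact binary value the Float64 stores: 0.1000000000000000055511...
BigFloat(string(x))                  # the decimal text "0.1" re-read at BigFloat precision
BigFloat(x) == BigFloat(string(x))   # false -- the two encodings differ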

saschatimme commented 6 years ago

I had some time this evening and read the relevant parts of

[Popescu 2017] Valentina Popescu. Towards fast and certified multiple-precision libraries. Computer Arithmetic. Université de Lyon, 2017.

I think that is really solid work. From my understanding you already implemented most of the double-word precision algorithms from it. Given that they have proper proofs for nearly everything (complemented with numerical experiments), I think we do not need an excessive amount of tests (especially given at least my limited time budget).

I think we now have the standard double-word arithmetic properly covered. It would be good to clean things up a little (under 0.6 there are quite a lot of duplicate method definition warnings; I don't know why 0.7 misses them). Furthermore, we have not used any 0.7-specific things so far (besides Base.isone, which can be easily handled). Thus I think we could actually also support 0.6 at the moment.

Personally, I need double-, triple- and quad-word arithmetic for my package HomotopyContinuation.jl (BigFloat is just waaaay too slow). I could get something working with just +, -, *, / and √. Therefore, would you be on board to focus on getting a first version with double-, triple- and quad-word arithmetic done, where we just have these basic methods plus the necessary conversions, promotions etc.?

Afterwards, we can work on the transcendental functions one by one and also improve the basic arithmetic operations where possible.

JeffreySarnoff commented 6 years ago

I am traveling. Try my ArbFloats.jl for much faster than BigFloat.

JeffreySarnoff commented 6 years ago

ArbFloats -- there are separate tags for ver 0.6 and ver 0.7. I have no problem with approaching this in the sequence you outline. Let me know if ArbFloats fills your need, or if it does not.

I will clean the method overwrites.

JeffreySarnoff commented 6 years ago

@simonbyrne We will need JuliaMath/QuadDoubles.jl, exporting Quad or Quadruple (whichever you prefer). If you create it and give write access to @saschatimme and me (and any other person inclined to help), I would create a "development" branch immediately and do the dev work there. When it is clean and tests pass, we would merge it into the blank "main". If there is a better way, ok.

JeffreySarnoff commented 6 years ago

I thought about this some more -- I have no desire to put stuff in JuliaMath unless it is clean and ready -- I could open QuadDoubles and give @saschatimme collab status, then .. eventually .. let y'all work it over before migrating it here. Probably best.

GregPlowman commented 6 years ago

QuadFloats ??

JeffreySarnoff commented 6 years ago

Hi Greg, I like the sound of that as the package name. The exported "doubled double-double" type could be QuadFloat, though I have taken a shine to the one-word approach (DoubleDouble exports Double) -- it makes writing the software a bit simpler, as more is readable on a line.

[The more Julian naming would be DoubleDoubles.jl exporting Double (or DoubleDouble).] QuadFloats exporting QuadFloat is decent and meritorious.

Thank you for joining in.

simonbyrne commented 6 years ago

I'm not set on the name DoubleDouble: I just picked it because it was already in wide use, and I like the Shakespearean reference.

Happy to create a new repo under JuliaMath: let me know what you decide.

saschatimme commented 6 years ago

In my opinion quad- and double-word precision should be in the same package since they share the same basic routines (we obviously could split things into 3 packages, but I think that is too much). I would propose that we develop the quad-word arithmetic in the current development branch and decide at the end whether we split things, the exact naming, etc.

JeffreySarnoff commented 6 years ago

Julia strongly prefers non-monolithic package design. It is more reliable to design interdependent modules by decoupling their specificities. Waiting until some later time to split things invites trouble and spends the time in advance anyway. Good splitting encourages implementing understanding rather than just coding stuff.

Since the development branch of DoubleDouble offers more accuracy than the classic DoubleDouble implementation, and also offers similar or less accuracy than classic depending on the user's choice of Emphasis, giving it a non-"DoubleDouble" name makes sense.

Current thinking (the Triple arithmetic and renormalization is a support module, not a user package):

| package name | more performant type | more accurate type | private `{T,E}` |
| --- | --- | --- | --- |
| DoubleFloats.jl | FastDouble | DoubleFloat | Double |
| QuadFloats.jl | FastQuad | QuadFloat | Quad |

JeffreySarnoff commented 6 years ago

Also, the errorfree transformations and the nearly errorfree calculations need to be split out and used (via `using`) by DoubleFloats and others, because they are building blocks for other algorithmic directions; if not made available via packaging, others would be forced to cut and paste or rewrite.

JeffreySarnoff commented 6 years ago

@simonbyrne please create the repo "DoubleFloats.jl" and let me write to it

simonbyrne commented 6 years ago

@JeffreySarnoff and @saschatimme: I've invited you both to be members of the JuliaMath org, so you should be able to do it yourself now.

JeffreySarnoff commented 6 years ago

@saschatimme Please look at https://github.com/JuliaMath/DoubleFloats.jl/blob/master/src/math/trig.jl. It is a copy/paste/rename of your trig functions from HigherPrecision. The renaming is to avoid reuse of names defined elsewhere.

Something is not quite right with sin and cos (they are in the general ballpark but not nearly as crisp as the originals). I have not isolated the issue. It would be a big help if you could see what is not exactly as it needs to be.

You can clone the master to run in a v0.7-DEV REPL.

Other than that, it's moving forward.