JuliaLang / julia


complex polygamma #7033

Closed StefanKarpinski closed 10 years ago

StefanKarpinski commented 10 years ago

We have complex gamma and real polygamma but not complex polygamma – or digamma or trigamma. If anyone wants to tackle this, we really ought to have these. While we're at it, it might be good to readjust the definitions so that if you provide polygamma for a type, you get digamma and trigamma "for free". Currently you have to explicitly add methods for those too for any new type.
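A minimal sketch of what those generic fallbacks might look like (my addition; it assumes a polygamma(m, x) method exists for the type in question):

# Hypothetical fallback definitions: a new type then only needs to
# provide polygamma(m, x), and digamma/trigamma come "for free".
digamma(x) = polygamma(0, x)
trigamma(x) = polygamma(1, x)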

andreasnoack commented 10 years ago

There is a digamma for complex arguments in the Naval Surface Warfare Center library that a Julia implementation could be based on, but no trigamma or general polygamma.

stevengj commented 10 years ago

Some references on computing the polygamma function for complex arguments can be found in Lozier and Olver, although most of the works they cite appear to cover only the gamma or digamma function. I'm not seeing any recent publications on this, but these continued-fraction expansions may be helpful.

stevengj commented 10 years ago

There is also the CPSIPG program, and this technical report, if you can find it (or order it through your library).

stevengj commented 10 years ago

K. S. Kölbig, "Programs for computing the logarithm of the gamma function, and the digamma function, for complex argument," Computer Phys. Commun. vol. 4, pp. 221–226 (1972), uses the following formula to compute the digamma function (based on the Stirling series of ln Γ):

$$\psi(z) = \log z - \frac{1}{2z} - \sum_{n=1}^{K} \frac{B_{2n}}{2n\,z^{2n}} + R_K(z)$$

where the $B_{2n}$ are the Bernoulli numbers, the $R_K$ term is a remainder that is dropped, and z = x + iy. This also seems to be essentially the algorithm employed by SLATEC's CPSI routine.

It seems like it would be straightforward to write down a recurrence for the n-th derivative of this expression in order to obtain the general polygamma function, especially since it looks like it will converge faster for higher derivatives (i.e. you will need smaller and smaller K).
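Differentiating the truncated series term by term gives that recurrence in closed form (my addition; a sketch, using only the standard derivative $d^m/dz^m \log z = (-1)^{m-1}(m-1)!\,z^{-m}$):

$$\psi^{(m)}(z) \approx (-1)^{m-1}\left[\frac{(m-1)!}{z^m} + \frac{m!}{2\,z^{m+1}} + \sum_{n=1}^{K} \frac{B_{2n}\,(2n+m-1)!}{(2n)!\;z^{2n+m}}\right]$$

For m = 1 this reduces to the trigamma series $1/z + 1/(2z^2) + \sum_n B_{2n}/z^{2n+1}$ that appears in the code further down the thread.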

stevengj commented 10 years ago

The following code computes the digamma function accurately (as far as I can tell from comparison with Wolfram Alpha) to double precision in the complex plane:

import Base: digamma, Math.@horner
function digamma(z::Complex{Float64})
    # Based on eq. (12), without looking at the accompanying source code, of:
    #     K. S. Kölbig, "Programs for computing the logarithm of the gamma function,
    #     and the digamma function, for complex argument," Computer Phys. Commun.
    #     vol. 4, pp. 221–226 (1972).
    x, y = reim(z)
    if x < 0 # reflection formula
        ψ = -π * cot(π*z)
        x = 1 - x; y = -y
        z = Complex(x, y)
    else
        ψ = zero(z)
    end
    if x < 7 # shift using recurrence formula
        n = 7 - ifloor(x)
        for ν = 0:n-1
            ψ -= inv(z + ν)
        end
        z = Complex(x + n, y)
    end
    t = inv(z)
    ψ += log(z) - 0.5*t
    t *= t # 1/z^2
    ψ -= t * @horner(t,0.08333333333333333,-0.008333333333333333,0.003968253968253968,-0.004166666666666667,0.007575757575757576,-0.02109279609279609,0.08333333333333334) 
end
digamma{T <: Integer}(z::Complex{T}) = digamma(float(z))

Note that I used a slightly different version of the reflection formula for real(z)<0 than the one in Kölbig: it uses the recurrence formula to save a 1/z term.
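Concretely (my addition, from the standard identities $\psi(z+1) = \psi(z) + 1/z$ and $\psi(1-z) = \psi(z) + \pi\cot(\pi z)$): reflecting through $1-z$,

$$\psi(z) = \psi(1-z) - \pi\cot(\pi z),$$

avoids the extra $1/z$ that appears if one reflects through $-z$ instead, since $\psi(-z) = \psi(z) + 1/z + \pi\cot(\pi z)$.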

The @horner coefficients are computed from:

const A000367 = map(BigInt, split("1,1,-1,1,-1,5,-691,7,-3617,43867,-174611,854513,-236364091,8553103,-23749461029,8615841276005,-7709321041217,2577687858367,-26315271553053477373,2929993913841559,-261082718496449122051", ","))
const A002445 = [1,6,30,42,30,66,2730,6,510,798,330,138,2730,6,870,14322,510,6,1919190,6,13530]
const bernoulli = map((a,b) -> a//b, A000367, A002445) # even-index Bernoulli numbers
const bernoulli64 = float64(bernoulli)
println(bernoulli64[2:8] ./ (2*(1:7)))
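
A couple of spot checks (my addition; the expected values are the standard constants ψ(1) = −γ and ψ(1/2) = −γ − 2 log 2, not output from this thread):

digamma(complex(1.0))   # expect ≈ -0.5772156649015329   (-eulergamma)
digamma(complex(0.5))   # expect ≈ -1.9635100260214235   (-eulergamma - 2log(2))
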
StefanKarpinski commented 10 years ago

Gotta love that Horner macro. If we're not exporting that thing, we should.

stevengj commented 10 years ago

As described in Knuth TAOCP vol. 2, sec. 4.6.4, there is actually an algorithm even better than Horner's rule for evaluating polynomials p(z) at complex arguments (but with real coefficients): you can save almost a factor of two for high degrees. It is so complicated that it is basically only usable via code generation, so it would be especially nice to modify the @horner macro to switch to this for complex arguments.

stevengj commented 10 years ago

(I should have a general polygamma function implemented soon if nothing unexpected happens.)

StefanKarpinski commented 10 years ago

Is that the right link to the Horner alternative?

stevengj commented 10 years ago

Yes. There is a description of the algorithm near the end of the thread. I couldn't find a better description online.

stevengj commented 10 years ago

Here's a @horner-like macro that implements Knuth's algorithm. As you can see (try changing macro to function to see the resulting expression), this algorithm is extremely annoying to use efficiently (i.e. inlined) without code generation:

# evaluate p[1] + z * (p[2] + z * (....)) for complex z and real p. 
# Instead of Horner's rule, we use the algorithm in Knuth TAOCP
# vol. 2, sec. 4.6.4, which has lower arithmetic costs.  Assumes
# degree(p) > 0 and that p consists of real numeric literals.
macro chorner(z, p...)
    a = p[end]
    b = p[end-1]
    as = {}
    for i = length(p)-2:-1:1
        ai = symbol(string("a", i))
        push!(as, :($ai = $a))
        a = :($b + r*$ai)
        b = :($(p[i]) - s * $ai)
    end
    ai = :a0
    push!(as, :($ai = $a))
    Expr(:block,
         :(x = real($(esc(z)))),
         :(y = imag($(esc(z)))),
         :(r = x + x),
         :(s = x*x + y*y),
         as...,
         :(Complex($ai * x + $b, $ai * y)))
end
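
A quick way to sanity-check the macro (my addition): compare it against a plain Horner evaluation at an arbitrary complex point.

z = 1.3 + 0.7im
@chorner(z, 1.0, -2.0, 3.0, -4.0)        # Knuth/Goertzel-style evaluation
1.0 + z*(-2.0 + z*(3.0 + z*(-4.0)))      # ordinary Horner; should agree to rounding
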
StefanKarpinski commented 10 years ago

This is probably less efficient for non-complex arguments though. Or does the fact that the imaginary part is zero allow code generation to eliminate any extra code?

stevengj commented 10 years ago

My digamma function above is almost 4x faster with the @chorner macro than with @horner. I had no idea the speed difference was that great! (Without code generation, I had never bothered using Knuth's algorithm before.)

Part of the speed difference could be due to the fact that @chorner avoids Complex arithmetic entirely, so more values can probably be kept in registers simply for type reasons.

I don't think you'd want to use it for real arguments, though: as written, it always produces a Complex result, and I don't think the compiler could be smart enough to eliminate the extra work it involves (naïvely, it looks like about double the number of operations for real data compared to vanilla Horner).

stevengj commented 10 years ago

Hmm, it would be nice to be able to support polygamma(n, z) for arbitrary complex orders n. I think I can do it, except for real(z) < 0: I'm not sure how to compute fractional derivatives of cot (without using the polygamma function).

StefanKarpinski commented 10 years ago

That's a phenomenal speed up. Good code generation is pretty amazing. I wonder if there's some way to make the Horner macro work for complex or real arguments without the user needing to know. Maybe generate both versions with a conditional on the type? Type inference should typically eliminate the branch.

stevengj commented 10 years ago

Type-inference apparently does not currently eliminate the branch, unfortunately; see #7060.

stevengj commented 10 years ago

Hmm, looks like Julia's polygamma(m, x) throws a domain error for x < 0, which is a little bogus. I guess that way you avoid the annoyance of computing the m-th derivative of a cotangent, but it doesn't seem like correct behavior.

andreasnoack commented 10 years ago

It is based on the SLATEC function dpsifn written by Amos. I browsed through the paper and he doesn't discuss negative arguments.

stevengj commented 10 years ago

Even for polynomial evaluation with real arguments, Knuth sec. 4.6.4 describes ways to save a few operations (or in some cases just to trade off multiplies for adds) compared to Horner for polynomials of degree 4 or higher. If we suppose that multiplication and addition have approximately the same cost (not a bad assumption these days), you can save 1 flop when evaluating polynomials of degree 5 or 6. Asymptotically, you can save 1/4 of the flops, but I'm not sure we care about @horner for degrees much beyond 6. Again, it involves fairly complicated preprocessing of the coefficients and would be a pain without code generation, but it may be worthwhile to implement in @horner.
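For instance (my addition, sketching the degree-4 case from that section, due to Motzkin): a quartic can be evaluated with 3 multiplications and 5 additions, versus Horner's 4 and 4, via

$$y = (x + \alpha_0)\,x + \alpha_1, \qquad p(x) = \bigl((y + x + \alpha_2)\,y + \alpha_3\bigr)\,\alpha_4,$$

where α₄ is the leading coefficient and α₀…α₃ are precomputed from the coefficients of p; that precomputation is the "fairly complicated preprocessing" in question.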

stevengj commented 10 years ago

As a warmup for a general polygamma function, here is a trigamma function that handles complex (and negative) arguments, and is 10x faster for real arguments than our current trigamma(x). It uses a modified version of @chorner that switches to @horner for real arguments.

import Base: Math.@horner

macro chorner(z, p...)
    a = p[end]
    b = p[end-1]
    as = {}
    for i = length(p)-2:-1:1
        ai = symbol(string("a", i))
        push!(as, :($ai = $a))
        a = :($b + r*$ai)
        b = :($(p[i]) - s * $ai)
    end
    ai = :a0
    push!(as, :($ai = $a))
    C = Expr(:block,
         :(x = real($(esc(z)))),
         :(y = imag($(esc(z)))),
         :(r = x + x),
         :(s = x*x + y*y),
         as...,
         :(Complex($ai * x + $b, $ai * y)))
    R = Expr(:macrocall, symbol("@horner"), z, p...)
    :(isa($(esc(z)), Real) ? $R : $C)
end

function trigamma2(z::Union(Float64,Complex{Float64}))
    # via the derivative of the Kölbig digamma formulation
    x = real(z)
    if x < 0 # reflection formula
        return 9.869604401089358 * csc(π*z)^2 - # π² csc²(πz)
                trigamma2(1 - z)
    end
    ψ = zero(z)
    if x < 7 # shift using recurrence formula
        n = 7 - ifloor(x)
        ψ += inv(z)^2
        for ν = 1:n-1
            ψ += inv(z + ν)^2
        end
        z += n
    end
    t = inv(z)
    w = t * t # 1/z^2
    ψ += t + 0.5*w
    ψ += t*w * @chorner(w,0.16666666666666666,-0.03333333333333333,0.023809523809523808,-0.03333333333333333,0.07575757575757576,-0.2531135531135531,1.1666666666666667,-7.092156862745098) 
end
trigamma2{T <: Integer}(z::Union(T,Complex{T})) = trigamma2(float(z))
@vectorize_1arg Number trigamma2
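
A couple of spot checks (my addition; the expected values are the standard constants ψ₁(1) = π²/6 and ψ₁(1/2) = π²/2, not output from this thread):

trigamma2(1.0)   # expect ≈ 1.6449340668482264   (π²/6)
trigamma2(0.5)   # expect ≈ 4.934802200544679    (π²/2)
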
stevengj commented 10 years ago

Also, it looks like our current polygamma(m,x) function may be extremely inaccurate for larger m. e.g. polygamma(20,7.0) returns -7.84694453213623e-37, but Wolfram Alpha says the answer is about -4.64461602724054326256119881499858715254724853920972329364143....

The SciPy polygamma function returns -4.644616027240542, which is correct to machine precision.

stevengj commented 10 years ago

Okay, I have a working polygamma replacement for both real and complex arguments that seems to be about 3x faster than our current polygamma for real arguments and appears to be much more accurate (e.g. it gives accurate results for order 20).

Still working on support for arguments with negative real parts, and for non-integer order.

stevengj commented 10 years ago

Hmm, it's somewhat disturbing that Mathematica and Maple don't agree on the value of polygamma(3.3, 12.1 + 1im). And my answer disagrees with both of them, so I'm not sure what the correct one is.

andreasnoack commented 10 years ago

Oops. I can see that I divided by the scaling factor instead of multiplying in the code for polygamma. The scale factor is one for digamma and trigamma, hence it hasn't been noticed. See line 311. If you write

julia> (Base.Math.psifn(7.0,20,1,1)*gamma(21))[1]
4.644616027240543

i.e. the same as SciPy, up to dpsifn's sign convention.

stevengj commented 10 years ago

Thanks @andreasnoack, that's good to know! I'll patch it now, although hopefully shortly I'll have a complete replacement.

stevengj commented 10 years ago

Oddly, both Maple and Mathematica are giving real (and different!) results for polygamma(3.3, 12.1), even though the definition in terms of the Hurwitz zeta function in the literature seems to imply a complex result (which agrees with my result, hooray!). It looks almost as if both Mathematica and Maple are wrong (in different ways!) for non-integer order. Weird.

@alanedelman points out that there are, unfortunately, multiple extensions of the polygamma function to arbitrary complex orders m, and it seems that Mathematica, at least, is using a different choice.

Probably the best thing to do for now is to continue to restrict polygamma to nonnegative integer orders m. Absent an application, it's hard to know what extension we should use, and I'm not sure how to compute Mathematica's extension efficiently.
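For reference (my addition): the integer-order relation alluded to above is the standard identity

$$\psi^{(m)}(z) = (-1)^{m+1}\,m!\;\zeta(m+1, z), \qquad m = 1, 2, \ldots$$

One natural extension replaces $m!$ by $\Gamma(m+1)$ and takes $(-1)^{m+1} = e^{i\pi(m+1)}$ on some branch, which is complex for non-integer m; under that reading, a complex value for polygamma(3.3, 12.1) is exactly what one would expect. But, as noted, this is only one of several possible extensions.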

rfateman commented 10 years ago

Regarding evaluation of polynomials efficiently (more efficiently than Horner's rule)... The transformed expression will in general evaluate to different values. Maybe it doesn't matter.

Compile-time reorganization of code for all kinds of things is possible -- not only for Horner, but e.g. for an FFT where there is some known collection of zero elements. Google "accurate polynomial evaluation" for some references. Basically, if you want to evaluate accurately near x = x0, rearrange the polynomial as a Taylor series in (x - x0). You can also speed things up by splitting the work into parallel computations: evaluating the even and odd exponents separately, for example, gives you two threads (see the sketch below).
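A minimal sketch of that even/odd split (my addition; horner here is a hypothetical helper function, not Base's macro):

# p(x) = E(x^2) + x*O(x^2): E takes the even-index coefficients and O the
# odd-index ones, and the two chains are independent, so they could be
# evaluated in parallel.
function horner(x, c)
    r = c[end]
    for i = length(c)-1:-1:1
        r = r*x + c[i]
    end
    r
end
evenodd(x, c) = (x2 = x*x; horner(x2, c[1:2:end]) + x*horner(x2, c[2:2:end]))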

RJF

StefanKarpinski commented 10 years ago

This is a good point – one that I'm sure @stevengj is quite aware of. It's worth considering both accuracy and speed, but my impression is that these optimizations are not bad for accuracy – fewer operations tend to lead to better accuracy, since there are fewer opportunities for cancellation and round-off errors. The specifics matter quite a lot, of course.

stevengj commented 10 years ago

The Knuth scheme for complex polynomial evaluation is based on the Goertzel algorithm, which has O(n^2) mean error growth. This is indeed worse than Horner, which is O(n). But these asymptotic considerations aren't very relevant for n < 10. The error growth also depends on the point z being evaluated; I think the worst case is for z on the unit circle with comparable coefficients, where the above-mentioned growth rates are achieved. Things should be much better in applications like the one here, where the terms have rapidly decreasing weight.

For large n >> 10 (evaluating polynomials of high degree), this difference is important, but then there are algorithms with even better error growth at some cost in arithmetic. The most famous case of high-degree polynomial evaluation is an FFT, where the typical algorithm has O(√(log n)) error growth.