Mismatch of output with and without gradient

ericagol commented 2 years ago

There is something different going on between computations with and without gradients. Here is a minimum working example (run in Julia 1.6):

using NbodyGradient, DelimitedFiles
t0 = 7257.93115525; h = 0.05; nplanet = 8;
ic = ElementsIC(t0, 9, "elements_8planet_periodic_run13.txt");
intr = Integrator(h, 0.0, 2000.0);
s = State(ic);
tt1 = TransitTiming(intr.tmax,ic);
intr(s,tt1,grad=false);
s = State(ic);
tt2 =  TransitTiming(intr.tmax,ic);
d = NbodyGradient.Derivatives(eltype(s.x),s.n);
intr(s,tt2,d,grad=true);
maximum(abs.(tt1.tt .- tt2.tt))
5.4569682106375694e-12

The transit time should be identical in these two cases, so this indicates that somehow the code has diverged. Also, the output positions differ as well.

ericagol commented 2 years ago

The difference should be zero, and we should write a test for this.

ericagol commented 2 years ago

I'm having trouble finding substantive differences between the two algorithms. This may be one of them: the ordering of the computation of "fac" in phisalpha may matter:

https://github.com/ericagol/NbodyGradient.jl/blob/dcdb31e30d31785d5ee5761edb9d78284bb7ee2a/src/integrator/ahl21/ahl21.jl#L575-L577

vs.

https://github.com/ericagol/NbodyGradient.jl/blob/dcdb31e30d31785d5ee5761edb9d78284bb7ee2a/src/integrator/ahl21/ahl21_no_grad.jl#L125

ericagol commented 2 years ago

Here's another two spots with a difference in phisalpha:

https://github.com/ericagol/NbodyGradient.jl/blob/dcdb31e30d31785d5ee5761edb9d78284bb7ee2a/src/integrator/ahl21/ahl21.jl#L622

vs.

https://github.com/ericagol/NbodyGradient.jl/blob/dcdb31e30d31785d5ee5761edb9d78284bb7ee2a/src/integrator/ahl21/ahl21_no_grad.jl#L141

and

https://github.com/ericagol/NbodyGradient.jl/blob/dcdb31e30d31785d5ee5761edb9d78284bb7ee2a/src/integrator/ahl21/ahl21.jl#L624

vs.

https://github.com/ericagol/NbodyGradient.jl/blob/dcdb31e30d31785d5ee5761edb9d78284bb7ee2a/src/integrator/ahl21/ahl21_no_grad.jl#L143

(I just edited it - the last link was a repeat of the prior one by mistake.)

langfzac commented 2 years ago

Hmm, this is weird -- looking into it now.

ericagol commented 2 years ago

Given that the source code dot_fast does exactly the same computation, I suspect the fault might lie with the order of operations for fac/fac2.

langfzac commented 2 years ago

This is strange... I ran each of the sub-integration-step functions (ie. kickfast, drift), with and without derivatives, for a large number of steps and I don't see any difference between the final positions.

ericagol commented 2 years ago

This is strange... I ran each of the sub-integration-step functions (ie. kickfast, drift), with and without derivatives, for a large number of steps and I don't see any difference between the final positions.

@langfzac Did you check phisalpha? This is the routine where I found a difference in how the computation is done.

langfzac commented 2 years ago

Yes, and it didn't show any differences...

I just wrote a script to call each sub-function in order and print the maxabs difference in position after each, if there are any. The first difference comes up after 43 iteration upon calling the last drift! in the integration step.

using NbodyGradient
import NbodyGradient: kickfast!, drift!, drift_kepler!, kepler_drift!
import NbodyGradient: phic!, phisalpha!, kepler_driftij_gamma!, drift_grad!

t0 = 7257.93115525; h = 0.05; nplanet = 8;
ic = ElementsIC(t0, 8, "elements.txt")
intr = Integrator(t0, 2000.0)

s_nograd = deepcopy(State(ic));
d = NbodyGradient.Derivatives(Float64, s_nograd.n);
s_grad = deepcopy(State(ic));

h2 = 0.5*h;
h6 = h/6;
for k in 1:100
    kickfast!(s_nograd, h6)
    kickfast!(s_grad, d, h6)
    maxabs = maximum(abs.(s_grad.x .- s_nograd.x))
    if maxabs > 0; println("kickfast_first iter ",k, " max:", maxabs); end

    drift!(s_nograd, h2)
    drift_grad!(s_grad, h2)
    maxabs = maximum(abs.(s_grad.x .- s_nograd.x))
    if maxabs > 0; println("drift_first iter ", k, " max:", maxabs); end

    # Only computing the integration for each -- no derivatives
    n = s_nograd.n
    for i in 1:n-1, j in i+1:n
        if ~s_nograd.pair[i,j]
            kepler_driftij_gamma!(s_nograd, i, j, h2, true)
            kepler_driftij_gamma!(s_grad, d, i, j, h2, true)
        end
    end
    maxabs = maximum(abs.(s_grad.x .- s_nograd.x))
    if maxabs > 0; println("drift+kepler iter ", k, " max:", maxabs); end

    phic!(s_nograd, h)
    phic!(s_grad, d, h)
    maxabs = maximum(abs.(s_grad.x .- s_nograd.x))
    if maxabs > 0; println("phic iter ", k, " max:", maxabs); end

    phisalpha!(s_nograd, h, 2.0)
    phisalpha!(s_grad, d, h, 2.0)
    maxabs = maximum(abs.(s_grad.x .- s_nograd.x))
    if maxabs > 0; println("phisalpha iter ", k, " max:", maxabs); end

    # Only computing the integration for each -- no derivatives
    n = s_nograd.n
    for i in n-1:-1:1, j in n:-1:i+1
        if ~s_nograd.pair[i,j]
            kepler_driftij_gamma!(s_nograd, i, j, h2, true)
            kepler_driftij_gamma!(s_grad, d, i, j, h2, true)
        end
    end
    maxabs = maximum(abs.(s_grad.x .- s_nograd.x))
    if maxabs > 0; println("kepler+drift iter ", k, " max:", maxabs); end

    drift!(s_nograd, h2)
    drift_grad!(s_grad, h2)
    maxabs = maximum(abs.(s_grad.x .- s_nograd.x))
    if maxabs > 0; println("drift_last iter ", k, " max:", maxabs); end

    kickfast!(s_nograd, h6)
    kickfast!(s_grad, d, h6)
    maxabs = maximum(abs.(s_grad.x .- s_nograd.x))
    if maxabs > 0; println("kickfast_last iter ", k, " max:", maxabs); end
end

(edit: changed script -- there was a bug in the kepler_driftij_gamma! calls)

ericagol commented 2 years ago

@langfzac Could you also add a comparison with the compensated summation error terms?

langfzac commented 2 years ago

Yep, that's exactly what I'm working on!

langfzac commented 2 years ago

They do in fact differ after 40 iterations, on the last drift! call. Maybe there are some extra compensated summation calls in the derivative versions?

langfzac commented 2 years ago

Ok, so it does appear that the computation of fac in phic! differs for the gradient vs non-gradient versions by ~1e-16. So, I'd assume those in phisalpha! may as well.

ericagol commented 2 years ago

They do in fact differ after 40 iterations, on the last drift! call. Maybe there are some extra compensated summation calls in the derivative versions?

Interesting. I was expecting the error terms to differ first since these are essentially storing higher precision information. I didn't find any extra compensated summations, but possibly missed one. I still think that making the phisalpha routines agree first would be the most probable source of the difference.

langfzac commented 2 years ago

I agree -- working on that, now.

langfzac commented 2 years ago

Ok, this has been fixed in #79.

ericagol commented 2 years ago

Ok, this has been fixed in #79.

Oh, so did that work?!

langfzac commented 2 years ago

Yep! It looks like it did come down to numerical differences in the fac computations. Check out the PR (#79).

ericagol / NbodyGradient.jl

Mismatch of output with and without gradient #78