Very long compilation time for solve()

cpf-work commented 1 year ago

I've upgraded to v5.1 today and found very long compilation times when trying the simple pendulum example.

My base environment: [6e4b80f9] BenchmarkTools v1.3.2 [7073ff75] IJulia v1.24.2 [295af30f] Revise v3.5.6 [0f1e0344] WebIO v0.8.21 [de0858da] Printf

I created a minimal environment for testing: [764a87c0] BoundaryValueDiffEq v5.1.0 [91a5bcdd] Plots v1.39.0

I'm using Julia 1.9.3.

I run "Example 1: Simple pendulum" from the docs and find

@time sol1 = solve(bvp1, MIRK4(), dt = 0.05) 225.577779 seconds (289.71 M allocations: 60.160 GiB, 6.70% gc time, 99.98% compilation time)

Second execution is fast: @time sol1 = solve(bvp1, MIRK4(), dt = 0.05) 0.001343 seconds (6.28 k allocations: 879.165 KiB)

I'm quite confident this did not happend when I played around 1-2 weeks ago with an earlier version. I tried reverting to v5.0.0, but I'm not experienced in doing stuff like this and obtained an error message upon solve(), so I can't tell whether this also happens for me with v5.0.0 now.

I managed to revert to v4.0.1 ⌃ [764a87c0] BoundaryValueDiffEq v4.0.1

And I get more reasonable compilation times: @time sol1 = solve(bvp1, MIRK4(), dt = 0.05) 28.803127 seconds (42.95 M allocations: 6.039 GiB, 6.34% gc time, 99.76% compilation time)

mateuszbaran commented 1 year ago

:+1:

It's even worse in Manifolds.jl (https://github.com/JuliaManifolds/Manifolds.jl/pull/657):

13210.001646 seconds (24.48 G allocations: 7.394 TiB, 9.48% gc time)

vs BoundaryValueDiffEq v4.0.1:

142.380662 seconds (234.62 M allocations: 17.268 GiB, 4.31% gc time, 82.72% compilation time: 1% of which was recompilation)

ChrisRackauckas commented 1 year ago

Does it go away with Julia v1.10?

cpf-work commented 1 year ago

Yes. With Julia 1.10.0-beta3 and base environment [6e4b80f9] BenchmarkTools v1.3.2 [764a87c0] BoundaryValueDiffEq v5.1.0

@time sol1 = solve(bvp1, MIRK4(), dt = 0.05) 11.107603 seconds (12.80 M allocations: 860.423 MiB, 3.11% gc time, 99.92% compilation time)

mateuszbaran commented 1 year ago

Yes, kind of -- after just 22s I get that failure: #118 .

mateuszbaran commented 1 year ago

I've rerun CI and now it doesn't even finish within 6 hours.

ChrisRackauckas commented 1 year ago

Check if https://github.com/SciML/NonlinearSolve.jl/pull/261 made a difference.

If not, can you share an invalidation report? Someone make a sample code just like https://sciml.ai/news/2022/09/21/compile_time/#profiling_and_fixing_sources_of_invalidations and share the plot and the major invalidators. That would highlight what needs to be fixed.

mateuszbaran commented 1 year ago

I've made a (somewhat) minimal example:

using Manifolds
using BoundaryValueDiffEq
M = Manifolds.EmbeddedTorus(3, 2)
A = Manifolds.DefaultTorusAtlas()
p0x = [0.5, -1.2]
a2 = [-0.5, 0.3]
sol_log = Manifolds.solve_chart_log_bvp(M, p0x, a2, A, (0, 0))

It didn't finish computing within 30 minutes. It feels more like Julia compiler can't handle the code rather than just invalidations. Anyway, I will try leaving it running for longer.

cpf-work commented 1 year ago

Does the following help you? I used exactly the code from @ChrisRackauckas link and put the simple pendulum example in it. I don't really know what it's doing, but hopefully that's what was asked for :)

[764a87c0] BoundaryValueDiffEq v5.2.0 [91a5bcdd] Plots v1.39.0 [aa65fe97] SnoopCompile v2.10.8

using SnoopCompile

invalidations = @snoopr begin
    using BoundaryValueDiffEq

    const g = 9.81
    const L = 1.0
    tspan = (0.0, pi / 2)
    function simplependulum!(du, u, p, t)
        θ = u[1]
        dθ = u[2]
        du[1] = dθ
        du[2] = -(g / L) * sin(θ)
    end

    function bc1!(residual, u, p, t)
        residual[1] = u[end ÷ 2][1] + pi / 2
        residual[2] = u[end][1] - pi / 2
    end

    bvp1 = BVProblem(simplependulum!, bc1!, [pi / 2, pi / 2], tspan)
    sol1 = solve(bvp1, MIRK4(), dt = 0.05)
end;

trees = SnoopCompile.invalidation_trees(invalidations);

@show length(SnoopCompile.uinvalidated(invalidations)) # show total invalidations

show(trees[end]) # show the most invalidated method

# Count number of children (number of invalidations per invalidated method)
n_invalidations = map(trees) do methinvs
    SnoopCompile.countchildren(methinvs)
end

gives the following output

length(SnoopCompile.uinvalidated(invalidations)) = 1386
inserting unwrapcontext(io::IJulia.IJuliaStdio) @ IJulia C:\Users\m129319\.julia\packages\IJulia\Vo51o\src\stdio.jl:23 invalidated:
   backedges: 1: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (190 children)
              2: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (62 children)
              3: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (1 children)
              4: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (1 children)
              5: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (6 children)
              6: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (2 children)
              7: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (7 children)
              8: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (3 children)
              9: superseding unwrapcontext(io::IO) @ Base show.jl:298 with MethodInstance for Base.unwrapcontext(::IO) (1 children)
113-element Vector{Int64}:
   0
   0
   0
   0
   0
   0
   0
   0
   0
   0
   0
   1
   1
   ⋮
  85
  85
  85
  85
  85
  85
  85
  85
  85
  85
 128
 273

and the plot:

import Plots
Plots.plot(
    1:length(trees),
    n_invalidations;
    markershape=:circle,
    xlabel="i-th method invalidation",
    label="Number of children per method invalidations"
)

mateuszbaran commented 1 year ago

Here is a reproducer without external dependencies:

using SnoopCompile

function affine_connection(a, Xc, Yc)
    MR = 3.0
    Mr = 2.0
    Zc = similar(Xc)
    θ = a[1]
    sinθ, cosθ = sincos(θ)
    Γ¹₂₂ = (MR + Mr * cosθ) * sinθ / Mr
    Γ²₁₂ = -Mr * sinθ / (MR + Mr * cosθ)

    Zc[1] = Xc[2] * Γ¹₂₂ * Yc[2]
    Zc[2] = Γ²₁₂ * (Xc[1] * Yc[2] + Xc[2] * Yc[1])
    return Zc
end

function chart_log_problem!(du, u, params, t)
    mid = div(length(u), 2)
    a = u[1:mid]
    dx = u[(mid + 1):end]
    ddx = -affine_connection(a, dx, dx)
    du[1:mid] .= dx
    du[(mid + 1):end] .= ddx
    return du
end

using BoundaryValueDiffEq
a1 = [0.5, -1.2]
a2 = [-0.5, 0.3]

dt = 0.05
solver = MIRK4()

tspan = (0.0, 1.0)
function bc1!(residual, u, p, t)
    mid = div(length(u[1]), 2)
    residual[1:mid] = u[1][1:mid] - a1
    return residual[(mid + 1):end] = u[end][1:mid] - a2
end
bvp1 = BVProblem(
    chart_log_problem!,
    bc1!,
    (p, t) -> vcat(t * a1 + (1 - t) * a2, zero(a1)),
    tspan,
)
sol1 = solve(bvp1, solver, dt=dt)

It works fine with BoundaryValueDiffEq v4. I'm running invalidation checking locally and it's still running after about 2 hours.

avik-pal commented 1 year ago

@mateuszbaran Are this with the latest release of NonlinearSolve or the master?

mateuszbaran commented 1 year ago

This is with master branch of NonlinearSolve.

avik-pal commented 1 year ago

I can reproduce it as well. @ChrisRackauckas does the heavy use of closures here affect the timings? I can lift them out if needed

avik-pal commented 1 year ago

bvp1 = BVProblem(
    chart_log_problem!,
    bc1!,
    (p, t) -> vcat(t * a1 + (1 - t) * a2, zero(a1)),
    tspan,
)

u0 = (p, t) -> vcat(t * a1 + (1 - t) * a2, zero(a1)) is not supported

mateuszbaran commented 1 year ago

What should I use instead then?

avik-pal commented 1 year ago

pass in a vector of arrays with a uniform discretization

mateuszbaran commented 1 year ago

Thanks, that works. From my POV the issue is resolved then :+1: .

mateuszbaran commented 1 year ago

I'd just suggest updating the docs which imply that this kind of u0 is supported: https://docs.sciml.ai/DiffEqDocs/stable/types/bvp_types/#SciMLBase.BVProblem (look for initialGuess(t))

ChrisRackauckas commented 1 year ago

It's supposed to be supported. The docs are the correct one here. Did we update it to include parameters with the breaking? @avik-pal can we make sure it's supported in the mirks?

ChrisRackauckas commented 1 year ago

Should be fixed by https://github.com/SciML/BoundaryValueDiffEq.jl/pull/127

avik-pal commented 1 year ago

It's supposed to be supported. The docs are the correct one here. Did we update it to include parameters with the breaking? @avik-pal can we make sure it's supported in the mirks?

https://github.com/SciML/BoundaryValueDiffEq.jl/pull/133 will allow u0 as a function. I realized even currently we are able to handle it since it turns the initial guess function into u0(p, t0) which is not ideal. After that PR is merged, we can initialize the complete grid with this.

Did we update it to include parameters with the breaking?

No we did not. But I am supporting it via static_hasmethod. We can drop the only t version in the next breaking release i guess.

SciML / BoundaryValueDiffEq.jl

Very long compilation time for solve() #117