EnzymeAD / Enzyme.jl

Julia bindings for the Enzyme automatic differentiator
https://enzyme.mit.edu
MIT License

Enzyme internal error while running neural ODE with Lux + Enzyme #2110

Open heyyeahcrow opened 3 days ago

heyyeahcrow commented 3 days ago

Hi,

I tried to use AutoEnzyme as the AD backend for a neural network that predicts the parameters of an ODE, following the DiffEqFlux example, but it fails with an Enzyme internal error followed by a long dump of LLVM IR.

using Lux, DiffEqFlux, OrdinaryDiffEq, Plots, Printf, Statistics
using ComponentArrays
using Optimization, OptimizationOptimisers
using Enzyme
using Dates
using Random
using StaticArrays

function evolve!(dc, c, p, t)
    p1 = p[1]
    p2 = p[2]
    dc .= c .* p2 * p1
end

function simulate(i1, i2, a, b, t_span)
    p2 = exp(-i2 * a)
    p1 = i1 * b
    p = (p1, p2)
    c0 = [1.0 0.0; 1.0 0.0]
    prob = ODEProblem(evolve!, c0, t_span, p)
    sol = solve(prob, Euler(), save_everystep=false, dt = 0.5)
    return Array(sol[end])
end

rng = Xoshiro(0)
b = [0.0 1.0; 1.0 0.0]
a = 0.6
n = length(b[1, :])
i1 = 0.18
i2 = 2.5 
timespan = (0.0, 5.0)
ans = simulate(i1, i2, a, b, timespan)

display(ans)

inputs = [i1, i2]
input_size = length(inputs)
output_size = length(a) + length(b)
nn = Chain(
    Dense(input_size, input_size*3*n, tanh),
    Dense(input_size*3*n, output_size*2, tanh),
    Dense(output_size*2, output_size, sigmoid)
)

u, st = Lux.setup(rng, nn)

function predict_neuralode(u)
    # Get parameters from the neural network
    output, outst = nn(inputs, u, st)
    # Segregate the output
    p_a = output[1]
    pp_b = output[length(a)+1:end]
    p_b = zeros(n, n)
    index = 1
    for i in 1:n
        for j in 1:n
            p_b[i, j] = pp_b[index]
            index += 1
        end
    end
    nn_output = [p_a, p_b]
    println("nn_output: ", nn_output)
    pred = simulate(i1, i2, p_a, p_b, timespan)
    return Array(pred)
end

function loss_neuralode(ans, u)
    pred = predict_neuralode(u)
    loss = sum(abs2, ans .- pred)
    return loss, pred
end

loss, pred = loss_neuralode(ans, u)

loss_values = Float64[]
callback = function (p, l, pred; doplot = false)
    println(l)
    push!(loss_values, l)
    return false  # Optimization.jl expects the callback to return a Bool (true halts)
end

pinit = ComponentArray(u)
callback(pinit, loss_neuralode(ans, pinit)...)

adtype = Optimization.AutoEnzyme()

optf = Optimization.OptimizationFunction((u,_) -> loss_neuralode(ans, u), adtype)
optprob = Optimization.OptimizationProblem(optf, pinit)

result_neuralode = Optimization.solve(
    optprob, OptimizationOptimisers.Adam(0.02); callback = callback, maxiters = 50)

The error log: error_log_2024-11-19_15-56-09.txt
The stacktrace: Stacktrace.txt

I also tried running the DiffEqFlux example itself with Zygote and AutoZygote replaced by Enzyme and AutoEnzyme, and it produced the same error. This happens on both macOS and Windows.
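
That is, roughly this change to the example's AD setup (just a sketch; the adtype variable name is taken from the example):

using Enzyme                          # instead of: using Zygote
adtype = Optimization.AutoEnzyme()    # instead of: adtype = Optimization.AutoZygote()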

Julia Version 1.11.1

Packages:
  [b0b7db55] ComponentArrays v0.15.18
  [aae7a2af] DiffEqFlux v4.1.0
  [7da242da] Enzyme v0.13.14
⌅ [d9f16b24] Functors v0.4.12
  [e6f89c97] LoggingExtras v1.1.0
  [b2108857] Lux v1.2.3
  [7f7a1694] Optimization v4.0.5
  [42dfb2eb] OptimizationOptimisers v0.3.4
  [1dea7af3] OrdinaryDiffEq v6.90.1
  [91a5bcdd] Plots v1.40.9
  [90137ffa] StaticArrays v1.9.8
  [10745b16] Statistics v1.11.1
  [e88e6eb3] Zygote v0.6.73
  [ade2ca70] Dates v1.11.0
  [56ddb016] Logging v1.11.0
  [de0858da] Printf v1.11.0
  [9a3f8284] Random v1.11.0

wsmoses commented 2 days ago

Hi, this looks like an error in the 1.11 FFI call support in Enzyme. Two quick things:

1) Can you test on the latest version of Enzyme? (I think this ought to be fixed; if not, we should fix it.)
2) Can you make a reproducer that only has a direct autodiff call?

@ChrisRackauckas may be able to help you with this.
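
For example, a minimal direct-call reproducer might look something like this (just a sketch reusing ans, pinit, and loss_neuralode from your script; the loss_only helper and the activity annotations are illustrative assumptions):

using Enzyme

# Sketch: differentiate the scalar loss directly with Enzyme, bypassing Optimization.jl.
loss_only(u, data) = loss_neuralode(data, u)[1]   # keep only the scalar loss

du = Enzyme.make_zero(pinit)                      # gradient buffer with the same structure as pinit
Enzyme.autodiff(Reverse, loss_only, Active, Duplicated(pinit, du), Const(ans))
@show du                                          # du should now hold the gradient w.r.t. pinit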

heyyeahcrow commented 2 days ago

> Hi, this looks like an error in the 1.11 FFI call support in Enzyme. Two quick things: 1) Can you test on the latest version of Enzyme (I think this ought to be fixed; if not, we should fix it) 2) Can you make a reproducer that only has a direct autodiff call? @ChrisRackauckas may be able to help you with this

I updated it and it still shows the same error. I'm currently trying the second approach.

BTW, do I need to vectorize all the inputs and outputs of my ODE?
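
By vectorizing I mean something like the following sketch, with the 2x2 state flattened into a length-4 vector (evolve_vec! and c0_vec are just illustrative names):

function evolve_vec!(dc, c, p, t)
    p1, p2 = p
    C = reshape(c, 2, 2)          # view the flat state as the original 2x2 matrix
    dc .= vec(C .* p2 * p1)       # same update as evolve!, written back as a vector
end

c0_vec = vec([1.0 0.0; 1.0 0.0])  # flattened initial state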

wsmoses commented 2 days ago

I don't think that should be needed here to get it to fail, but that's just an intuition.