baggepinnen / FluxOptTools.jl

Use Optim to train Flux models and visualize loss landscapes
MIT License

Convergence issue and using Automatic Differentiation (AD) #16

Open khorrami1 opened 1 year ago

khorrami1 commented 1 year ago

Hello, thank you so much for your nice package. I have a convergence issue. I am trying to solve a PDE that is also solved in the NeuralPDE.jl documentation; the equations, BCs, neural network, and optimization method are the same as in the NeuralPDE.jl example. However, I cannot reproduce the results: BFGS() stops after a few iterations.

I would also like to use AD to calculate the derivatives, e.g. Du_Dx(x,t), using Flux.gradient, Zygote.gradient, or ForwardDiff.derivative. Unfortunately, I cannot get any of them to work. How can I use them?
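For example, written out with ForwardDiff, what I have in mind is roughly the following (just a sketch of my intent; in the full script below these definitions appear commented out, and since `ForwardDiff.derivative` of a scalar closure already returns a scalar, the `[1]` indexing should not be needed). I am not sure whether the problem is in evaluating these derivatives or in differentiating the resulting loss with respect to `pars`:

```julia
using Flux, ForwardDiff

chain_NN = Flux.Chain(Dense(2,16,Flux.tanh), Dense(16,16,Flux.tanh), Dense(16,1))
u(x,t) = chain_NN([x,t])[1]

# First derivatives via forward-mode AD on scalar closures
Du_Dx(x,t) = ForwardDiff.derivative(x -> u(x,t), x)
Du_Dt(x,t) = ForwardDiff.derivative(t -> u(x,t), t)

# Second derivatives by nesting ForwardDiff (dual numbers nest)
D2u_Dx2(x,t) = ForwardDiff.derivative(x -> Du_Dx(x,t), x)
D2u_Dt2(x,t) = ForwardDiff.derivative(t -> Du_Dt(x,t), t)

Du_Dx(0.5, 0.3), D2u_Dt2(0.5, 0.3)  # evaluating the derivatives at a sample point
```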

Here is my code:

```julia
using Flux, FluxOptTools, Optim, Statistics, ForwardDiff, Zygote
using Plots

c = 1

# Neural network ansatz u(x,t) and its trainable parameters
chain_NN = Flux.Chain(Dense(2,16,Flux.tanh), Dense(16,16,Flux.tanh), Dense(16,1))
u(x,t) = chain_NN([x,t])[1]
pars = Flux.params(chain_NN)

# Interior collocation points in space and time
X = collect(range(0.1, 1.0-0.1, 10))
T = collect(range(0.1, 1.0-0.1, 10))

ϵ_FD = 10*eps(Float32)

# Du_Dx(x,t) = gradient((x)->u(x,t), x)[1]
# Du_Dt(x,t) = gradient((t)->u(x,t), t)[1]

# Forward-difference approximations of the first and second derivatives
Du_Dx(x,t)   = (u(x+ϵ_FD,t) - u(x,t))/ϵ_FD
D2u_Dx2(x,t) = (Du_Dx(x+ϵ_FD,t) - Du_Dx(x,t))/ϵ_FD
Du_Dt(x,t)   = (u(x,t+ϵ_FD) - u(x,t))/ϵ_FD
D2u_Dt2(x,t) = (Du_Dt(x,t+ϵ_FD) - Du_Dt(x,t))/ϵ_FD

# D2u_Dt2(x,t) = gradient((t)->Du_Dt(x,t), t)[1]

# Du_Dx(x,t)   = ForwardDiff.derivative((x)->u(x,t), x)[1]
# Du_Dt(x,t)   = ForwardDiff.derivative((t)->u(x,t), t)[1]
# D2u_Dt2(x,t) = ForwardDiff.derivative((t)->Du_Dt(x,t), t)[1]

# [u(x,t) for x in X for t in T]
# [D2u_Dt2(x,t) for x in X for t in T]

# All (x,t) collocation pairs
XX = [x for x in X for t in T]
TT = [t for x in X for t in T]

# Residual of the wave equation u_tt = c^2 u_xx on the collocation points
loss_PDE() = mean(abs2, [D2u_Dt2(x,t) - c^2*D2u_Dx2(x,t) for (x,t) in zip(XX,TT)])

# loss_PDE() = mean(abs, [Du_Dt(x,0.0) - 0.0 for (x,t) in zip(X,T)])

# Boundary and initial conditions
loss_BC1() = mean(abs2, [u(0.0,t) - 0.0 for t in T])
loss_BC2() = mean(abs2, [u(1.0,t) - 0.0 for t in T])
loss_BC3() = mean(abs2, [u(x,0.0) - x*(1.0-x) for x in X])
loss_BC4() = mean(abs2, [Du_Dt(x,0.0) - 0.0 for x in X])

loss() = loss_PDE() + loss_BC1() + loss_BC2() + loss_BC3() + loss_BC4()

# FluxOptTools: wrap the loss for Optim and run BFGS
lossfun, gradfun, fg!, p0 = optfuns(loss, pars)
res = Optim.optimize(Optim.only_fg!(fg!), p0, BFGS(),
                     Optim.Options(iterations=1000, show_trace=true))

cb() = print("loss = ", loss(), "\n")

# Zygote.refresh()

# opt = Flux.Adam(0.001)
# trace = [loss()]
# for i = 1:500
#     cb()
#     l, back = Zygote.pullback(loss, pars)
#     push!(trace, l)
#     grads = back(l)
#     Flux.Optimise.update!(opt, pars, grads)
# end
# trace

# opt = SLBFGS(lossfun, p0; m=3, ᾱ=1., ρ=false, λ=.0001, κ=0.1)

# function train(opt, p0, iters=20)
#     p = copy(p0)
#     g = zeros(veclength(pars))
#     trace = [loss()]
#     for i = 1:iters
#         g = gradfun(g, p)
#         p = apply(opt, g, p)
#         push!(trace, opt.fold)
#         cb()
#     end
#     trace
# end

# trace = train(opt, p0, 1000)

# Visualization against the analytic series solution
analytic_sol_func(x,t) = sum([(8/(k^3*pi^3))*sin(k*pi*x)*cos(c*k*pi*t) for k in 1:2:50000])

u_predict = reshape([u(x,t) for x in X for t in T], length(X), length(T))
u_real    = reshape([analytic_sol_func(x,t) for x in X for t in T], length(X), length(T))

diff_u = abs.(u_predict .- u_real)
p1 = plot(X, T, u_real,    linetype=:contourf, title="analytic");
p2 = plot(X, T, u_predict, linetype=:contourf, title="predict");
p3 = plot(X, T, diff_u,    linetype=:contourf, title="error");
plot(p1, p2, p3)
```

Thank you in advance,
Best regards,
Mohammad

khorrami1 commented 1 year ago

I changed the value of ϵ_FD to 1e-2 for the finite-difference calculation, and now it works (the loss function decreases). I think the appropriate value depends on the discretization of the collocation points for the coordinates (X) and time (T).
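A rough sanity check of the step-size trade-off (just a sketch; `Float32(sin(x))` is a stand-in for the Float32-precision network output, not the actual chain): the forward second difference divides by ϵ^2, so a very small ϵ like 10*eps(Float32) ≈ 1.2e-6 amplifies the Float32 rounding noise, while a large ϵ adds truncation error, and values around 1e-2 land in between.

```julia
f(x) = Float32(sin(x))                              # stand-in for a Float32-precision model output
d1(f, x, ϵ) = (f(x + ϵ) - f(x)) / ϵ                 # forward first difference
d2(f, x, ϵ) = (d1(f, x + ϵ, ϵ) - d1(f, x, ϵ)) / ϵ   # forward second difference

for ϵ in (10*eps(Float32), 1e-4, 1e-2, 1e-1)
    err = abs(d2(f, 0.3, ϵ) - (-sin(0.3)))          # exact second derivative of sin is -sin
    println("ϵ = ", ϵ, "   |error| = ", err)
end
```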