Further investigation: when I run the tests with
julia --check-bounds=yes
I get a different error:
julia> sol1 = solve(bvp1, solver; dt=dt)
ERROR: BoundsError: attempt to access 4-element UnitRange{Int64} at index [5]
Stacktrace:
[1] throw_boundserror(A::UnitRange{Int64}, I::Int64)
@ Base ./abstractarray.jl:737
[2] getindex
@ ./range.jl:930 [inlined]
[3] _colorediteration!
@ ~/.julia/packages/FiniteDiff/wm6iC/src/iteration_utils.jl:4 [inlined]
[4] forwarddiff_color_jacobian!(J::SubArray{…}, f::BoundaryValueDiffEq.__Fix3{…}, x::Vector{…}, jac_cache::ForwardColorJacCache{…})
@ SparseDiffTools ~/.julia/packages/SparseDiffTools/RGOJw/src/differentiation/compute_jacobian_ad.jl:392
[5] sparse_jacobian!
@ ~/.julia/packages/SparseDiffTools/RGOJw/src/highlevel/forward_mode.jl:63 [inlined]
[6] __mirk_mpoint_jacobian!(J::Matrix{…}, ::BandedMatrices.BandedMatrix{…}, x::Vector{…}, bc_diffmode::AutoForwardDiff{…}, nonbc_diffmode::AutoSparse{…}, bc_diffcache::SparseDiffTools.ForwardDiffJacobianCache{…}, nonbc_diffcache::SparseDiffTools.ForwardDiffJacobianCache{…}, loss_bc::BoundaryValueDiffEq.__Fix3{…}, loss_collocation::BoundaryValueDiffEq.__Fix3{…}, resid_bc::Vector{…}, resid_collocation::Vector{…}, L::Int64)
@ BoundaryValueDiffEq ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/solve/mirk.jl:390
[7] #268
@ ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/solve/mirk.jl:371 [inlined]
[8] JacobianCache
@ ~/.julia/dev/NonlinearSolve/src/internal/jacobian.jl:149 [inlined]
[9] JacobianCache
@ ~/.julia/dev/NonlinearSolve/src/internal/jacobian.jl:120 [inlined]
[10] __step!(cache::NonlinearSolve.GeneralizedFirstOrderAlgorithmCache{…}; recompute_jacobian::Nothing, kwargs::@Kwargs{})
@ NonlinearSolve ~/.julia/dev/NonlinearSolve/src/core/generalized_first_order.jl:230
[11] __step!
@ ~/.julia/dev/NonlinearSolve/src/core/generalized_first_order.jl:226 [inlined]
[12] #step!#131
@ ~/.julia/dev/NonlinearSolve/src/core/generic.jl:50 [inlined]
[13] step!
@ ~/.julia/dev/NonlinearSolve/src/core/generic.jl:45 [inlined]
[14] solve!(cache::NonlinearSolve.GeneralizedFirstOrderAlgorithmCache{…})
@ NonlinearSolve ~/.julia/dev/NonlinearSolve/src/core/generic.jl:13
[15] #__solve#130
@ ~/.julia/dev/NonlinearSolve/src/core/generic.jl:4 [inlined]
[16] __solve
@ ~/.julia/dev/NonlinearSolve/src/core/generic.jl:1 [inlined]
[17] macro expansion
@ ~/.julia/dev/NonlinearSolve/src/default.jl:282 [inlined]
[18] __solve(::NonlinearProblem{…}, ::NonlinearSolvePolyAlgorithm{…}; stats::SciMLBase.NLStats, alias_u0::Bool, verbose::Bool, kwargs::@Kwargs{…})
@ NonlinearSolve ~/.julia/dev/NonlinearSolve/src/default.jl:255
[19] __perform_mirk_iteration(cache::BoundaryValueDiffEq.MIRKCache{…}, abstol::Float64, adaptive::Bool; nlsolve_kwargs::@NamedTuple{}, kwargs::@Kwargs{…})
@ BoundaryValueDiffEq ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/solve/mirk.jl:175
[20] __perform_mirk_iteration
@ ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/solve/mirk.jl:171 [inlined]
[21] solve!(cache::BoundaryValueDiffEq.MIRKCache{…})
@ BoundaryValueDiffEq ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/solve/mirk.jl:152
[22] #__solve#296
@ ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/BoundaryValueDiffEq.jl:51 [inlined]
[23] __solve
@ ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/BoundaryValueDiffEq.jl:49 [inlined]
[24] #solve_call#44
@ ~/.julia/packages/DiffEqBase/vscDj/src/solve.jl:612 [inlined]
[25] solve_call
@ ~/.julia/packages/DiffEqBase/vscDj/src/solve.jl:569 [inlined]
[26] solve_up(prob::BVProblem{…}, sensealg::Nothing, u0::Vector{…}, p::Tuple{…}, args::MIRK4{…}; kwargs::@Kwargs{…})
@ DiffEqBase ~/.julia/packages/DiffEqBase/vscDj/src/solve.jl:1084
[27] solve_up
@ ~/.julia/packages/DiffEqBase/vscDj/src/solve.jl:1070 [inlined]
[28] solve(prob::BVProblem{…}, args::MIRK4{…}; sensealg::Nothing, u0::Nothing, p::Nothing, wrap::Val{…}, kwargs::@Kwargs{…})
@ DiffEqBase ~/.julia/packages/DiffEqBase/vscDj/src/solve.jl:1007
[29] top-level scope
@ ~/Work/GitHub/Julia/Sandbox/test.jl:47
Some type information was truncated. Use `show(err)` to see complete types.
So the error I was observing before might have been due to an incorrect use of @inbounds in FiniteDiff:
In this same part of FiniteDiff, the array we're incorrectly accessing is rows_index, because the matrix has more columns than rows. This is the state right before the error occurs:
i = 5
rows_index = 1:4
cols_index = 1:8
vfx = [1.0, 0.0, 0.0, 0.0]
ERROR: BoundsError: attempt to access 4-element UnitRange{Int64} at index [5]
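To make that concrete, here is a minimal sketch of the failing pattern (an assumed reconstruction, not FiniteDiff's actual loop), using the values printed above:

# Hypothetical reconstruction of the pattern, for illustration only.
function _colorediteration_sketch!(J, rows_index, cols_index, vfx)
    for i in eachindex(cols_index)                # runs up to 8 for the 4×8 Jacobian block
        # rows_index only has 4 entries, so rows_index[5] is out of bounds:
        # with --check-bounds=yes this throws the BoundsError shown above;
        # with @inbounds honored it is undefined behavior (garbage values or a segfault).
        @inbounds J[rows_index[i], cols_index[i]] = vfx[rows_index[i]]
    end
    return J
end

_colorediteration_sketch!(zeros(4, 8), 1:4, 1:8, [1.0, 0.0, 0.0, 0.0])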
Shouldn't it be square? What are the dimensions?
size(J) = (4, 8)
size(sparsity) = (4, 8)
typeof(J) = SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}
typeof(sparsity) = BandedMatrices.BandedMatrix{Float64, Matrix{Float64}, Base.OneTo{Int64}}
Oh, it's differencing the banded subproblem, I see.
Shouldn't this end up in the BandedMatrix iteration https://github.com/JuliaArrays/ArrayInterface.jl/blob/master/ext/ArrayInterfaceBandedMatricesExt.jl#L56-L62?
It probably doesn't, because J is not itself banded. But even when we miss that dispatch we should still get the correct answer, not an indexing error.
I think there is something fundamentally wrong with the default FiniteDiff._colorediteration!, so if someone can tell me what each argument is for, we can try to look into it.
Agreed, but it looks like there are two issues here. One is that the NonlinearSolve changes now miss the BandedMatrix dispatches, which isn't great; the second is that the dense fallback should be corrected.
Only the second one is an error; the first one is a suboptimality. I guess we need to ask @avik-pal where the Jacobian prototype is built for this one, and how it ends up being a SubArray.
https://github.com/JuliaDiff/FiniteDiff.jl/blob/b574440c5d54708aa23a90c48f4a000dd1c6d915/ext/FiniteDiffBandedMatricesExt.jl#L13: if you relax Jac::BandedMatrices.BandedMatrix, does it hit that dispatch, and does the error go away?
Partial bisection done
git bisect start
# status: waiting for both good and bad commits
# good: [f667683bb6c5bf78a8cef5defa52f887c526fc71] feat: integrate SciMLJacobianOperators into NonlinearSolve
git bisect good f667683bb6c5bf78a8cef5defa52f887c526fc71
# status: waiting for bad commit, 1 good commit known
# bad: [d56476c69c749a4b575b47d58c03cf366ab7134c] fix: use DI preparation result when initializing Jacobian (#472)
git bisect bad d56476c69c749a4b575b47d58c03cf366ab7134c
# skip: [fe1bd3dd6ec4923afad60453600316038b3cd9f1] chore: bump minimum versions
git bisect skip fe1bd3dd6ec4923afad60453600316038b3cd9f1
# skip: [59b7d022bc282a04f858b973954b26fc52792771] fix: update bruss testing
git bisect skip 59b7d022bc282a04f858b973954b26fc52792771
# skip: [92c59dbb2d360a4a2ed968668395473be57fc81e] chore: run formatter
git bisect skip 92c59dbb2d360a4a2ed968668395473be57fc81e
# skip: [058b87d5af338ad7e2049f2f27c95b476798d32b] feat: remove Setfield dep
git bisect skip 058b87d5af338ad7e2049f2f27c95b476798d32b
# skip: [3f59f4949829b74d78b9eea8de834dd4f9239152] fix: autodiff cannot be nothing
git bisect skip 3f59f4949829b74d78b9eea8de834dd4f9239152
# skip: [40eb6c867fbed99368e7005ae1ba4246083269eb] chore: mark master as DEV (DON'T TAG)
git bisect skip 40eb6c867fbed99368e7005ae1ba4246083269eb
# skip: [adac560c3f0f737b5c854d61642a182078bb50a1] fix: __value_derivative removal from line searches
git bisect skip adac560c3f0f737b5c854d61642a182078bb50a1
# skip: [b2bb25252e3526b753b3852dfbfff9cc386e3f7a] fix: missing qualifier
git bisect skip b2bb25252e3526b753b3852dfbfff9cc386e3f7a
# skip: [3b12e73e945941ac8d0d4eebe038f1c8ae15d5cf] fix: unwrap sparse AD for derivative call
git bisect skip 3b12e73e945941ac8d0d4eebe038f1c8ae15d5cf
# skip: [43df907989a67b707e433753b09ef3bcd3d08a4c] test: structured jacobians
git bisect skip 43df907989a67b707e433753b09ef3bcd3d08a4c
# skip: [0f2a62f00334d680096e55c9a8ecdc8012819aef] fix: minor test fixes
git bisect skip 0f2a62f00334d680096e55c9a8ecdc8012819aef
# good: [bdacb9ca9327b9d5d2fe6a8f0c08f44ac206f8ec] docs: disable ILU for now
git bisect good bdacb9ca9327b9d5d2fe6a8f0c08f44ac206f8ec
# bad: [b6221b7d61027339ec5fc50bb1f3f90398ce607e] refactor: reorder imports
git bisect bad b6221b7d61027339ec5fc50bb1f3f90398ce607e
# skip: [ba45097e841f9875a7e53d94a1d86f1ffcfd1425] fix: sparse jacobian
git bisect skip ba45097e841f9875a7e53d94a1d86f1ffcfd1425
# good: [6201fcc2f49e2ec2019eb61491f30bc9664e0dc5] fix: DI now works with ReverseDiff
git bisect good 6201fcc2f49e2ec2019eb61491f30bc9664e0dc5
Yes, if I relax that I get a lot of these warnings, but at least the solve works:
┌ Warning: Potential Rank Deficient Matrix Detected. Attempting to solve using Pivoted QR Factorization.
└ @ NonlinearSolve ~/.julia/dev/NonlinearSolve/src/internal/linear_solve.jl:156
Okay cool, one mystery solved. And that looks optimal too, since it's using the 2D indexing. That's worth a PR.
As for the warnings, we should probably pass verbose = false there, but they make sense.
Now for the fallback: vfx is the vector form of f(x). colorvec should make sense. rows_index, cols_index = ArrayInterface.findstructralnz(sparsity), i.e. they are defined the same way as findnz(::SparseMatrixCSC): in column order, (rows_index[i], cols_index[i]) is the (i, j) index of the ith non-zero element.
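For example, with findnz from SparseArrays:

julia> using SparseArrays

julia> S = sparse([1 0 2; 0 3 0]);

julia> findnz(S)   # (rows, cols, vals), enumerated in column order
([1, 2, 1], [1, 2, 3], [1, 3, 2])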
For the dense case we should then specialize on Matrix and do a fast path if square. If non-square, these index vectors should be:
julia> n = 4
4
julia> m = 5
5
julia> A = rand(4,5)
4×5 Matrix{Float64}:
0.0732819 0.236601 0.971541 0.0239939 0.834559
0.297407 0.472245 0.963563 0.841125 0.296606
0.467831 0.297999 0.760833 0.87123 0.309619
0.781913 0.879559 0.871359 0.120851 0.0749186
julia> col_idxs = vec(stack([i*ones(Int,n) for i in 1:m]))
20-element Vector{Int64}:
1
1
1
1
2
2
2
⋮
4
4
5
5
5
5
julia> row_idxs = vec(stack([1:n for i in 1:m]))
20-element Vector{Int64}:
1
2
3
4
1
2
3
⋮
3
4
1
2
3
4
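For reference, the same column-order index sets can also be built more compactly; this is equivalent to the comprehensions above:

julia> row_idxs2 = repeat(1:n, m);             # [1, 2, 3, 4, 1, 2, 3, 4, …] rows cycle fastest

julia> col_idxs2 = repeat(1:m, inner = n);     # [1, 1, 1, 1, 2, 2, 2, 2, …] columns cycle slowest

julia> (row_idxs2, col_idxs2) == (row_idxs, col_idxs)
true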
git bisect start
# status: waiting for both good and bad commits
# good: [f667683bb6c5bf78a8cef5defa52f887c526fc71] feat: integrate SciMLJacobianOperators into NonlinearSolve
git bisect good f667683bb6c5bf78a8cef5defa52f887c526fc71
# status: waiting for bad commit, 1 good commit known
# bad: [d56476c69c749a4b575b47d58c03cf366ab7134c] fix: use DI preparation result when initializing Jacobian (#472)
git bisect bad d56476c69c749a4b575b47d58c03cf366ab7134c
# skip: [fe1bd3dd6ec4923afad60453600316038b3cd9f1] chore: bump minimum versions
git bisect skip fe1bd3dd6ec4923afad60453600316038b3cd9f1
# skip: [59b7d022bc282a04f858b973954b26fc52792771] fix: update bruss testing
git bisect skip 59b7d022bc282a04f858b973954b26fc52792771
# skip: [92c59dbb2d360a4a2ed968668395473be57fc81e] chore: run formatter
git bisect skip 92c59dbb2d360a4a2ed968668395473be57fc81e
# skip: [058b87d5af338ad7e2049f2f27c95b476798d32b] feat: remove Setfield dep
git bisect skip 058b87d5af338ad7e2049f2f27c95b476798d32b
# skip: [3f59f4949829b74d78b9eea8de834dd4f9239152] fix: autodiff cannot be nothing
git bisect skip 3f59f4949829b74d78b9eea8de834dd4f9239152
# skip: [40eb6c867fbed99368e7005ae1ba4246083269eb] chore: mark master as DEV (DON'T TAG)
git bisect skip 40eb6c867fbed99368e7005ae1ba4246083269eb
# skip: [adac560c3f0f737b5c854d61642a182078bb50a1] fix: __value_derivative removal from line searches
git bisect skip adac560c3f0f737b5c854d61642a182078bb50a1
# skip: [b2bb25252e3526b753b3852dfbfff9cc386e3f7a] fix: missing qualifier
git bisect skip b2bb25252e3526b753b3852dfbfff9cc386e3f7a
# skip: [3b12e73e945941ac8d0d4eebe038f1c8ae15d5cf] fix: unwrap sparse AD for derivative call
git bisect skip 3b12e73e945941ac8d0d4eebe038f1c8ae15d5cf
# skip: [43df907989a67b707e433753b09ef3bcd3d08a4c] test: structured jacobians
git bisect skip 43df907989a67b707e433753b09ef3bcd3d08a4c
# skip: [0f2a62f00334d680096e55c9a8ecdc8012819aef] fix: minor test fixes
git bisect skip 0f2a62f00334d680096e55c9a8ecdc8012819aef
# good: [bdacb9ca9327b9d5d2fe6a8f0c08f44ac206f8ec] docs: disable ILU for now
git bisect good bdacb9ca9327b9d5d2fe6a8f0c08f44ac206f8ec
# bad: [b6221b7d61027339ec5fc50bb1f3f90398ce607e] refactor: reorder imports
git bisect bad b6221b7d61027339ec5fc50bb1f3f90398ce607e
# skip: [ba45097e841f9875a7e53d94a1d86f1ffcfd1425] fix: sparse jacobian
git bisect skip ba45097e841f9875a7e53d94a1d86f1ffcfd1425
# good: [6201fcc2f49e2ec2019eb61491f30bc9664e0dc5] fix: DI now works with ReverseDiff
git bisect good 6201fcc2f49e2ec2019eb61491f30bc9664e0dc5
# bad: [33ac4501df94150288f0e7c15a5aff9ab6c20792] fix: update minimum compats
git bisect bad 33ac4501df94150288f0e7c15a5aff9ab6c20792
# good: [05aa3db8351e2da44729e99795bd2b30f7c7a601] fix: remove stale dep
git bisect good 05aa3db8351e2da44729e99795bd2b30f7c7a601
# bad: [a8e60d5f379d89d73d5d4509b3512768e55365a4] feat: use DI for dense jacobians
git bisect bad a8e60d5f379d89d73d5d4509b3512768e55365a4
# first bad commit: [a8e60d5f379d89d73d5d4509b3512768e55365a4] feat: use DI for dense jacobians
found it [a8e60d5f379d89d73d5d4509b3512768e55365a4] feat: use DI for dense jacobians
Trying to figure out why the prototype isn't banded. We get a SubArray because of the views here:
https://github.com/SciML/BoundaryValueDiffEq.jl/blob/0a2b4ac73413fcf3ab1514607a4e816fe4d9ccc3/src/solve/mirk.jl#L386-L393
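(A quick check of how such a view produces exactly the SubArray type printed above, with illustrative sizes:)

julia> J = zeros(8, 8);

julia> typeof(view(J, 5:8, :))   # a row-block of the Jacobian, taken as a view
SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}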
I guess the question to @avik-pal is: where does the creation of cache.J happen for this example?
found it [https://github.com/SciML/NonlinearSolve.jl/commit/a8e60d5f379d89d73d5d4509b3512768e55365a4] feat: use DI for dense jacobians
That makes a lot of sense. It must've changed the jac or sparsity type here. We can:
For 1 I have a hunch: if you're using DI with a dense backend to compute your prototype, you're always going to get something that is the result of stack, hence a Matrix. If you wanted to preserve the sparsity pattern (in this case banded), you would need to compute the prototype with an AutoSparse. @avik-pal, does that resonate with you?
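Here is a minimal sketch of what I mean (the function and sizes are made up; this assumes the DifferentiationInterface API plus SparseConnectivityTracer and SparseMatrixColorings for the AutoSparse pieces):

using DifferentiationInterface
using ADTypes: AutoForwardDiff, AutoSparse
using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm
import ForwardDiff

f(x) = [x[1]^2, x[2] + x[3], x[3]]   # toy function with a sparse Jacobian
x = rand(3)

dense_backend = AutoForwardDiff()
sparse_backend = AutoSparse(AutoForwardDiff();
    sparsity_detector = TracerSparsityDetector(),
    coloring_algorithm = GreedyColoringAlgorithm())

typeof(jacobian(f, dense_backend, x))    # Matrix{Float64}: the structure is lost
typeof(jacobian(f, sparse_backend, x))   # a SparseMatrixCSC: the pattern is preserved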
infil> f.jac_prototype
8×8 AlmostBandedMatrix{Float64} with bandwidths (5, 4) and fill rank 4 with data 8×8 BandedMatrix{Float64} with bandwidths (5, 4) and fill 4×8 Matrix{Float64}:
1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0
1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
⋅ 1.0 1.0 1.0 1.0 1.0 1.0 1.0
⋅ ⋅ 1.0 1.0 1.0 1.0 1.0 1.0
infil> similar(f.jac_prototype)
8×8 Matrix{Float64}:
2.122e-314 1.2732e-313 1.4854e-313 1.9098e-313 2.3342e-313 2.54639e-313 2.75859e-313 7.0e-323
4.24399e-314 3.0e-323 1.4854e-313 1.9098e-313 2.3342e-313 2.54639e-313 2.75859e-313 3.18299e-313
8.48798e-314 1.4854e-313 1.6976e-313 2.122e-313 2.54639e-313 2.54639e-313 2.75859e-313 3.18299e-313
8.48798e-314 1.4854e-313 1.6976e-313 2.122e-313 2.54639e-313 2.54639e-313 2.97079e-313 3.18299e-313
8.48798e-314 1.4854e-313 1.6976e-313 2.122e-313 2.54639e-313 2.75859e-313 2.97079e-313 3.18299e-313
8.48798e-314 1.4854e-313 1.6976e-313 2.122e-313 2.54639e-313 2.75859e-313 2.97079e-313 3.18299e-313
1.061e-313 1.4854e-313 1.6976e-313 2.122e-313 2.54639e-313 2.75859e-313 2.97079e-313 7.4e-323
1.2732e-313 1.4854e-313 1.9098e-313 2.3342e-313 2.54639e-313 2.75859e-313 2.97079e-313 7.4e-323
Okay then that's not on me :rofl: good find Avik!
I have always hated Julia's fallback semantics for similar :angry:
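To spell that out, here is a toy structured type (purely illustrative, not the FastAlmostBandedMatrices code): without its own similar method, similar(A) hits the AbstractArray fallback and silently returns a plain Matrix.

# Toy wrapper type; the name is made up for illustration.
struct FillBordered{T} <: AbstractMatrix{T}
    data::Matrix{T}
end
Base.size(A::FillBordered) = size(A.data)
Base.getindex(A::FillBordered, i::Int, j::Int) = A.data[i, j]

A = FillBordered(ones(4, 4))
typeof(similar(A))            # Matrix{Float64}: the structure is silently dropped

# The fix is to give the type its own structure-preserving method:
Base.similar(A::FillBordered, ::Type{T}) where {T} = FillBordered(similar(A.data, T))
typeof(similar(A))            # FillBordered{Float64}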
Oh, I see what's going on here. It's an internal setup to say "are you going to use structural_nz anyway? Don't make the arrays if you don't have to". It checks the sparsity pattern and says, "banded matrix? We have one of those at home: https://github.com/JuliaDiff/FiniteDiff.jl/blob/b574440c5d54708aa23a90c48f4a000dd1c6d915/ext/FiniteDiffBandedMatricesExt.jl#L13-L27, which effectively just uses trivial rows/cols, so don't spend the time building that array". Then Jac isn't a banded matrix, so it doesn't use that dispatch 😅.
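The "trivial rows/cols" idea for a banded matrix is roughly the following (a sketch of the concept, not FiniteDiff's extension code): the structural non-zeros of each column form a contiguous row range determined by the bandwidths, so no index arrays need to be materialized.

using BandedMatrices

B = brand(6, 6, 1, 2)      # random 6×6 banded matrix with bandwidths (l, u) = (1, 2)
l, u = bandwidths(B)
for j in axes(B, 2)
    rows = max(1, j - u):min(size(B, 1), j + l)   # rows of the structural non-zeros in column j
    # a colored-iteration scheme can update B[rows, j] directly here
end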
So what's the plan of action here? If we define a generic findstructralnz for AbstractMatrix, then we no longer need these two wrong lines, right? But it would maybe send a signal that dense matrices also have a structure.
Those should probably just be nothing, since they are free. Basically, if you need the structural nz, compute it once; if you don't need it, then they are nothing. If you need something like that inside of the specialization, it's "free" there, so a banded matrix and Matrix can just do
rows_index = 1:size(J, 1)
cols_index = 1:size(J, 2)
on the inside, which is non-allocating, so it's fine to do it there. nothing is just a clearer sentinel.
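A rough sketch of that sentinel dispatch (the function names here are hypothetical, not existing FiniteDiff internals):

using ArrayInterface

# Hypothetical helpers, for illustration only.
structural_indices(sparsity) = ArrayInterface.findstructralnz(sparsity)   # structured case
structural_indices(::Matrix) = nothing                                    # dense case: indices are free

function colored_update!(J, vfx, colorvec, color, ::Nothing)
    # dense fallback: plain 2D indexing, nothing allocated
    for j in 1:size(J, 2), i in 1:size(J, 1)
        colorvec[j] == color && (J[i, j] = vfx[i])
    end
    return J
end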
Or we can just make sure it returns a correct non-allocating iterator. What would be bad is to allocate some big arrays (O(nonzeros)) that we don't end up using because the iteration scheme uses an analytical solution. That would be wasteful, because for a dense matrix they are as big as the matrix itself.
Here's an easy solution:
julia> using Base.Iterators
julia> using BenchmarkTools
julia> function findstructralnz(x::AbstractMatrix)
cartind = vec(CartesianIndices(x))
return Iterators.map(first ∘ Tuple, cartind), Iterators.map(last ∘ Tuple, cartind)
end
findstructralnz (generic function with 1 method)
julia> collect.(findstructralnz(rand(2, 3)))
([1, 2, 1, 2, 1, 2], [1, 1, 2, 2, 3, 3])
julia> @btime findstructralnz($(rand(2, 3)));
28.162 ns (0 allocations: 0 bytes)
👍that looks pretty decent.
If that's okay, I would like to let you insert it into ArrayInterface and modify SparseDiffTools, though: I don't know these code bases and I don't want to accidentally break something.
Have a nice weekend!
One day I'll remember what weekends are 😅
I think the least disruptive fix is this one:
And to avoid the diagnosis issues we've run into here:
And https://github.com/JuliaDiff/FiniteDiff.jl/pull/194 is the other part, and https://github.com/SciML/FastAlmostBandedMatrices.jl/pull/16 is the better similar implementation. So I think all facets of this are getting handled.
Merged, yay.
Describe the bug 🐞
The new NonlinearSolve v3.15 (with DI in it) causes a BoundaryValueDiffEq solve to error or segfault.
Expected behavior
The solve should run without errors, as it does with NonlinearSolve 3.14.
Minimal Reproducible Example 👇
Error & Stacktrace ⚠️
Environment (please complete the following information):
using Pkg; Pkg.status()
using Pkg; Pkg.status(; mode = PKGMODE_MANIFEST)
versioninfo()
Additional context
The bug is probably not deterministic: sometimes it doesn't happen at all, and sometimes it causes an outright segfault.
By pkg> dev-ing NonlinearSolve and adding some logging before line src/internal/linear_solve.jl:231, I obtained some more information about the matrices lincache.A and new_A:

In other words, the copyto! fails even though both matrices have the same size, because the one from lincache doesn't have the right length. Note that length(lincache.A) is not always 1.