JuliaSmoothOptimizers / Krylov.jl

A Julia Basket of Hand-Picked Krylov Methods

Support custom vector types #905

Open amontoison opened 1 week ago

amontoison commented 1 week ago

In some specific applications, users may want to use their own vector types. For example, in Oceananigans.jl, each vector is a slice of a 3D matrix and is of the type Field. (cc @glwagner) --> https://github.com/CliMA/Oceananigans.jl/pull/3812

The best solution is to document how we can create a modular wrapper that fits the user's needs, with a very simple and limited API to implement.

I quickly experimented with a custom type that represents a vector of dual numbers, and it works.
Off-topic: @michel2323, this could be a way to perform low-level differentiation directly in Krylov.jl.

using ForwardDiff, Krylov, LinearAlgebra  # LinearAlgebra provides mul!, dot, norm, axpy!, axpby!
import ForwardDiff.Dual
import Krylov.FloatOrComplex

struct ADVector{T, V, N} <: AbstractVector{T}
    v::Vector{Dual{V, T, N}}
end

function ADVector{T, V, N}(::UndefInitializer, n::Int) where {T, V, N}
    ad_vector = [ForwardDiff.Dual{V, T, N}(zero(T)) for _ in 1:n]
    ADVector{T, V, N}(ad_vector)
end

function ADVector{T}(::UndefInitializer, n::Int) where T
    ad_vector = [ForwardDiff.Dual{Nothing, T, 0}(zero(T)) for _ in 1:n]
    ADVector{T, Nothing, 0}(ad_vector)
end

function ADVector{T}(v::Vector{T}) where T
    n = length(v)
    ad_vector = [ForwardDiff.Dual{Nothing, T, 0}(v[i]) for i in 1:n]
    return ADVector{T, Nothing, 0}(ad_vector)
end

function Base.length(v::ADVector)
    return length(v.v)
end

function Base.size(v::ADVector)
    return (length(v.v),)
end

function Base.getindex(v::ADVector, i::Int)
    return v.v[i]
end

function Base.setindex!(v::ADVector{T, V, N}, value::ForwardDiff.Dual{V, T, N}, i::Int) where {T, V, N}
    v.v[i] = value
    return v
end

LinearAlgebra.mul!(y::ADVector{T}, A::Matrix{T}, x::ADVector{T}) where T <: FloatOrComplex = mul!(y.v, A, x.v)

Krylov.kdot(n::Integer, x::ADVector{T}, dx::Integer, y::ADVector{T}, dy::Integer) where T <: FloatOrComplex = dot(x.v, y.v)
Krylov.knrm2(n::Integer, x::ADVector{T}, dx::Integer) where T <: FloatOrComplex = norm(x.v)

Krylov.kscal!(n::Integer, s, x::ADVector{T}, dx::Integer) where T <: FloatOrComplex = (x.v .*= s)

Krylov.kaxpy!(n::Integer, s, x::ADVector{T}, dx::Integer, y::ADVector{T}, dy::Integer) where T <: FloatOrComplex = axpy!(s, x.v, y.v)

Krylov.kaxpby!(n::Integer, s, x::ADVector{T}, dx::Integer, t, y::ADVector{T}, dy::Integer) where T <: FloatOrComplex = axpby!(s, x.v, t, y.v)

Krylov.kcopy!(n::Integer, x::ADVector{T}, dx::Integer, y::ADVector{T}, dy::Integer) where T <: FloatOrComplex = copyto!(y.v, x.v)

Krylov.kfill!(x::ADVector{T}, val::T) where T <: FloatOrComplex = fill!(x.v, val)

I should clean up the low-level API (the functions starting with k), since they are not yet public or documented for the user.

@michel2323 @glwagner
I believe this is also the best way to introduce MPI usage in the future. It will be independent of Krylov.jl and can be applied to any package.

glwagner commented 1 week ago

In some specific applications, users may want to use their own vector types. For example, in Oceananigans.jl, each vector is a slice of a 3D matrix and is of the type Field. (cc @glwagner)

To clarify: our data should be treated as a 1D vector for linear algebra, but represents a 3D field in the physical context.

Consider a 2D array that represents, for example, an image:

julia> c = rand(3, 3)
3×3 Matrix{Float64}:
 0.515044  0.981678  0.0473222
 0.437767  0.142471  0.756243
 0.284507  0.647186  0.445389

A "diffusion" operation, which can be interpreted as a kind of filter, is represented by the following operator:

diffuse(i, j, c) = - 4 * c[i, j] + c[i+1, j] + c[i, j+1] + c[i-1, j] + c[i, j-1]

The operator diffuse can also be written as a 9 x 9 matrix acting on the "vectorization" of c:

julia> c_vector = c[:]
9-element Vector{Float64}:
 0.5150436482179653
 0.43776719843960443
 0.28450746235635493
 0.9816775509672155
 0.14247143653959915
 0.6471861061623826
 0.0473222143036538
 0.7562428994779301
 0.4453890152898158

Said again, we can apply the diffuse operator to c either by forming the product A_diffuse * c_vector, where A_diffuse is a (sparse) matrix that represents the operator diffuse, or by applying the function diffuse(i, j, c) in a loop over i = 1:size(c, 1) and j = 1:size(c, 2).
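The equivalence of the two routes can be checked in a small sketch (not from the thread; it assumes zero Dirichlet boundary values for out-of-range neighbors, which the stencil above leaves unspecified):

```julia
using SparseArrays

n = 3
c = rand(n, n)

# Out-of-range neighbors are taken to be zero (Dirichlet boundary values).
at(c, i, j) = (1 <= i <= size(c, 1) && 1 <= j <= size(c, 2)) ? c[i, j] : 0.0
diffuse(i, j, c) = -4 * c[i, j] + at(c, i+1, j) + at(c, i, j+1) + at(c, i-1, j) + at(c, i, j-1)

# The same stencil as a sparse n² × n² matrix. Column-major vectorization
# maps the entry (i, j) to the linear index i + (j - 1) * n.
lin(i, j) = i + (j - 1) * n
rows, cols, vals = Int[], Int[], Float64[]
for j in 1:n, i in 1:n
    push!(rows, lin(i, j)); push!(cols, lin(i, j)); push!(vals, -4.0)
    for (di, dj) in ((1, 0), (-1, 0), (0, 1), (0, -1))
        ii, jj = i + di, j + dj
        if 1 <= ii <= n && 1 <= jj <= n
            push!(rows, lin(i, j)); push!(cols, lin(ii, jj)); push!(vals, 1.0)
        end
    end
end
A_diffuse = sparse(rows, cols, vals, n^2, n^2)

# Both routes give the same result.
loop_result = [diffuse(i, j, c) for i in 1:n, j in 1:n]
matrix_result = reshape(A_diffuse * c[:], n, n)
@assert loop_result ≈ matrix_result
```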

In Oceananigans we are concerned with 3D fluid dynamics, so you can take the above analogy but apply it to a 3D array. If Nx, Ny, Nz = size(c), then the "operator" has size (Nx * Ny * Nz) × (Nx * Ny * Nz).

While we could in principle represent our operators with sparse matrices and then evaluate matrix-vector products with sparse linear algebra, there's no intrinsic benefit to spending memory on the sparse matrix; we are interested in iterative methods that allow us to avoid allocating operator memory entirely.
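The matrix-free alternative can be sketched as an operator type that stores nothing but the grid size and applies the stencil inside `mul!` (a hypothetical sketch, again assuming zero boundary values; Krylov.jl accepts any operator supporting `mul!`, `size`, and `eltype` in place of a matrix):

```julia
using LinearAlgebra

# Hypothetical matrix-free operator for the diffuse stencil on an n × n grid.
# No matrix entries are stored; the stencil is applied on the fly.
struct DiffusionOp
    n::Int
end
Base.size(A::DiffusionOp) = (A.n^2, A.n^2)
Base.size(A::DiffusionOp, i::Int) = size(A)[i]
Base.eltype(::DiffusionOp) = Float64

function LinearAlgebra.mul!(y::AbstractVector, A::DiffusionOp, x::AbstractVector)
    n = A.n
    c = reshape(x, n, n)   # view the flat vector as the 2D field
    d = reshape(y, n, n)
    at(i, j) = (1 <= i <= n && 1 <= j <= n) ? c[i, j] : zero(eltype(x))
    for j in 1:n, i in 1:n
        d[i, j] = -4 * c[i, j] + at(i+1, j) + at(i, j+1) + at(i-1, j) + at(i, j-1)
    end
    return y
end

A = DiffusionOp(3)
x = rand(9)
y = similar(x)
mul!(y, A, x)   # one operator application, no matrix allocated
```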

I think there are probably many such contexts, where the "native" data representation is an M-dimensional array with dimension sizes N = [N1, N2, ..., NM], but which can also be represented as a vector of length prod(N), acted on by a matrix of size prod(N) × prod(N).
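In plain Julia this correspondence is exactly what `vec` and `reshape` give you, and for ordinary arrays the two views share memory, so no copy is involved (a minimal illustration, not from the thread):

```julia
# Move between the M-dimensional array view and the flat vector view.
N = (4, 3, 2)
c = rand(N...)
v = vec(c)                   # length prod(N), shares memory with c
@assert length(v) == prod(N)
@assert reshape(v, N) == c
v[1] = 0.0                   # writing through the vector view...
@assert c[1, 1, 1] == 0.0    # ...updates the array
```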

amontoison commented 1 week ago

Thanks for the explanation! I started to clean up the low-level API in #907 to provide what you need. We probably just need to create a wrapper in Oceananigans.jl to represent the 3D field as an abstract vector and define these functions.

I will document that after INFORMS.

amontoison commented 1 hour ago

@glwagner I just added an example to the documentation explaining how a 3D field in a physical context, represented as a 3D array, can also represent a 1D vector for linear algebra: https://jso.dev/Krylov.jl/dev/matrix_free/#Example-with-discretized-PDE.
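The wrapper idea can be sketched as a thin AbstractVector over a 3D array (hypothetical names, not Oceananigans' Field type; column-major linear indexing does the flattening for free):

```julia
# Hypothetical wrapper presenting a 3D array as a 1D vector for linear algebra.
struct FieldVector{T} <: AbstractVector{T}
    data::Array{T, 3}
end

Base.size(v::FieldVector) = (length(v.data),)
Base.getindex(v::FieldVector, i::Int) = v.data[i]          # linear indexing into the 3D array
Base.setindex!(v::FieldVector, x, i::Int) = (v.data[i] = x; v)

c = rand(2, 3, 4)
v = FieldVector(c)
@assert length(v) == 24
@assert v[5] == c[5]   # same column-major order as vec(c)
```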

I’m wondering if we can do something similar for Field so that we can still target BLAS, CUBLAS, rocBLAS, ....

If it’s not possible, we will define our own wrapper as we discussed last week (not documented yet). The low-level API has now been cleaned up, and I released version 0.9.8 today, so we can try it in Oceananigans.jl.