adrhill / SparseConnectivityTracer.jl

Fast operator-overloading Jacobian & Hessian sparsity detection.
MIT License
26 stars 2 forks source link

Zero-derivative functions on dual numbers should only return the primal #167

Closed adrhill closed 1 month ago

adrhill commented 1 month ago

This issue was raised in the context of round(Int, x) on Dual numbers in #164.

adrhill commented 1 month ago

If we do this, then only on functions that have zero-derivatives globally. Otherwise, it would lead to type instabilities.

gdalle commented 1 month ago

I agree with that. Now that we got rid of the connectivity tracer, we can do this for:

adrhill commented 1 month ago

I'm actually not yet fully convinced about convert(::Type{<:Integer}, x). I haven't come up with a counter example yet, but it essentially is the identity function. And identity(x::Dual{<:Integer}) also shouldn't just return the primal value. ForwardDiff also throws an InexactError instead of simply returning the primal.

adrhill commented 1 month ago

Viewing ForwardDiff and our Duals as dual numbers, it makes sense that $x + \varepsilon y$ should not simply be cast to $x$, independent of data type. Unless $y$ is zero (an empty tracer).

adrhill commented 1 month ago

To put this in terms of the Faà di Bruno based theory: the operator convert(T, x) is generally speaking once-differentiable.

If we start to make exceptions for argument types like convert(Int, x), then you could argue that *(a::Int, b::Int) also "isn't differentiable".

At that point you could even argue that SparseConnectivityTracer.Dual{<:Integer,T} and ForwardDiff.Dual{<:Integer,P} should be prohibited in the first place.

gdalle commented 1 month ago

I don't find the last statement completely absurd. If you construct a dual from an integer, you might as well keep that integer cause derivatives won't propagate at all from that moment on

adrhill commented 1 month ago

It is absurd, since it's totally viable to seed an dual number $x = a + \varepsilon b$ with Int primal and compute derivatives of $\sqrt{x}$. It's correct to get and want the following:

julia> jacobian_sparsity(sqrt, 1, TracerLocalSparsityDetector())
1×1 SparseArrays.SparseMatrixCSC{Bool, Int64} with 1 stored entry:
 1

But we digress. My point is that classifying differentiability of operators on a per-method basis (i.e. looking at argument types) is a huge can of worms.

gdalle commented 1 month ago

Okay I can get behind that. But then doesn't round also have methods for Float types? Why should we add in a special integer case there?

adrhill commented 1 month ago

round is always an operator with zero-derivatives, so here it makes sense to only return the primal–regardless of type. I think round(T, ...) is only defined on Integers and Missing:

julia> methods(round, [Type, Any])
# 12 methods for generic function "round" from Base:
  [1] round(::Type{Bool}, x::AbstractFloat)
     @ float.jl:391
  [2] round(::Type{>:Missing}, ::Missing)
     @ missing.jl:144
  [3] round(::Type{T}, x::Rational{Bool}) where T>:Missing
     @ missing.jl:150
  [4] round(::Type{T}, ::Missing) where T
     @ missing.jl:145
  [5] round(::Type{T}, x) where T>:Missing
     @ missing.jl:147
  [6] round(::Type{T}, x::Rational{Bool}) where T
     @ rational.jl:500
  [7] round(::Type{T}, x::Rational{Tr}) where {T>:Missing, Tr}
     @ missing.jl:149
  [8] round(::Type{T}, x::Rational{Tr}) where {T, Tr}
     @ rational.jl:493
  [9] round(::Type{T}, x::BigFloat) where T<:Union{Signed, Unsigned}
     @ Base.MPFR mpfr.jl:364
 [10] round(::Type{Integer}, x::BigFloat)
     @ Base.MPFR mpfr.jl:365
 [11] round(::Type{T}, x::Integer) where T<:Integer
     @ int.jl:691
 [12] round(::Type{T}, x::AbstractFloat) where T<:Integer
     @ float.jl:385

julia> methods(round, [Type, Any, Any])
# 14 methods for generic function "round" from Base:
  [1] round(::Type{BigInt}, x::BigFloat, r::RoundingMode)
     @ Base.MPFR mpfr.jl:353
  [2] round(::Type{BigInt}, x::BigFloat, r::Union{Base.MPFR.MPFRRoundingMode, RoundingMode})
     @ Base.MPFR mpfr.jl:344
  [3] round(::Type{T}, x::BigFloat, r::RoundingMode) where T<:Union{Signed, Unsigned}
     @ Base.MPFR mpfr.jl:351
  [4] round(::Type{<:Integer}, x::BigFloat, r::RoundingMode)
     @ Base.MPFR mpfr.jl:355
  [5] round(::Type{T}, x::BigFloat, r::Union{Base.MPFR.MPFRRoundingMode, RoundingMode}) where T<:Union{Signed, Unsigned}
     @ Base.MPFR mpfr.jl:335
  [6] round(::Type{T}, x::AbstractFloat, r::RoundingMode) where T<:Integer
     @ floatfuncs.jl:122
  [7] round(::Type{Dates.Date}, x::Union{Dates.Day, Dates.Week, Dates.TimePeriod, Dates.TimeType}, ::Type{P}) where P<:Dates.Period
     @ Dates ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Dates/src/rounding.jl:294
  [8] round(::Type{>:Missing}, ::Missing, ::RoundingMode)
     @ missing.jl:144
  [9] round(::Type{T}, x::Rational{Bool}, r::RoundingMode) where T>:Missing
     @ missing.jl:150
 [10] round(::Type{T}, ::Missing, ::RoundingMode) where T
     @ missing.jl:145
 [11] round(::Type{T}, x, r::RoundingMode) where T>:Missing
     @ missing.jl:147
 [12] round(::Type{T}, x::Rational{Bool}, ::RoundingMode) where T
     @ rational.jl:500
 [13] round(::Type{T}, x::Rational{Tr}, r::RoundingMode) where {T>:Missing, Tr}
     @ missing.jl:149
 [14] round(::Type{T}, x::Rational{Tr}, r::RoundingMode) where {T, Tr}
     @ rational.jl:493
gdalle commented 1 month ago

My reasoning was that if there's a round defined for the closest Float32 for instance (apparently there isn't?) then there are two ways to interpret this:

See the duality?

adrhill commented 1 month ago

Yeah, a theoretical myround(Float32, x::Float64) would be a weird edge case. But even there, my point applies: we don't have the manpower to overload operators on a per-method basis. We should therefore usually just pick the most conservative type of differentiability over the set of all methods.