Zero-derivative functions on dual numbers should only return the primal

adrhill commented 1 month ago

This issue was raised in the context of round(Int, x) on Dual numbers in #164.

adrhill commented 1 month ago

If we do this, then only on functions that have zero-derivatives globally. Otherwise, it would lead to type instabilities.

gdalle commented 1 month ago

I agree with that. Now that we got rid of the connectivity tracer, we can do this for:

convert(::Type{<:Integer}, x)
round, floor, ceil
all the euclidean division functions
what else?

adrhill commented 1 month ago

I'm actually not yet fully convinced about convert(::Type{<:Integer}, x). I haven't come up with a counter example yet, but it essentially is the identity function. And identity(x::Dual{<:Integer}) also shouldn't just return the primal value. ForwardDiff also throws an InexactError instead of simply returning the primal.

adrhill commented 1 month ago

Viewing ForwardDiff and our Duals as dual numbers, it makes sense that $x + \varepsilon y$ should not simply be cast to $x$, independent of data type. Unless $y$ is zero (an empty tracer).

adrhill commented 1 month ago

To put this in terms of the Faà di Bruno based theory: the operator convert(T, x) is generally speaking once-differentiable.

If we start to make exceptions for argument types like convert(Int, x), then you could argue that *(a::Int, b::Int) also "isn't differentiable".

At that point you could even argue that SparseConnectivityTracer.Dual{<:Integer,T} and ForwardDiff.Dual{<:Integer,P} should be prohibited in the first place.

gdalle commented 1 month ago

I don't find the last statement completely absurd. If you construct a dual from an integer, you might as well keep that integer cause derivatives won't propagate at all from that moment on

adrhill commented 1 month ago

It is absurd, since it's totally viable to seed an dual number $x = a + \varepsilon b$ with Int primal and compute derivatives of $\sqrt{x}$. It's correct to get and want the following:

julia> jacobian_sparsity(sqrt, 1, TracerLocalSparsityDetector())
1×1 SparseArrays.SparseMatrixCSC{Bool, Int64} with 1 stored entry:
 1

But we digress. My point is that classifying differentiability of operators on a per-method basis (i.e. looking at argument types) is a huge can of worms.

gdalle commented 1 month ago

Okay I can get behind that. But then doesn't round also have methods for Float types? Why should we add in a special integer case there?

adrhill commented 1 month ago

round is always an operator with zero-derivatives, so here it makes sense to only return the primal–regardless of type. I think round(T, ...) is only defined on Integers and Missing:

julia> methods(round, [Type, Any])
# 12 methods for generic function "round" from Base:
  [1] round(::Type{Bool}, x::AbstractFloat)
     @ float.jl:391
  [2] round(::Type{>:Missing}, ::Missing)
     @ missing.jl:144
  [3] round(::Type{T}, x::Rational{Bool}) where T>:Missing
     @ missing.jl:150
  [4] round(::Type{T}, ::Missing) where T
     @ missing.jl:145
  [5] round(::Type{T}, x) where T>:Missing
     @ missing.jl:147
  [6] round(::Type{T}, x::Rational{Bool}) where T
     @ rational.jl:500
  [7] round(::Type{T}, x::Rational{Tr}) where {T>:Missing, Tr}
     @ missing.jl:149
  [8] round(::Type{T}, x::Rational{Tr}) where {T, Tr}
     @ rational.jl:493
  [9] round(::Type{T}, x::BigFloat) where T<:Union{Signed, Unsigned}
     @ Base.MPFR mpfr.jl:364
 [10] round(::Type{Integer}, x::BigFloat)
     @ Base.MPFR mpfr.jl:365
 [11] round(::Type{T}, x::Integer) where T<:Integer
     @ int.jl:691
 [12] round(::Type{T}, x::AbstractFloat) where T<:Integer
     @ float.jl:385

julia> methods(round, [Type, Any, Any])
# 14 methods for generic function "round" from Base:
  [1] round(::Type{BigInt}, x::BigFloat, r::RoundingMode)
     @ Base.MPFR mpfr.jl:353
  [2] round(::Type{BigInt}, x::BigFloat, r::Union{Base.MPFR.MPFRRoundingMode, RoundingMode})
     @ Base.MPFR mpfr.jl:344
  [3] round(::Type{T}, x::BigFloat, r::RoundingMode) where T<:Union{Signed, Unsigned}
     @ Base.MPFR mpfr.jl:351
  [4] round(::Type{<:Integer}, x::BigFloat, r::RoundingMode)
     @ Base.MPFR mpfr.jl:355
  [5] round(::Type{T}, x::BigFloat, r::Union{Base.MPFR.MPFRRoundingMode, RoundingMode}) where T<:Union{Signed, Unsigned}
     @ Base.MPFR mpfr.jl:335
  [6] round(::Type{T}, x::AbstractFloat, r::RoundingMode) where T<:Integer
     @ floatfuncs.jl:122
  [7] round(::Type{Dates.Date}, x::Union{Dates.Day, Dates.Week, Dates.TimePeriod, Dates.TimeType}, ::Type{P}) where P<:Dates.Period
     @ Dates ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Dates/src/rounding.jl:294
  [8] round(::Type{>:Missing}, ::Missing, ::RoundingMode)
     @ missing.jl:144
  [9] round(::Type{T}, x::Rational{Bool}, r::RoundingMode) where T>:Missing
     @ missing.jl:150
 [10] round(::Type{T}, ::Missing, ::RoundingMode) where T
     @ missing.jl:145
 [11] round(::Type{T}, x, r::RoundingMode) where T>:Missing
     @ missing.jl:147
 [12] round(::Type{T}, x::Rational{Bool}, ::RoundingMode) where T
     @ rational.jl:500
 [13] round(::Type{T}, x::Rational{Tr}, r::RoundingMode) where {T>:Missing, Tr}
     @ missing.jl:149
 [14] round(::Type{T}, x::Rational{Tr}, r::RoundingMode) where {T, Tr}
     @ rational.jl:493

gdalle commented 1 month ago

My reasoning was that if there's a round defined for the closest Float32 for instance (apparently there isn't?) then there are two ways to interpret this:

piecewise constant because the closest Float32 is almost surely not equal to the current Float64
linear equal to the identity because if we disregard finite precision, rounding to the closest float is basically the identity

See the duality?

adrhill commented 1 month ago

Yeah, a theoretical myround(Float32, x::Float64) would be a weird edge case. But even there, my point applies: we don't have the manpower to overload operators on a per-method basis. We should therefore usually just pick the most conservative type of differentiability over the set of all methods.

adrhill / SparseConnectivityTracer.jl

Zero-derivative functions on dual numbers should only return the primal #167