SymbolicML / DynamicExpressions.jl

Ridiculously fast symbolic expressions
https://symbolicml.org/DynamicExpressions.jl/dev
Apache License 2.0
95 stars 13 forks source link

Overflow turns the whole batch to `NaN`s #87

Open nmheim opened 1 week ago

nmheim commented 1 week ago

Hey! Thanks a lot for this, I really like the package!:)

I seems like an overflow in one of the samples causes the whole batch to be turned into NaNs:

using DynamicExpressions

T = Float64
x = Node{T}(feature=1)
ops = OperatorEnum(binary_operators=[*])
expr = x*2

julia> X = ones(1,2)
julia> expr(X, ops)
2-element Vector{Float64}:
 2.0
 2.0

julia> X[2] = floatmax(T)
julia> expr(X, ops)
2-element Vector{Float64}:
 NaN
 NaN
MilesCranmer commented 1 week ago

Thanks, glad you are enjoying it!

Yes, this is expected. The evaluation returns early whenever any element of any step in the evaluation is NaN or Inf: https://github.com/SymbolicML/DynamicExpressions.jl/blob/9e95f0538207c360874615686be5c0ec627aee42/src/Evaluate.jl#L178 https://github.com/SymbolicML/DynamicExpressions.jl/blob/9e95f0538207c360874615686be5c0ec627aee42/src/Evaluate.jl#L22-L28 https://github.com/SymbolicML/DynamicExpressions.jl/blob/9e95f0538207c360874615686be5c0ec627aee42/src/Utils.jl#L15-L17. Thus, because the evaluation returned early, the buffer is not actually equal to the result of the expression. So the buffer is overwritten with NaNs to make this obvious.

This behavior is done with symbolic regression in mind, so that expressions with singularities don't waste more cycles than they have to. Within SymbolicRegression, the eval_tree_array call is used directly, so it can skip the step where NaNs are written to the buffer:

result, completed = eval_tree_array(tree, X, operators)
if !completed
    # evaluation quit early, so return infinite loss value
end
nmheim commented 1 week ago

Thanks for the quick reply! that makes sense. Is there a way to get NaNs only for the samples that actually fail other than looping over the batch? In any case, as this is expected behaviour I think this can be closed:)

MilesCranmer commented 1 week ago

At the moment, no, but it might be nice to have that behavior. If you are interested and have some time I can help point out what needs to be edited for this to work?

nmheim commented 1 week ago

That sounds great, I'd be happy to take a look at it if you give me some pointers!:)

MilesCranmer commented 1 week ago

I think eval_tree_array: https://github.com/SymbolicML/DynamicExpressions.jl/blob/9e95f0538207c360874615686be5c0ec627aee42/src/Evaluate.jl#L66-L72 needs to have a new parameter early_exit::Union{Bool,Val}=Val(true) similar to turbo. Then, @return_on_nonfinite_array https://github.com/SymbolicML/DynamicExpressions.jl/blob/9e95f0538207c360874615686be5c0ec627aee42/src/Evaluate.jl#L22-L28 needs to have another argument early_exit which basically toggles the behavior on/off. Similar for @return_on_check. Then, other checks like https://github.com/SymbolicML/DynamicExpressions.jl/blob/9e95f0538207c360874615686be5c0ec627aee42/src/Evaluate.jl#L87 need to be handled manually.

In other words, if early_exit is true, do the isfinite check. Otherwise, skip it. The early_exit should come first in any if statement so that compute is not wasted.

Note that most of the ::Val{turbo} in this file is just given as ::Val{false}. That is because Val{true} is implemented in a separate file: https://github.com/SymbolicML/DynamicExpressions.jl/blob/9e95f0538207c360874615686be5c0ec627aee42/ext/DynamicExpressionsLoopVectorizationExt.jl#L20-L28. (These functions are all the same except they use LoopVectorization.@turbo on the loops.)

For early_exit, everything can be implemented in the same file, since it will just turn on/off the isfinite checks.

With this implemented you would be able to call expr(X, ops; early_exit=Val(false)) which would let you avoid filling the NaNs in the array.

Sorry this was a bit convoluted – ask any questions needed!