Open MilesCranmer opened 8 months ago
Nice observation. May be tricky to fix quickly. I'm curious how you came to identify the issue. Do you have a concrete use case with performance substantially compromised by this issue?
Sure, here is a 50x slowdown with a really simple function. But keep in mind that this is just the most minimal example; ultimately this type instability will infect any code it touches. The larger the codebase depending on a particular `predict` call, the worse the performance hit.
```julia
using SymbolicRegression, MLJBase, Random
using Statistics: mean, std

rng = Random.MersenneTwister(0)
X = randn(rng, 30, 3)
y = @. cos(X[:, 1] * 2.1 - 0.9) * X[:, 2] - X[:, 3]
model = SRRegressor(deterministic=true, seed=0, parallelism=:serial, maxsize=10)
mach = machine(model, X, y)
fit!(mach)

# With predict(mach, X):
function f(mach, X)
    y = predict(mach, X)
    return mean(y), std(y)
end

# With low-level calling:
function g(eqn, options, X)
    y, _ = eval_tree_array(eqn, X, options)
    return mean(y), std(y)
end
```
With the following performance comparison (Julia 1.10.2):
```julia
using BenchmarkTools

@btime f($mach, $X)
# 5.521 μs (259 allocations: 18.79 KiB)

@btime g(eqn, options, $(copy(X'))) setup=(r=report(mach); eqn=r.equations[r.best_idx]; options=mach.fitresult.options)
# 134.497 ns (3 allocations: 368 bytes)
```
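As a quick cross-check, `Test.@inferred` also surfaces the instability (a sketch; it should throw, because the inferred return type of `f` is `Any` rather than the runtime `Tuple{Float64, Float64}`):

```julia
using Test

# Throws an ErrorException along the lines of
# "return type Tuple{Float64, Float64} does not match inferred return type Any":
@inferred f(mach, X)
```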
(I highly recommend Cthulhu.jl for narrowing down type instabilities)
Here is the Cthulhu.jl output when I descend into `f(mach, X)` – the `predict(mach, X)` call results in `::Any` output (in red):
Descending into the `predict` call, we can immediately see that the first appearance of an `::Any` is from accessing the `.fitresult`:
Then every call after that is infected by the red (the type-inference issue propagates).
Perhaps one way to fix this without deep structural changes would be to allow package developers to specify the `fitresult` type via a trait handled by MLJModelInterface. Then, when calling `predict` or other operators, you can apply a type assertion, allowing the function to specialize.
For example:
```julia
MMI.fitresult_type(::Type{<:MyRegressor{A,B,C}}) where {A,B,C} = MyFitResultType{A,B,C}
```
and simply have the default method be

```julia
MMI.fitresult_type(_) = Any
```
Then, in `predict`, you could do the following:
```julia
ret = $(operator)(
    model,
    mach.fitresult::MMI.fitresult_type(model),
    ...
)
```
This would be completely backwards compatible, because the default `MMI.fitresult_type(_) = Any` preserves the current behavior.
Using this type assertion would actually be enough for Julia to specialize.
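To illustrate why, here is a minimal, self-contained sketch with toy types and a hypothetical `fitresult_type` (not the real MLJ machinery):

```julia
# Toy machine with an untyped field, mimicking Machine.fitresult:
struct UntypedMachine
    fitresult::Any
end

# Hypothetical trait declaring the concrete fitresult type:
fitresult_type(::UntypedMachine) = Vector{Float64}

# Without the assertion, `sum` is called on an `Any` and nothing infers:
unstable(mach) = sum(mach.fitresult)

# With the assertion, Julia narrows the field to Vector{Float64} and the
# whole function infers ::Float64:
stable(mach) = sum(mach.fitresult::fitresult_type(mach))

# Compare `@code_warntype unstable(UntypedMachine(randn(10)))` with
# `@code_warntype stable(UntypedMachine(randn(10)))`.
```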
What do you think?
Edit: looks like there is also no specialization on the array type passed to `Machine`. I.e., the `Machine` does not actually store info about the array in its type. Could that be added also?
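Something along these lines, for example (hypothetical sketch; the real `Machine` struct has more fields and constructors):

```julia
# Parameterize on the data container type as well as the model type, so the
# compiler knows, e.g., that the stored data is a Matrix{Float64}:
struct DataTypedMachine{M,D}
    model::M
    data::D
end
```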
Thanks @MilesCranmer for the further details and the very nice idea for a fix. Perhaps after I have a chance to look at a direct fix we can go this route.
Okay, the type instability in `last_model` is an easy fix. I just need to annotate the `old_model` field in the `Machine` struct. But the benchmarks are still bad. Tried Cthulhu.jl as kindly suggested, but can't get it to work just now.
I had that issue too. I put the following in my startup.jl: https://github.com/JuliaDebug/Cthulhu.jl/pull/546#issue-2140650745
Actually, it looks like it's fixed on the master branch of Cthulhu, just not released yet. So just install master.
Also, this tutorial from Tim Holy was extremely helpful for learning this type of analysis: https://youtube.com/watch?v=pvduxLowpPY. It completely changed how I profile Julia code.
I have made some more progress removing some type instability with `predict`. Will cross-reference to a PR shortly.
However, while this reduces bloat in `predict(::Machine, ...)` across the board, it does not change the disparity reported above for SymbolicRegression. That said, I wonder if the comparison above is really fair. If I replace the definition of `g` with
```julia
function g(model, fitresult, X)
    y = predict(model, fitresult, X)
    return mean(y), std(y)
end
```
so that I am just comparing `predict(::Machine, ...)` with `predict(::Model, ...)` (implemented in SymbolicRegression.jl, not MLJBase), then both benchmarks are similar, both with a large number of allocations.
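For reference, the benchmark call for this version would be something like the following (timings omitted; as noted, they are similar to the `predict(::Machine, ...)` numbers above):

```julia
# Pull the model and fitresult out of the machine, then benchmark the
# model-level predict directly:
@btime g($(mach.model), $(mach.fitresult), $X)
```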
Furthermore, there appears to be a type instability in `predict(::Model, ...)` as currently implemented in SymbolicRegression.jl:
```julia
using SymbolicRegression, MLJBase, Random

rng = Random.MersenneTwister(0)
X = randn(rng, 30, 3)
y = @. cos(X[:, 1] * 2.1 - 0.9) * X[:, 2] - X[:, 3]
model = SRRegressor(deterministic=true, seed=0, parallelism=:serial, maxsize=10)
mach = machine(model, X, y)
fit!(mach, verbosity=0)
fitresult = mach.fitresult;
```
After `@code_warntype predict(model, fitresult, X)`:
Note the boldface `Any`s.
Am I missing something here?
Due to the definition of `last_model` being an `Any`, all of the operations in `OPERATIONS` cannot infer their output type. This is okay for `report`, but for `predict` the output type might be important for downstream operations, and so this can slow things down.
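To make the mechanism concrete, here is a toy illustration (not the real MLJBase types) of why an `Any` field blocks inference, and why annotating the field, as discussed above, fixes it:

```julia
struct MyModel end

# Field typed as Any: accessing it infers ::Any, and every downstream call
# on the result becomes a dynamic dispatch:
struct LooseMachine
    old_model::Any
end

# Field typed via a parameter: accessing it infers the concrete model type:
struct TightMachine{M}
    old_model::M
end

last_model(mach) = mach.old_model

# `@code_warntype last_model(LooseMachine(MyModel()))`  -> ::Any
# `@code_warntype last_model(TightMachine(MyModel()))`  -> ::MyModel
```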