MasonProtter / Bumper.jl

Bring Your Own Stack

Integration tests for DynamicExpressions.jl? #36

Open · MilesCranmer opened this issue 5 months ago

MilesCranmer commented 5 months ago

I read in the README:

If you use Bumper.jl, please consider submitting a sample of your use-case so I can include it in the test suite.

Happy to share that I just added support for Bumper.jl in DynamicExpressions.jl, which means people can soon also use it for SymbolicRegression.jl and PySR.

My use-case is coded up in this file, with the important part being:

function bumper_eval_tree_array(
    tree::AbstractExpressionNode{T},
    cX::AbstractMatrix{T},
    operators::OperatorEnum,
    ::Val{turbo},
) where {T,turbo}
    result = similar(cX, axes(cX, 2))
    n = size(cX, 2)
    all_ok = Ref(false)
    @no_escape begin
        _result_ok = tree_mapreduce(
            # Leaf nodes, we create an allocation and fill
            # it with the value of the leaf:
            leaf_node -> begin
                ar = @alloc(T, n)
                ok = if leaf_node.constant
                    v = leaf_node.val::T
                    ar .= v
                    isfinite(v)
                else
                    ar .= view(cX, leaf_node.feature, :)
                    true
                end
                ResultOk(ar, ok)
            end,
            # Branch nodes, we simply pass them to the evaluation kernel:
            branch_node -> branch_node,
            # In the evaluation kernel, we combine the branch nodes
            # with the arrays created by the leaf nodes:
            ((args::Vararg{Any,M}) where {M}) ->
                dispatch_kerns!(operators, args..., Val(turbo)),
            tree;
            break_sharing=Val(true),
        )
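        # Copy the bump-allocated result into `result` (an ordinary array)
        # before leaving `@no_escape`, since memory from `@alloc` must not
        # escape the block: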
        x = _result_ok.x
        result .= x
        all_ok[] = _result_ok.ok
    end
    return (result, all_ok[])
end

Basically, it's a recursive evaluation scheme for an arbitrary symbolic expression over a 2D array of data. Preliminary results show a massive performance gain with bump allocation! It's even faster than LoopVectorization (the user can also turn both on, though I don't see much of an additional improvement from doing so).
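The key Bumper.jl idea in the code above is the @no_escape/@alloc pair: scratch arrays are bump-allocated inside the block and all reclaimed when it exits, so nothing allocated with @alloc may outlive the block (hence the copy into result before returning). Here is a minimal sketch of that pattern on its own (a toy function, not part of DynamicExpressions):

using Bumper

# Toy example of the @no_escape/@alloc pattern: `tmp` is bump-allocated
# and reclaimed when the block exits, so only the reduced scalar escapes.
function sum_of_squares(x::Vector{Float64})
    @no_escape begin
        tmp = @alloc(Float64, length(x))
        tmp .= x .* x
        sum(tmp)
    end
end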

Here's how you could write an integration test:

using DynamicExpressions: Node, OperatorEnum, eval_tree_array
using Bumper
using Test: @test
using Random: MersenneTwister as RNG

operators = OperatorEnum(binary_operators=(+, -, *), unary_operators=(cos, exp))

x1 = Node{Float32}(feature=1)
x2 = Node{Float32}(feature=2)

tree = cos(x1 * 0.9 - 0.5) + x2 * exp(1.0 - x2 * x2)
# ^ This is a symbolic expression described as a type-stable binary tree

# Evaluate with Bumper:
X = randn(RNG(0), Float32, 2, 1000);

truth, no_nans_truth = eval_tree_array(tree, X, operators)
test, no_nans_test = eval_tree_array(tree, X, operators; bumper=true)

@test truth ≈ test

You could also randomly generate expressions if you want to use this as a way to stress-test the bump allocator. The code to generate trees is here

which lets you do

tree = gen_random_tree_fixed_size(20, operators, 2, Float32)
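
A stress test along those lines could look roughly like the following sketch (it assumes gen_random_tree_fixed_size from the linked file is in scope, and reuses the operators, X, and Test import from the example above):

for _ in 1:100
    tree = gen_random_tree_fixed_size(20, operators, 2, Float32)
    truth, ok_truth = eval_tree_array(tree, X, operators)
    test, ok_test = eval_tree_array(tree, X, operators; bumper=true)
    # Random trees can produce NaN/Inf, so only compare outputs when
    # both evaluation paths report success:
    if ok_truth && ok_test
        @test truth ≈ test
    end
end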

Cheers, Miles

P.S., any tips on how I'm using the bump allocator would be much appreciated!! For example, I know exactly how large the allocations should be in advance – can that help me get more perf at all?
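
For reference (and purely as a sketch of what Bumper's documented API appears to allow, not of what DynamicExpressions does): if the total scratch size is known up front, one option is to pre-size an AllocBuffer and pass it to @no_escape explicitly:

using Bumper

# Illustrative only; the sizes are made up. Pre-size a buffer for the known
# working set and pass it to @no_escape so @alloc draws from it directly.
n = 1000                                   # data points per scratch array
max_arrays = 32                            # assumed bound on live scratch arrays
buf = AllocBuffer(max_arrays * n * sizeof(Float32))

@no_escape buf begin
    tmp = @alloc(Float32, n)               # allocated from `buf`
    tmp .= 1f0
    sum(tmp)
end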

MilesCranmer commented 5 months ago

Here are the benchmarks, btw: https://github.com/SymbolicML/DynamicExpressions.jl/pull/65#issuecomment-1925488799

[Screenshot: benchmark results, 2024-02-03]

So, a solid performance gain over regular allocations with @inbounds @simd, and also a decent gain over regular allocations + @turbo.