SymbolicML / DynamicExpressions.jl

Ridiculously fast symbolic expressions
https://ai.damtp.cam.ac.uk/dynamicexpressions
Apache License 2.0

feat: create `Expression` type to store operators with expression and `parse_expression` to have robust parsing #73

Closed MilesCranmer closed 5 months ago

MilesCranmer commented 6 months ago

Expressions

This PR creates the `Expression` and `AbstractExpression` types for attaching operators, and potentially other metadata, to the expression itself.

This will likely be most useful as a user-facing type for returning self-contained expressions. I think it can also be useful for expression searches that attach other kinds of metadata.

This means you can do things like

expression(X)
# ^ Rather than `tree(X, operators)`

println(expression)
# ^ Rather than `print_tree(expression; operators, variable_names)`

without needing to provide the operators or variable names – they are stored inside the Expression struct.

Previously you needed to keep track of these yourself, which is a bit of a pain if you aren't deeply familiar with how everything works.
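For illustration, constructing such a self-contained expression directly might look like the following. This is a sketch based on the description above; the exact constructor keywords (`operators`, `variable_names` as keyword arguments to `Expression`) are assumptions.

```julia
using DynamicExpressions

operators = OperatorEnum(binary_operators=[+, *], unary_operators=[cos])
variable_names = ["x", "y"]

# Build a lightweight tree via operator overloading
x = Node{Float64}(feature=1)
y = Node{Float64}(feature=2)
tree = cos(x * 2.1) + y

# Attach the operators and variable names to the tree itself (assumed constructor)
ex = Expression(tree; operators, variable_names)

ex(randn(2, 10))  # evaluate without passing `operators` separately
println(ex)       # prints using "x" and "y" rather than default feature names
```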

Parsing

In addition to this, I also add the `@parse_expression` macro. This is a safer and easier alternative to operator overloading for creating `Node` types.

Basically it works as follows:

using DynamicExpressions

my_op(x) = cos(x^2)

operators = OperatorEnum(binary_operators=[+, -, *, /], unary_operators=[my_op])
variable_names = [:α, :β]  # Or strings

expr = @parse_expression(
    my_op(2.3 * α - β) / (0.5 * β * β),  # Pass an expression that will be parsed into an equivalent `Node`
    operators = operators,
    variable_names = variable_names,
)

X = randn(2, 100)

expr(X)
# ^ This is now valid, because `expr.operators` points to the operator enum.
# Also valid: `eval_tree_array(expr, X)`

println(expr)
# ^ Has the `α` and `β` despite everything being stored as the lightweight `Node` type!

expr'(X)
# ^ Forward-mode gradient with respect to inputs

You can also optionally provide a `node_type` argument to specify, e.g., use of a `GraphNode` instead of a `Node`.
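A minimal sketch of the `node_type` option described above (the keyword name is taken from the description; the exact call form is an assumption):

```julia
using DynamicExpressions

ex = @parse_expression(
    x1 + cos(x2),
    operators = OperatorEnum(binary_operators=[+], unary_operators=[cos]),
    variable_names = ["x1", "x2"],
    node_type = GraphNode,  # store as a GraphNode rather than the default Node
)
```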

This is much nicer because it means you can have multiple sets of operator enums, and generate expressions without worrying about overwriting the global mapping from function to index.
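For instance, two independent operator sets can coexist, since each expression carries its own enum rather than relying on a global function-to-index mapping (a sketch under that assumption):

```julia
using DynamicExpressions

ops_a = OperatorEnum(binary_operators=[+, *])
ops_b = OperatorEnum(binary_operators=[-, /], unary_operators=[exp])

ex_a = @parse_expression(x * x + y, operators=ops_a, variable_names=["x", "y"])
ex_b = @parse_expression(exp(x) / y, operators=ops_b, variable_names=["x", "y"])

# Each expression evaluates with its own operator enum; no shared global state.
X = randn(2, 5)
ex_a(X)
ex_b(X)
```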


TODO:


Tagging in case of interest @avik-pal @AlCap23 @rikhuijzer @Moelf

coveralls commented 6 months ago

Pull Request Test Coverage Report for Build 9182557303



Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
src/Expression.jl | 88 | 89 | 98.88%
src/Parse.jl | 107 | 111 | 96.4%
Total | 234 | 239 | 97.91%
Totals Coverage Status
Change from base Build 8876266483: 0.6%
Covered Lines: 1858
Relevant Lines: 1947

💛 - Coveralls
daloic commented 6 months ago

With my very limited Julia knowledge, this looks like it solves the issues I have, and it will probably allow the end user to easily create a backbone expression from a parsed formula to seed the search. I like it.

github-actions[bot] commented 6 months ago

Benchmark Results

Benchmark | master | 703058e3396d7f... | master/703058e3396d7f...
eval/ComplexF32/evaluation 7.43 ± 0.45 ms 7.46 ± 0.47 ms 0.995
eval/ComplexF64/evaluation 9.64 ± 0.7 ms 9.74 ± 0.67 ms 0.99
eval/Float32/derivative 10.8 ± 1.6 ms 10.7 ± 1.6 ms 1.01
eval/Float32/derivative_turbo 10.7 ± 1.5 ms 10.7 ± 1.5 ms 1
eval/Float32/evaluation 2.74 ± 0.22 ms 2.69 ± 0.22 ms 1.02
eval/Float32/evaluation_bumper 0.569 ± 0.013 ms 0.528 ± 0.013 ms 1.08
eval/Float32/evaluation_turbo 0.682 ± 0.024 ms 0.699 ± 0.034 ms 0.975
eval/Float32/evaluation_turbo_bumper 0.569 ± 0.013 ms 0.534 ± 0.013 ms 1.07
eval/Float64/derivative 13.9 ± 0.49 ms 13.7 ± 0.53 ms 1.02
eval/Float64/derivative_turbo 13.9 ± 0.57 ms 13.7 ± 0.53 ms 1.02
eval/Float64/evaluation 2.88 ± 0.24 ms 2.88 ± 0.23 ms 1
eval/Float64/evaluation_bumper 1.29 ± 0.043 ms 1.19 ± 0.044 ms 1.08
eval/Float64/evaluation_turbo 1.16 ± 0.058 ms 1.17 ± 0.063 ms 0.99
eval/Float64/evaluation_turbo_bumper 1.3 ± 0.044 ms 1.19 ± 0.043 ms 1.09
utils/combine_operators/break_sharing 0.0407 ± 0.0013 ms 0.0414 ± 0.0013 ms 0.982
utils/convert/break_sharing 28 ± 0.72 μs 27.9 ± 0.82 μs 1
utils/convert/preserve_sharing 0.128 ± 0.0023 ms 0.127 ± 0.0024 ms 1.01
utils/copy/break_sharing 29.5 ± 0.77 μs 28.9 ± 0.9 μs 1.02
utils/copy/preserve_sharing 0.127 ± 0.0023 ms 0.127 ± 0.0028 ms 0.996
utils/count_constants/break_sharing 12.7 ± 0.19 μs 10.4 ± 0.14 μs 1.23
utils/count_constants/preserve_sharing 0.111 ± 0.002 ms 0.11 ± 0.0022 ms 1.01
utils/count_depth/break_sharing 14.1 ± 0.25 μs 12.5 ± 0.19 μs 1.13
utils/count_nodes/break_sharing 12.9 ± 0.2 μs 9.88 ± 0.16 μs 1.3
utils/count_nodes/preserve_sharing 0.112 ± 0.002 ms 0.115 ± 0.0057 ms 0.972
utils/get_set_constants!/break_sharing 0.0528 ± 0.00078 ms 0.0525 ± 0.00086 ms 1.01
utils/get_set_constants!/preserve_sharing 0.316 ± 0.005 ms 0.319 ± 0.0057 ms 0.99
utils/has_constants/break_sharing 5.14 ± 0.29 μs 4.75 ± 0.22 μs 1.08
utils/has_operators/break_sharing 1.78 ± 0.016 μs 1.65 ± 0.019 μs 1.08
utils/hash/break_sharing 30.1 ± 0.43 μs 0.033 ± 0.00046 ms 0.91
utils/hash/preserve_sharing 0.131 ± 0.002 ms 0.135 ± 0.003 ms 0.968
utils/index_constants/break_sharing 27.3 ± 0.55 μs 27.4 ± 0.59 μs 0.997
utils/index_constants/preserve_sharing 0.126 ± 0.0024 ms 0.128 ± 0.0033 ms 0.983
utils/is_constant/break_sharing 5.73 ± 0.31 μs 4.68 ± 0.22 μs 1.23
utils/simplify_tree/break_sharing 0.181 ± 0.016 ms 0.154 ± 0.015 ms 1.17
utils/simplify_tree/preserve_sharing 0.293 ± 0.017 ms 0.277 ± 0.016 ms 1.06
utils/string_tree/break_sharing 0.487 ± 0.0095 ms 0.502 ± 0.016 ms 0.97
utils/string_tree/preserve_sharing 0.63 ± 0.012 ms 0.644 ± 0.016 ms 0.978
time_to_load 0.203 ± 0.0017 s 0.213 ± 0.0025 s 0.953
rikhuijzer commented 6 months ago

Tagging in case of interest @avik-pal @AlCap23 @rikhuijzer @Moelf

Thanks indeed for the reminder that I should try out this package. I do have some use-cases where it would be useful.

MilesCranmer commented 6 months ago

I'm also thinking about ditching `@parse_expression` altogether in favor of a `parse_expression(::Expr, ...)` function. It doesn't actually need a block of code as input, so I'm not sure the macro form is needed.
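Such a function-based API might look roughly like this, mirroring the macro's keywords but taking a quoted `Expr` at runtime (hypothetical at the time of this comment):

```julia
using DynamicExpressions

my_op(x) = cos(x^2)
operators = OperatorEnum(binary_operators=[+, -, *, /], unary_operators=[my_op])
variable_names = [:α, :β]

# Same result as the macro version, but accepts a runtime `Expr`:
ex = parse_expression(
    :(my_op(2.3 * α - β) / (0.5 * β * β));
    operators,
    variable_names,
)
```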