SymbolicML / DynamicExpressions.jl

Ridiculously fast symbolic expressions
https://ai.damtp.cam.ac.uk/dynamicexpressions
Apache License 2.0
106 stars 15 forks source link

Graph evaluator caching and expression visualiser #98

Closed robdancer closed 3 months ago

robdancer commented 3 months ago

Two main features - they could be split and PR'd separately if that's preferable

Graph evaluator caching

The GraphNode struct now has an additional property, the AbstractArray cache, not initialised during object creation. This enables the result of each node evaluation to be stored and retrieved without the need for a dictionary lookup.

A GraphNode can now also have the constant property set to true during evaluation if it is an operator node with constant operand nodes. In such a case the val property will also be set to the result of the single operation required. This behaviour won't occur in sufficiently optimised expressions, as a subexpression consisting of operators with constant operand nodes could be replaced with a single constant node. This doesn't meaningfully mutate the expression during the evaluation, as the constant property is otherwise meaningless for nodes with degree >0.

In terms of the evaluator itself, there are two implemented versions: one using Base.map, and one using LoopVectorization.vmapnt which is used when the LoopVectorization package is loaded. Some rough tests I did found that, on a graph with no shared subexpressions, the graph evaluator takes 1.4x as long as the corresponding (with or without LoopVectorization) tree evaluator. I might do some more work on reducing that, one method would be to quickly identify whether shared subexpressions exist in the expression and choose a corresponding evaluator. The relative performance of the graph evaluator operating on an expression with shared subexpressions is entirely dependent on the number and size of the shared subexpressions but has a theoretically unbounded performance improvement.

Expression visualiser

Expressions using many repeated shared subexpressions can produce long string representations, making it difficult to understand the graph structure of the expression. An example of the visualiser:

using Plots, GraphRecipes
using DynamicExpressions

operators = OperatorEnum(;
    binary_operators=[+, -, *, /], unary_operators=[cos, sin]
);

g0 = DynamicExpressions.GraphNode(feature=1)
g1 = sin(g0)
g2 = g1 + 2.5
g3 = g1 / g2

visualize(g3, operators)

Produces the output: output

Red node represents the root, and the labels on each arrow signify the number of the operand.

I've also updated SymbolicRegression.jl that improves the form_random_connection! mutation but it relies on the new randomised_topological_sort function added to DynamicExpressions in this PR.

Would be very happy to hear any and all feedback!

MilesCranmer commented 3 months ago

Thanks! Yeah could this be split into two PRs? Usually it's fine to have both in one, but these features cover quite different areas so are probably best split.

MilesCranmer commented 3 months ago

I wonder if we could depend on https://github.com/JuliaGraphs/Graphs.jl? If we set up all of the right methods in a DynamicExpressionsGraphsExt, we should get the plotting "for free" – https://juliagraphs.org/Graphs.jl/stable/first_steps/plotting/ and not need our own custom plotting function.

Although will need https://github.com/JuliaGraphs/Graphs.jl/issues/395

robdancer commented 3 months ago

Thanks! Yeah could this be split into two PRs? Usually it's fine to have both in one, but these features cover quite different areas so are probably best split.

Agreed - two PRs is best. Let me close this pull request and I'll set up two new ones.