[Feature] TensorSymbolicRegression.jl

MilesCranmer commented 2 years ago

For example, say you would like to search for partial differential equations with genetic algorithms.

You would define operators such as finite-difference gradients, Laplacians, or summations, and then search for a tensor function $Y = f(X)$, where for a 2D PDE, $X, Y \in \mathbb{R}^{b \times n \times m}$ ($b$ the batch dimension, $n$ and $m$ the sides of a box, for example).

e.g.,

grad_along_x(x::AbstractArray{T,3}) where {T<:Real} = circshift(x, (0, -1, 0)) .- circshift(x, (0, 1, 0))

would define $f(\phi) = \partial\phi/\partial x$ (a central difference with periodic boundaries, up to a constant factor set by the grid spacing) for $\phi$ a rank-3 tensor.

I think it would be easy enough to convert all AbstractMatrix and AbstractVector into AbstractArray to enable this.

Following this, one would want to allow passing arbitrary vector or matrix operators, instead of just scalar operators as is done now (which are vectorized to be rank 1 tensor -> rank 1 tensor). The user would define each operator over combinations of Matrix/Vector/Scalars. Then, exploiting Julia's multiple dispatch, the evaluation code would automatically call the correct function given the inputs.
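A minimal sketch of what dispatching one operator over scalar and array inputs could look like. The names `my_mul` and `my_dot` are hypothetical and are not part of SymbolicRegression.jl; the point is only that Julia selects the method from the argument types.

```julia
using LinearAlgebra: dot

# Hypothetical operators for illustration; not the package's API.
# Scalar * scalar, as in the current scalar-only search.
my_mul(x::T, y::T) where {T<:Real} = x * y
# Scalar * array.
my_mul(x::T, y::AbstractArray{T}) where {T<:Real} = x .* y
# Array * array, elementwise, for arrays of the same rank.
my_mul(x::AbstractArray{T,N}, y::AbstractArray{T,N}) where {T<:Real,N} = x .* y

# A reduction: (vector, vector) -> scalar.
my_dot(x::AbstractVector{T}, y::AbstractVector{T}) where {T<:Real} = dot(x, y)

a = 2.0
v = [1.0, 2.0, 3.0]

r1 = my_mul(a, a)  # dispatches to the scalar method
r2 = my_mul(a, v)  # dispatches to the scalar-array method
r3 = my_mul(v, v)  # dispatches to the array-array method
r4 = my_dot(v, v)  # vector reduction to a scalar
```

The evaluation code only ever calls `my_mul(left, right)`; which kernel runs is decided by the types of the two child results.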

Here's the trick: if a given set of input types is not defined (say you try to pass a scalar to grad_along_x), then it should simply fail the evaluation and return. The fitness of that expression will be poor, and the genetic algorithm will take care of the rest, with no further modifications needed.
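One way the "fail and return" behavior could be sketched is with Base's `applicable`, which reports whether a method exists for the given arguments. The helper `safe_apply` and the `Inf` loss convention are assumptions made for illustration, and `circshift` stands in for the periodic shift in `grad_along_x`.

```julia
# The example operator, defined only for rank-3 real arrays.
grad_along_x(x::AbstractArray{T,3}) where {T<:Real} =
    circshift(x, (0, -1, 0)) .- circshift(x, (0, 1, 0))

# Hypothetical helper: returns (result, completed).
# If no method matches the argument types, the evaluation is flagged
# as failed instead of throwing a MethodError.
function safe_apply(op::F, args...) where {F}
    if applicable(op, args...)
        return op(args...), true
    else
        return nothing, false
    end
end

# Hypothetical loss: a failed evaluation gets the worst possible fitness.
loss_or_inf(out, completed) = completed ? sum(abs2, out) : Inf

out_ok, ok = safe_apply(grad_along_x, ones(2, 4, 4))  # valid input
out_bad, bad = safe_apply(grad_along_x, 1.0)          # scalar: no method
```

With this convention an ill-typed expression is never special-cased by the search; it simply scores `Inf` and is selected against.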

@ChrisRackauckas, @kazewong, @PatrickKidger, @AlCap23 do you know anybody who would be interested in working on this? I think it wouldn't be too much work to set up.

(Anybody else who reads this - feel free to post here if you are interested)

MilesCranmer commented 2 years ago

@kazewong - are you interested in working on this?

MilesCranmer commented 2 years ago

Also see #90 for more ideas on how to do this. I think you would basically want to let an operator be equal to the identity function if it's not defined for a given type (or plus). Then, every so many mutations, you would try to remove identity operators.
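A rough sketch of the identity-fallback idea for unary operators. The tree types `Leaf` and `Unary` and the functions `apply_or_identity`, `evaluate`, and `prune` are all hypothetical and much simpler than the package's own node type; the pruning pass here decides by evaluating the child, which is just one possible way to detect a degenerate operator.

```julia
# Hypothetical minimal expression tree; not the package's Node type.
abstract type AbstractNode end
struct Leaf <: AbstractNode
    value::Any
end
struct Unary <: AbstractNode
    op::Function
    child::AbstractNode
end

grad_along_x(x::AbstractArray{T,3}) where {T<:Real} =
    circshift(x, (0, -1, 0)) .- circshift(x, (0, 1, 0))

# Apply `op` if a method exists for this input, otherwise act as the identity.
apply_or_identity(op, x) = applicable(op, x) ? op(x) : x

evaluate(n::Leaf) = n.value
evaluate(n::Unary) = apply_or_identity(n.op, evaluate(n.child))

# Remove nodes whose operator degenerated to the identity for its input.
prune(n::Leaf) = n
function prune(n::Unary)
    child = prune(n.child)
    return applicable(n.op, evaluate(child)) ? Unary(n.op, child) : child
end

scalar_tree = Unary(grad_along_x, Leaf(3.0))           # op undefined on a scalar
tensor_tree = Unary(grad_along_x, Leaf(ones(1, 3, 1))) # op defined on rank 3
```

Here `scalar_tree` evaluates to its leaf value unchanged and is reduced to a bare `Leaf` by `prune`, while `tensor_tree` keeps its operator.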

MilesCranmer commented 2 years ago

More ideas:

Rather than passing Val(tree.op), you would pass Val(tree.op), INPUT_TYPE, OUTPUT_TYPE to the evaluation kernel. It would therefore know exactly what the input and output types should be, and could specialize!
How would one find OUTPUT_TYPE? Well, this would simply be the INPUT_TYPE of the parent node! So, essentially, the root node is visited first and passes the INPUT_TYPE it needs to each child node as that child's desired OUTPUT_TYPE. That child node would then figure out what INPUT_TYPE it needs to create that OUTPUT_TYPE. If such a pair does not exist for that operator (e.g., dot does not have an (INPUT_TYPE, OUTPUT_TYPE) = ((Vector{T}, Vector{T}), Vector{T}); it only has ((Vector{T}, Vector{T}), T)), then the evaluation would fail and an uninitialized array (of the correct type) would be returned instead. Otherwise, it would call the corresponding kernel.
All available types for each operator would be compile-time constants (stored in a tuple), so this should not have any performance issues due to type instability.
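The lookup described above could be sketched as follows. The `signatures` table and `required_inputs` are hypothetical names, and `Float64` is fixed in place of the generic `T` to keep the example concrete. This only illustrates the (INPUT_TYPE, OUTPUT_TYPE) matching, not the kernel specialization or its performance.

```julia
# Hypothetical signature table: for each operator, a tuple of
# (input types, output type) pairs.
signatures(::Val{:dot}) = (
    ((Vector{Float64}, Vector{Float64}), Float64),
)
signatures(::Val{:add}) = (
    ((Float64, Float64), Float64),
    ((Vector{Float64}, Vector{Float64}), Vector{Float64}),
)

# Given the output type requested by the parent node, return the input
# types this operator needs, or `nothing` if no such pair exists.
function required_inputs(op::Val, ::Type{OUT}) where {OUT}
    for (inputs, output) in signatures(op)
        output === OUT && return inputs
    end
    return nothing
end

dot_scalar = required_inputs(Val(:dot), Float64)          # pair exists
dot_vector = required_inputs(Val(:dot), Vector{Float64})  # no such pair
add_vector = required_inputs(Val(:add), Vector{Float64})  # pair exists
```

A `nothing` result corresponds to the failed-evaluation branch; otherwise the returned input types become the desired output types of the child nodes.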

closedLoop commented 3 months ago

I'm also interested in taking a crack at this.

MilesCranmer / SymbolicRegression.jl

[Feature] TensorSymbolicRegression.jl #107