MilesCranmer / PySR

High-Performance Symbolic Regression in Python and Julia
https://astroautomata.com/PySR
Apache License 2.0
2.34k stars 212 forks source link

[Feature] 1D integration mode #164

Open MilesCranmer opened 2 years ago

MilesCranmer commented 2 years ago

Since the PySR backend now has fast autodiff set up, it seems like the obvious next step is to set up PySR to solve 1D integrals via genetic algorithms. I think the only change would be the loss function - which would use eval_diff_tree_array.

Perhaps it would make sense to create a separate class with a limited number of features, to host this functionality.

foxtran commented 9 months ago

Hello!

Am I right that this issue is about something like y = \int f(X(t)) dt where f(..) is some regression?

MilesCranmer commented 9 months ago

Yep! Exactly.

foxtran commented 9 months ago

I think the only change would be the loss function - which would use eval_diff_tree_array.

From my point of view, it requires to update equation_search in SymbolicRegression.jl to accept X::AbstractVector{AbstractMatrix{T}} and Y::AbstractVector{AbstractMatrix{T}} with corresponding weights::AbstractVector{AbstractMatrix{T}} for numerical integration...

MilesCranmer commented 9 months ago

Oh maybe you mean something different. The idea I had in mind can already be done with the current equation_search by passing in a custom loss function and flattened X array which is then unpacked by the custom loss function. For example, see the implementation in this thread: https://github.com/MilesCranmer/PySR/discussions/374. The idea for a 1D integration "mode" is to basically set this up automatically, somehow. Does that make more sense?

foxtran commented 9 months ago

Oh maybe you mean something different.

Probably... It is just from specific of my dataset in which I have a set of integrals which has different number of points.

At final, with custom loss function, I did something like this:

objective = #"""
using DataFrames

function my_custom_objective(tree, dataset::Dataset{T,L}, options) where {T,L}

    t = string(tree);
    if occursin("x1", t) || occursin("x2", t)
      return L(Inf)
    end

    point_pred, flag = eval_tree_array(tree, dataset.X, options)
    !flag && return L(Inf)

    diffs = dataset.X[2,:] .* (point_pred .- dataset.y)

    df = DataFrame(ID = dataset.X[1,:], D = diffs)
    res_df = combine(groupby(df, :ID), :D => sum)
    diffs = res_df[!, :D_sum]

    return sum(diffs .^ 2) / length(diffs)
end
#"""

In first two columns, I encoded ID of integrals and weights for numerical integration and I made an integration in such strange way via DafaFrames :) Unfortunately, batching does not work there.