MilesCranmer / SymbolicRegression.jl

Distributed High-Performance Symbolic Regression in Julia
https://astroautomata.com/SymbolicRegression.jl/dev/
Apache License 2.0

Initializing with pre-specified population #342

Open charishma13 opened 2 weeks ago

charishma13 commented 2 weeks ago

I would like to know how to initialize my population with n members that have a pre-specified structure. For example, I want my initial population to have 15 members, all with the same expression, e.g. 1+x. Are there PySR options for this, or is it something that would need to be added? Thank you.

MilesCranmer commented 2 weeks ago

This feature does not yet exist, but it would certainly be nice to add it or simplify existing alternatives. The current strategy is basically to initialise the state manually. Alternatively you could run a search for 1 iteration, and then manipulate the saved state to specify individual members of the population. On the PySR discussions page there are some threads about this too.

charishma13 commented 2 weeks ago

Thank you for the suggestion, @MilesCranmer. I will check the documentation and make the corresponding changes. I would also like to know which Julia file performs the actual initialization of the population for each PySR iteration.

MilesCranmer commented 2 weeks ago

The initialisation function is here: https://github.com/MilesCranmer/SymbolicRegression.jl/blob/master/src/Population.jl#L36-L62

which gets called here: https://github.com/MilesCranmer/SymbolicRegression.jl/blob/cd23a6e25c64d00565c3ae3905d06dc3c63033ed/src/SymbolicRegression.jl#L775

charishma13 commented 1 week ago

I am having trouble creating a custom saved_state. The saved_state is a tuple consisting of a population and a hall of fame object, and I am building custom versions of both. So far I have created the PopMember components, following the discussion at https://github.com/MilesCranmer/PySR/discussions/443. I am now trying to build a population from PopMember instances by calling the Population struct constructor directly, but I am not sure this approach will work as intended. The code below fails on the line marked with >>.

using .SymbolicRegression: Node, Options, equation_search, Dataset, PopMember, HallOfFame, Population
using CSV
using DataFrames

val = Node{Float64}(val=162.0)
xsi = Node{Float64}(val=1.224f0)

options = Options(binary_operators=[+, -, *, /])

csv_file_path = "water_water.csv"
data = CSV.File(csv_file_path) |> DataFrame

X1 = reshape(data."Angle", 1, :)
X2 = reshape(data."OH1", 1, :)
X3 = reshape(data."OH2", 1, :)
X4 = reshape(data."H1H2", 1, :)
X = vcat(X1, X2, X3, X4)  # stack as rows (4 × N); hcat followed by reshape(X, 4, :) would interleave the features
y = data."Energy"

# Assuming y is your target variable
y_min = minimum(y)
y_scaled = (y .- y_min) * 2625.5002

dataset = Dataset(X, y_scaled)

# Format to PopMember:
member = PopMember(dataset, val, options; deterministic=false)
member1 = PopMember(dataset, xsi, options; deterministic=false)

>> population = Population{Float32, Float64, Node{Float32}}([member, member1], 2)

ERROR

ERROR: LoadError: TypeError: in Population, in L, expected L<:Real, got a value of type Float64
Stacktrace:
 [1] top-level scope
   @ ~/LU_Exp/popmembers_hof.jl:77

charishma13 commented 2 days ago

Hello @MilesCranmer,

I have managed to populate the Population using the following code: Population{Float32, Float32, Node{Float32}}([member, member1], 2)

I would like to inquire about where the initialization begins within the SymbolicRegression.jl framework, particularly with respect to functions such as _main_search_loop, _warmup_search, _initialize_search, and _create_workers.

Our intention is to modify the process starting from the initial population phase, so that PySR searches for equations starting from a given predefined expression. Could you clarify which function constructs the Population struct and initializes it? Additionally, is it possible to adjust the complexity so that the search begins at a higher value, for instance 7 or 9, instead of the default starting point of 1 (a single float value)?