Closed fnpdaml closed 1 year ago
hi @fnpdaml :
to use your own datasets, you want to check out / modify read_file in read_file.py: https://github.com/cavalab/srbench/blob/4cc90adc9c450dad3cb3f82c93136bc2cb3b1a0a/experiment/read_file.py
if your datasets follow the convention of https://github.com/EpistasisLab/pmlb/tree/master/datasets, i.e. they are in a pandas dataframe with the target column labelled "target", you can call read_file directly, just passing the filename like you would with any of the PMLB datasets.
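To illustrate the convention, here is a minimal sketch that writes a made-up dataset in the PMLB layout and then loads it back the way read_file would (the dataset name, values, and the load-and-pop logic are illustrative assumptions, not copied from the repo):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Illustrative dataset (names and values are made up): y = 3*x1 + 2*x2
df = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0],
                   "x2": [1.0, 0.5, 0.0, -0.5]})
df["target"] = 3 * df["x1"] + 2 * df["x2"]

# PMLB files are tab-separated and gzip-compressed (*.tsv.gz)
path = Path(tempfile.mkdtemp()) / "eq1.tsv.gz"
df.to_csv(path, sep="\t", index=False, compression="gzip")

# A minimal stand-in for what read_file does with such a file:
# load it, pop the "target" column, return features and labels
frame = pd.read_csv(path, sep="\t", compression="gzip")
y = frame.pop("target").to_numpy()
X = frame.to_numpy()
print(X.shape, y.shape)  # (4, 2) (4,)
```

If your file follows this layout, the filename should be all you need to pass.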
read_file is called in evaluate_model here: https://github.com/cavalab/srbench/blob/4cc90adc9c450dad3cb3f82c93136bc2cb3b1a0a/experiment/evaluate_model.py#L39
hope that helps
Hi and thanks for that again! (I also use the above account interchangeably)
I had to do 3 things:
Generated my data in a pandas dataframe with the target column labelled "target" [I'm fitting a ground truth: "eq1.tsv"];
Then it seemed easier just to mimic the PMLB layout - created the corresponding "metadata.yaml" and "summary_stats.tsv" files;
Compressed my data to "eq1.tsv.gz" - only then did it work.
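For anyone repeating these steps, they can be scripted as below. The metadata and summary fields are illustrative placeholders loosely modeled on PMLB; the exact fields expected should be copied from a real PMLB dataset rather than from this sketch:

```python
from pathlib import Path

import pandas as pd

# Build a PMLB-style dataset directory: datasets/eq1/
root = Path("datasets/eq1")
root.mkdir(parents=True, exist_ok=True)

# 1. The data itself: tab-separated, gzipped, target column named "target"
#    (the equation and values here are made up for illustration)
df = pd.DataFrame({"x1": [0.0, 1.0, 2.0], "x2": [1.0, 2.0, 3.0]})
df["target"] = df["x1"] ** 2 + df["x2"]
df.to_csv(root / "eq1.tsv.gz", sep="\t", index=False, compression="gzip")

# 2. Minimal companion files mimicking the PMLB convention;
#    field names below are placeholders, not a verified schema
(root / "metadata.yaml").write_text(
    "dataset: eq1\n"
    "task: regression\n"
)
(root / "summary_stats.tsv").write_text(
    "n_instances\tn_features\ttask\n"
    "3\t2\tregression\n"
)

print(sorted(p.name for p in root.iterdir()))
# ['eq1.tsv.gz', 'metadata.yaml', 'summary_stats.tsv']
```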
But a few issues with "analyze.py":
"--time_limit" seems to have no effect on how long it is allowed to run (I set 5 min but had to abort after a day).
My data came from a 2nd-order equation, which was not discovered. However, several methods seemed to converge on the same answer, different from the original - is there normalization going on, and where is this controlled and recorded?
Many thanks!
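On the normalization point: if a method standardizes inputs before fitting, the symbolic form it reports can look different from the ground truth while being exactly equivalent once the scaling is undone. A minimal sketch of that effect using plain least squares (not any SRBench method):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
y = 4.0 * x**2 + 1.0                  # ground truth: y = 4*x^2 + 1

z = x**2                              # the "discovered" feature

# Fit y = a*z + b on the raw feature
A = np.column_stack([z, np.ones_like(z)])
a_raw, b_raw = np.linalg.lstsq(A, y, rcond=None)[0]

# Fit the same model on the standardized feature
mu, sigma = z.mean(), z.std()
zs = (z - mu) / sigma
As = np.column_stack([zs, np.ones_like(zs)])
a_std, b_std = np.linalg.lstsq(As, y, rcond=None)[0]

# The standardized coefficients look different from the ground truth,
# but map back exactly: a = a_std / sigma, b = b_std - a_std * mu / sigma
print(a_raw, b_raw)                               # ~4.0, ~1.0
print(a_std / sigma, b_std - a_std * mu / sigma)  # ~4.0, ~1.0 again
```

So two models with different printed coefficients can encode the same equation; whether a given SRBench method scales its inputs, and whether that is recorded, would need to be checked in its wrapper code.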
(bump)
evaluate_model: https://github.com/cavalab/srbench/blob/47da695292938d5e696ddcd4252f4034330ef787/experiment/evaluate_model.py#L24
Hi there, Some quick start help needed:
After installing, how do I run the benchmarks on user-supplied data? I'm struggling to get this to work, and want to make sure there's nothing wrong with my SRBench install.
Many thanks!