Closed yoshitomo-matsubara closed 2 years ago
can you confirm you ran 'git lfs fetch' in the pmlb repo? looks like they may be git lfs references still. i need to update the instructions as well since feynman and strogatz datasets are now in master in pmlb
Hi @lacava Thank you for the response.
Yes, I did run git lfs fetch for the feynman branch. A few minutes ago, I also fetched the master branch, but the downloaded tsv.gz files still look the same and are not in gzip format (they return the same error as shown above).
I think we need git lfs pull instead of git lfs fetch. It seems that analyze.py is now working with the downloaded datasets.
glad you found a solution. i believe git lfs pull additionally checks out the files into the working tree, whereas fetch only downloads the LFS objects. I'll update the instructions for the main PMLB branch asap.
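For anyone hitting the same "not in gzip format" error, here is a minimal sketch for telling a real gzip file apart from a git lfs pointer that was never replaced by its content. It only inspects the first bytes of the file: gzip data starts with the magic bytes 0x1f 0x8b, while an LFS pointer is a small text file whose first line begins with "version https://git-lfs.github.com/spec/v1". The function name classify_download is hypothetical and not part of srbench or PMLB.

```python
GZIP_MAGIC = b"\x1f\x8b"
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def classify_download(path):
    """Return 'gzip', 'lfs-pointer', or 'unknown' based on the file's first bytes.

    Hypothetical helper for illustration; not part of srbench or PMLB.
    """
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(LFS_POINTER_PREFIX):
        # Only the pointer was checked out; the real content still has to be
        # placed in the working tree, e.g. with git lfs pull.
        return "lfs-pointer"
    return "unknown"
```

If a dataset file is reported as 'lfs-pointer', running git lfs pull in the pmlb checkout is the step that resolved it in this thread.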
Thank you for updating the repo! I'll close this issue
@lacava It looks like the feynman datasets in PMLB are still incomplete.
The metadata.yaml files in the strogatz datasets look complete, and analyze.py works with those datasets.
However, the metadata.yaml files in the feynman datasets are incomplete (description = 'None yet. See our contributing guide to help us add one.'), so get_sym_model fails to extract model_str (the equation?) and analyze.py fails as follows:
========================================
Evaluating tuned.FEATRegressor on
/opt/pmlb/datasets/feynman_III_10_19/feynman_III_10_19.tsv.gz
========================================
compression: gzip
filename: /opt/pmlb/datasets/feynman_III_10_19/feynman_III_10_19.tsv.gz
Traceback (most recent call last):
File "evaluate_model.py", line 291, in <module>
**eval_kwargs)
File "evaluate_model.py", line 41, in evaluate_model
true_model = get_sym_model(dataset)
File "/opt/app/srbench/experiment/symbolic_utils.py", line 239, in get_sym_model
model_str = [ms for ms in description if '=' in ms][0].split('=')[-1]
IndexError: list index out of range
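The IndexError comes from indexing an empty list: the placeholder description contains no '=', so the list comprehension on line 239 has nothing to return. Here is a minimal sketch of the same extraction with an explicit check, assuming the description is available as a list of text lines. The function name extract_model_str is hypothetical and is not the code in symbolic_utils.py.

```python
def extract_model_str(description_lines):
    """Return the right-hand side of the first line that contains '='.

    Hypothetical helper for illustration; it mirrors the expression in the
    traceback but raises a descriptive error when no equation is present.
    """
    candidates = [line for line in description_lines if "=" in line]
    if not candidates:
        raise ValueError(
            "no equation found in the dataset description; "
            "metadata.yaml may still contain the placeholder text"
        )
    # Same rule as the traceback: keep the text after the last '='.
    return candidates[0].split("=")[-1].strip()
```

With a check like this, an incomplete metadata.yaml would be reported by name instead of surfacing as "list index out of range".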
thanks for checking. hm, some of the changes didn't make it into master... i'll look into it.
issued a PR on PMLB to resolve this (https://github.com/EpistasisLab/pmlb/pull/158); will check back once it is merged into master.
@lacava Thank you for the update! Let me know here once it's merged into master
Hi, I am trying this out myself now, and I am getting an error with all Strogatz problems this time (the Feynman ones run fine).
Namely, when using python analyze.py -script assess_symbolic_model as indicated in the README, I get errors like the one shown below:
========================================
Assessing tuned.GPGOMEARegressor model for
../../pmlb/datasets/strogatz_predprey2/strogatz_predprey2.tsv.gz
========================================
looking for: ../results_sym_data/strogatz_predprey2//strogatz_predprey2_tuned.GPGOMEARegressor_860.json
['This is one state of a 2-state dynamic model for predator-prey populations. ', '', '$\\dot{x} = x \\cdot \\left( 4 - x - \\frac{y}{1+x} \\right)$', '$\\dot{y} = y \\cdot \\left( \\frac{x}{1+x} - 0.075 \\cdot y \\right)$', '', 'It is adapted from Steven Strogatz\'s book "Chaos and Nonlinear Dynamics". ', 'Each strogatz ODE system can exhibit chaotic and/or nonlinear behavior. ', 'For the purposes of modeling, these systems are simulated using initial conditions within stable basins of attraction. ', 'The systems are simulated using simulink and matlab. ', '']
ValueError: Error from parse_expr with transformed code: "x \\Symbol ('cdot' ) \\Function ('left' )(Integer (4 )-x - \\frac {y }{Integer (1 )+x } \\Symbol ('right' ))$"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "assess_symbolic_model.py", line 158, in <module>
feature_noise=args.X_NOISE)
File "assess_symbolic_model.py", line 111, in assess_symbolic_model
assess_symbolic_model_from_file(save_file+'.json', dataset)
File "assess_symbolic_model.py", line 42, in assess_symbolic_model_from_file
true_model = get_sym_model(dataset, return_str=False)
File "/export/scratch1/home/virgolin/srbench/experiment/symbolic_utils.py", line 246, in get_sym_model
local_dict = {k:Symbol(k) for k in features})
File "/export/scratch1/home/virgolin/anaconda3/envs/srbench/lib/python3.7/site-packages/sympy/parsing/sympy_parser.py", line 1026, in parse_expr
raise e from ValueError(f"Error from parse_expr with transformed code: {code!r}")
File "/export/scratch1/home/virgolin/anaconda3/envs/srbench/lib/python3.7/site-packages/sympy/parsing/sympy_parser.py", line 1017, in parse_expr
rv = eval_expr(code, local_dict, global_dict)
File "/export/scratch1/home/virgolin/anaconda3/envs/srbench/lib/python3.7/site-packages/sympy/parsing/sympy_parser.py", line 912, in eval_expr
code, global_dict, local_dict) # take local objects in preference
File "<string>", line 1
x \Symbol ('cdot' ) \Function ('left' )(Integer (4 )-x - \frac {y }{Integer (1 )+x } \Symbol ('right' ))$
^
SyntaxError: unexpected character after line continuation character
python analyze.py \
-script assess_symbolic_model \{'INPUT_FILE': '../../pmlb/datasets/strogatz_shearflow2/strogatz_shearflow2.tsv.gz', 'ALG': 'tuned.GPGOMEARegressor', 'RDIR': '../results_sym_data/strogatz_shearflow2/', 'RANDOM_STATE': 860, 'TEST': False, 'Y_NOISE': 0.0, 'X_NOISE': 0.0, 'SYM_DATA': True, 'JSON_FILE': ''}
I do see that the "true_model" field in the .json results for Strogatz includes a trailing $.
Perhaps it suffices to add
model_str = model_str.replace("$","")
in symbolic_utils.get_sym_model?
I'd do a PR, but I am not sure whether this is (somehow) a problem only I have, since I see nobody else raising it.
EDIT: removing the $ is not enough
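Removing the $ is not enough because the description still contains LaTeX commands (\cdot, \left, \right, \frac) that parse_expr cannot read, as the "transformed code" in the error shows. Here is a rough sketch of the kind of rewriting that would be needed, covering only those four commands. The function name latex_rhs_to_plain is hypothetical, and this is not the fix that was applied in PMLB, which updated the metadata instead.

```python
import re

# Matches \frac{a}{b} where a and b contain no further braces.
_FRAC = re.compile(r"\\frac\s*\{([^{}]*)\}\s*\{([^{}]*)\}")


def latex_rhs_to_plain(latex):
    """Turn the right-hand side of a simple LaTeX equation into plain infix text.

    Hypothetical helper for illustration; handles only \\cdot, \\left, \\right
    and \\frac, which are the commands seen in the error message.
    """
    text = latex.strip().strip("$")
    text = text.split("=")[-1]                    # keep the right-hand side
    text = re.sub(r"\\(left|right)\b", "", text)  # drop delimiter sizing
    text = text.replace("\\cdot", "*")            # explicit multiplication
    # Rewrite fractions from the innermost outwards until none remain.
    while _FRAC.search(text):
        text = _FRAC.sub(r"((\1)/(\2))", text)
    return " ".join(text.split())                 # normalise whitespace
```

For the predprey2 equation from the log, this yields an expression built only from x, y, numbers, and arithmetic operators, which is the kind of string a parser such as parse_expr expects.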
hi @marcovirgolin
you caught a set of changes I hadn't pushed into PMLB.
once the checks complete on https://github.com/EpistasisLab/pmlb/pull/160, you can update from the pmlb master branch. for now you can checkout the strogatz_metadata branch. it seems to work for me on your example:
srbench/experiment$ python assess_symbolic_model.py ../../../pmlb/datasets/strogatz_shearflow2/strogatz_shearflow2.tsv.gz -ml tuned.GPGOMEARegressor -results ../../analysis/results_sym_data_new/strogatz_shearflow2/ -seed 860
{'INPUT_FILE': '../../../pmlb/datasets/strogatz_shearflow2/strogatz_shearflow2.tsv.gz', 'ALG': 'tuned.GPGOMEARegressor', 'RDIR': '../../analysis/results_sym_data_new/strogatz_shearflow2/', 'RANDOM_STATE': 860, 'TEST': False, 'Y_NOISE': 0.0, 'X_NOISE': 0.0, 'SYM_DATA': False, 'JSON_FILE': ''}
========================================
Assessing tuned.GPGOMEARegressor model for
../../../pmlb/datasets/strogatz_shearflow2/strogatz_shearflow2.tsv.gz
========================================
looking for: ../../analysis/results_sym_data_new/strogatz_shearflow2//strogatz_shearflow2_tuned.GPGOMEARegressor_860.json
> /mnt/d/projects/symbolic-regression/srbench/experiment/symbolic_utils.py(244)get_sym_model()
-> model_sym = parse_expr(model_str,
(Pdb) c
compression: gzip
filename: ../../../pmlb/datasets/strogatz_shearflow2/strogatz_shearflow2.tsv.gz
replacing feature 0 with x
replacing feature 1 with y
parsing 0.000170+2.307729*(((((cos(sin(y))*PLOG(PLOG(14.465000)))*cos((cos(y)/(-11.097000--13.964000))))+cos(-20.929000))*sin(x)))
{'x': x, 'y': y, 'add': <class 'sympy.core.add.Add'>, 'mul': <class 'sympy.core.mul.Mul'>, 'max': Max, 'min': Min, 'sub': <function sub at 0x7f4e8ed32790>, 'div': <function div at 0x7f4e8d7e2040>, 'square': <function square at 0x7f4e8d7e20d0>, 'cube': <function cube at 0x7f4e8d7e2160>, 'quart': <function quart at 0x7f4e8d7e21f0>, 'PLOG': <function PLOG at 0x7f4e8d7e2280>, 'PLOG10': <function PLOG at 0x7f4e8d7e2280>, 'PSQRT': <function PSQRT at 0x7f4e8d7e23a0>}
round_floats
rounded: 2.31*(0.983*cos(sin(y))*cos(0.349*cos(y)) - 0.487)*sin(x)
simplify...
simplified: (2.27*cos(sin(y))*cos(0.349*cos(y)) - 1.12)*sin(x)
saving...
sym_diff: -(2.27*cos(sin(y))*cos(0.349*cos(y)) - 1.12)*sin(x) + (0.1*sin(y)**2 + cos(y)**2)*sin(x)
sym_frac: (2.27*cos(sin(y))*cos(0.349*cos(y)) - 1.12)/(0.1*sin(y)**2 + cos(y)**2)
simplified sym_diff: (-0.9*sin(y)**2 - 2.27*cos(sin(y))*cos(0.349*cos(y)) + 2.12)*sin(x)
{
"dataset": "strogatz_shearflow2",
"algorithm": "tuned.GPGOMEARegressor",
"params": {
"caching": false,
"classweights": false,
"elitism": 1,
"erc": true,
"evaluations": 1000000,
"functions": "+_-_*_p/_plog_sqrt_sin_cos",
"generations": -1,
"gomea": true,
"gomfos": "LT",
"ims": false,
"initmaxtreeheight": 6,
"linearscaling": true,
"maxsize": 1000,
"maxtreeheight": 17,
"parallel": false,
"popsize": 1000,
"prob": "symbreg",
"reproduction": 0.0,
"sbagx": 0.0,
"sblibtype": false,
"sbrdo": 0.0,
"seed": -1,
"silent": true,
"subcross": 0.5,
"submut": 0.5,
"syntuniqinit": 1000,
"time": 28800,
"tournament": 4,
"unifdepthvar": true
},
"random_state": 860,
"process_time": 133.882689869,
"time_time": 133.97960495948792,
"target_noise": 0.0,
"feature_noise": 0.0,
"true_model": "(0.1*sin(y)**2 + cos(y)**2)*sin(x)",
"model_size": 21,
"symbolic_model": "0.000170+2.307729*(((((cos(sin(x1))*plog(plog(14.465000)))*cos((cos(x1)p/(-11.097000--13.964000))))+cos(-20.929000))*sin(x0)))",
"mse_train": 1.2889293272279269e-06,
"mae_train": 0.0008988973869460174,
"r2_train": 0.9999751463811574,
"mse_test": 1.3140537910085879e-06,
"mae_test": 0.0009068998433162603,
"r2_test": 0.9999816769614173,
"simplified_symbolic_model": "(2.27*cos(sin(y))*cos(0.349*cos(y)) - 1.12)*sin(x)",
"simplified_complexity": 15,
"symbolic_error": "(-0.9*sin(y)**2 - 2.27*cos(sin(y))*cos(0.349*cos(y)) + 2.12)*sin(x)",
"symbolic_fraction": "(2.27*cos(sin(y))*cos(0.349*cos(y)) - 1.12)/(0.1*sin(y)**2 + cos(y)**2)",
"symbolic_error_is_zero": false,
"symbolic_error_is_constant": false,
"symbolic_fraction_is_constant": false
}
saving...
done.
https://github.com/EpistasisLab/pmlb/pull/160 was merged. update PMLB from git and you should be good to go.
Hi!
Thank you for your great work and framework! I wanted to try the benchmarked methods on the ground-truth datasets (i.e., the Feynman and Strogatz datasets) and followed the instructions in the README.
Is each of the datasets not in gzip format?
However, the datasets fetched from the pmlb repository look broken. Here is one of the errors I got when running
python analyze.py -results ../results_sym_data -target_noise 0.0 "/data/pmlb/datasets/strogatz*" -sym_data -n_trials 10 -time_limit 9:00 -tuned --local
for the Strogatz datasets. (The same errors occurred for the Feynman datasets with "/data/pmlb/datasets/feynman_*" as well.)
I also tried to manually gunzip the file, but the error message still says it's not in gzip format.
Could you please resolve this issue for both Feynman and Strogatz datasets? Thank you!