cavalab / srbench

A living benchmark framework for symbolic regression
https://cavalab.org/srbench/
GNU General Public License v3.0

How to compare the ground-truth expressions with the sympy string #112

Closed LuoYuanzhen closed 2 years ago

LuoYuanzhen commented 2 years ago

Hi,

I am currently implementing my SR algorithm for the competition. Everything works, but I still have a question about one competition detail even though I have read the Competition Guidelines.

Say a ground-truth model is -1.6+x**2 and my model produces the string log(|-0.2|)+x**2. According to the "regressor guide" in the Competition Guidelines, I should convert this string to a sympy-compatible string (here by removing '|', since sympy doesn't recognize it, as is done in "submission/feat-example/regressor.py"): sympy_str = est.model_.replace("|", "")

When I do this, the model string becomes log(-0.2)+x**2, which is definitely not the same expression as my model: if I run f = sympy.simplify("log(-0.2)+x**2"), then f becomes "x**2 - 1.6094379124341 + I*pi".

So this result is totally different from the ground-truth model -1.6+x**2. My question is: will "simplify" be used for model comparison during the competition? If so, how should I deal with this problem?

folivetti commented 2 years ago

Hi @LuoYuanzhen, in this example you should replace |-0.2| with abs(-0.2) so that you'll have the correct expression. Note that SymPy is compatible with most (if not all) math functions in Python's math library.
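A minimal sketch of that replacement, assuming the model string only contains non-nested |...| spans. The helper name bars_to_abs is hypothetical and not part of srbench; nested bars would need a real parser rather than a regex.

```python
import re
import sympy

def bars_to_abs(model_str: str) -> str:
    """Rewrite each non-nested |...| span as Abs(...) (hypothetical helper)."""
    return re.sub(r"\|([^|]+)\|", r"Abs(\1)", model_str)

# Model string from the question above.
sympy_str = bars_to_abs("log(|-0.2|)+x**2")

# sympify evaluates Abs(-0.2) to 0.2, so the log stays real.
expr = sympy.sympify(sympy_str)
print(sympy_str)
print(expr)
```

Compared with est.model_.replace("|", ""), this keeps the absolute value, so no imaginary term appears after parsing.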

marcovirgolin commented 2 years ago

Actually, if I am not mistaken, we will still be using the original estimator for predictions. The sympified version will be used for computing metrics such as complexity. As such, you do not need to worry about implementing the conversion so that protection is preserved (because people implement protections in different ways).

Please @folivetti and @lacava can you confirm if what I recall is correct?

folivetti commented 2 years ago

For predictions we'll use the predict method provided by the corresponding regressor class. The sympy compatible expression will be used to measure complexity and whether it corresponds to the ground-truth, and we will use the simplify method to do that.
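As a rough illustration of such a comparison, the sketch below simplifies the difference between a model and the ground truth and inspects what is left. The exact criterion used in the competition is not stated in this thread, so the check here (looking at the residual of the difference) is an assumption, not the official procedure.

```python
import sympy

truth = sympy.sympify("-1.6+x**2")

# Model string with the absolute value preserved as Abs(...).
good = sympy.sympify("log(Abs(-0.2))+x**2")
# Model string after simply deleting the '|' characters.
bad = sympy.sympify("log(-0.2)+x**2")

# Simplify the differences; the x**2 terms cancel in both cases.
good_diff = sympy.simplify(good - truth)
bad_diff = sympy.simplify(bad - truth)

print(good_diff)  # a small real constant
print(bad_diff)   # the same constant plus an imaginary I*pi term
```

With the absolute value kept, the model differs from the ground truth only by a small real constant; with the bars deleted, an imaginary term remains in the difference.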

In that particular case, where we have the log of a negative constant, you really should replace and simplify so that it evaluates to the correct value. If you meant to translate the protected operator, I think it should be fine either way. I have tested the following with sympy:

import sympy
from sympy import Abs, log

x = sympy.Symbol('x', real=True, positive=True)
log(Abs(x))
> log(x)

so if we state that the variable domain is positive, it will automatically remove the abs function.
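The same effect can be sketched when the model arrives as a string, which is the case discussed in this thread. This assumes the variable is named x and passes the positive symbol through the locals argument of sympy.sympify; whether the competition code declares the domain this way is not confirmed here.

```python
import sympy

# Symbol declared with a positive domain, as in the test above.
x = sympy.Symbol("x", real=True, positive=True)

# Parse the string so that "x" maps to the positive symbol.
with_domain = sympy.sympify("log(Abs(x))", locals={"x": x})

# Without assumptions on x, the Abs call is kept.
without_domain = sympy.sympify("log(Abs(x))")

print(with_domain)
print(without_domain)
```

Only the version parsed with the positive symbol drops the Abs call, so a protected log written this way reduces to a plain log under that assumption.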

LuoYuanzhen commented 2 years ago

Glad to get your replies so quickly. Thanks for all your replies. Now I totally understand!