Change GSGP code and interface

cavalab / srbench

A living benchmark framework for symbolic regression

https://cavalab.org/srbench/

GNU General Public License v3.0


Change GSGP code and interface #52

Closed aboisbunon closed 2 years ago

aboisbunon commented 3 years ago

Hello! Here is a proposal to improve GSGP's C code and Python interface, with the following modifications:

move the training part to the fit function in gsgp.py
handle a user-specified seed
report a bit more information if the program fails
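
To illustrate the three points above, here is a minimal sketch of the general pattern, not the actual gsgp.py code. The class name `ExternalRegressorSketch`, the way the seed is passed (as the last command-line argument), and the `output_` attribute are all assumptions made for the example; writing the training data to disk for the external program is left out.

```python
import subprocess
import sys


class ExternalRegressorSketch:
    """Hypothetical wrapper around a command-line learner (not the real gsgp.py)."""

    def __init__(self, command, seed=None):
        self.command = list(command)  # argv of the external program (assumed)
        self.seed = seed              # user-specified seed, or None
        self.output_ = None           # filled in by fit()

    def fit(self, X=None, y=None):
        # Training is launched here, in fit(), rather than at prediction time.
        # Passing X and y to the program is omitted in this sketch.
        argv = list(self.command)
        if self.seed is not None:
            # Assumption: the program accepts the seed as its last argument.
            argv.append(str(self.seed))
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            # Surface the exit code and stderr instead of failing silently.
            raise RuntimeError(
                f"external program failed with exit code {result.returncode}: "
                f"{result.stderr.strip()}"
            )
        self.output_ = result.stdout.strip()
        return self


if __name__ == "__main__":
    # Stand-in for the compiled binary: a tiny Python script that echoes the seed.
    echo_seed = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
    model = ExternalRegressorSketch(echo_seed, seed=42).fit()
    print(model.output_)
```

Raising an exception that carries the exit code and stderr is one way to get more information on failure; the real interface may report errors differently.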

I ran a few tests, which went well, and I hope you'll find this useful for the benchmark. BTW, I've seen GSGP has been removed in the new benchmark article, is it because of the unavailability of an explicit model with this method? Best regards, AB

lacava commented 3 years ago

hi @aboisbunon, thanks for the contribution! sorry for the delay.

> BTW, I've seen GSGP has been removed in the new benchmark article, is it because of the unavailability of an explicit model with this method?

We decided to leave GSGP out because 1) it was hacked together in our first experiment (e.g., it doesn't have a way to predict without passing labels or have good error reporting), and I did not want to take responsibility for maintaining the codebase; 2) it did not perform well in our first benchmark and produces large models; 3) the developers were not responsive to questions/requests.

I am happy to include it if you are willing to maintain it, but yes, we now require the methods to return a string version of the final model that can be assessed for symbolic equivalence. Is that possible?

aboisbunon commented 3 years ago

Hi @lacava , Thanks for taking the time to look at my contribution! To answer your questions about GSGP

> We decided to leave GSGP out because 1) it was hacked together in our first experiment (e.g., it doesn't have a way to predict without passing labels or have good error reporting), and I did not want to take responsibility for maintaining the codebase; 2) it did not perform well in our first benchmark and produces large models; 3) the developers were not responsive to questions/requests.

here is my answer: 1) this is exactly what my contribution addresses, so it shouldn't be an issue anymore; 2) I had the same experience: sometimes it works well, but more often it doesn't...

As for the final model, I don't think it is possible to retrieve it as a readable text formula, because it grows exponentially... I tried, but my code crashed rather quickly! Apart from that, I'm not sure it will require a lot of maintenance afterwards, but yeah, I could do it if needed.

Best, AB

lacava commented 3 years ago

hi @aboisbunon , would you mind re-merging the upstream master? just fixed some versioning conflicts that popped up (#56)

aboisbunon commented 3 years ago

Hi @lacava , no problem, it's done. Let me know if you need anything else!

aboisbunon commented 2 years ago

Hello @lacava , Thank you for the answer. So I checked again: I'm able to compute the complexity of GSGP models (which in the example I tested reached 8e39, so I'm not sure it is really worth it...), but for the formula I really cannot find a way. The formula grows exponentially, resulting in a MemoryError, and using sympy to simplify the formula at each step, or at regular intervals, does not help either (it does not crash, but it is still nowhere near finished after 3 hours...). If you know of a tool other than sympy for simplifying formulas, I would be happy to test it out.

I would suggest reporting np.inf for the complexity (or the actual complexity), and np.nan for the formula. What do you think?
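
To show why the complexity can be computed while the formula cannot be written out, here is a rough sketch that tracks only the node count. It assumes the usual form of geometric semantic crossover, offspring = (T1 * TR) + ((1 - TR) * T2), where TR is a random tree; the function names, the default sizes, and the worst-case assumption that both parents have the current size are made up for illustration and are not taken from the GSGP code.

```python
def crossover_size(size_a, size_b, size_random):
    """Node count of (A * R) + ((1 - R) * B), given the parents' node counts.

    The random tree R appears twice, and the combination adds 5 nodes:
    '+', two '*', '-', and the constant 1.
    """
    return size_a + size_b + 2 * size_random + 5


def size_after(generations, initial_size=10, random_size=10):
    """Worst-case node count when both parents always have the current size."""
    size = initial_size
    for _ in range(generations):
        size = crossover_size(size, size, random_size)
    return size


def report(generations):
    """Hypothetical result record: exact complexity, but no formula string."""
    return {
        "complexity": size_after(generations),  # Python ints never overflow
        "formula": float("nan"),                # stand-in for np.nan
    }


if __name__ == "__main__":
    for g in (1, 2, 50, 100):
        print(g, report(g)["complexity"])
```

The count at least doubles every generation, so the size is cheap to track as an integer even when a string of that length could never fit in memory.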

lacava commented 2 years ago

Hi @aboisbunon , unless you feel strongly about it, I suggest we go back to not including GSGP. If it produces infinite complexity and no formula, and it underperformed linear methods in our previous tests, I don't think it's worth the hassle...

aboisbunon commented 2 years ago

Hi @lacava , no problem, I don't have strong feelings about it :)