Closed aboisbunon closed 2 years ago
Hi @aboisbunon, thanks for the contribution! Sorry for the delay.
> BTW, I've seen GSGP has been removed in the new benchmark article, is it because of the unavailability of an explicit model with this method?
We decided to leave GSGP out because 1) it was hacked together in our first experiment (e.g., it doesn't have a way to predict without passing labels or have good error reporting), and I did not want to take responsibility for maintaining the codebase; 2) it did not perform well in our first benchmark and produces large models; 3) the developers were not responsive to questions/requests.
I am happy to include it if you are willing to maintain it, but yes, we now require the methods to return a string version of the final model that can be assessed for symbolic equivalence. Is that possible?
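For illustration only, here is a minimal sketch of one simple way a string model could be checked for symbolic equivalence with sympy. The function name `symbolically_equivalent` and the example expressions are hypothetical, and this is not necessarily the criterion the benchmark itself applies.

```python
import sympy as sp


def symbolically_equivalent(model_str, truth_str):
    """Return True if the difference of the two expressions simplifies to 0.

    Hypothetical helper: a False result only means sympy could not reduce
    the difference to zero, not that the expressions are proven different.
    """
    model = sp.sympify(model_str)
    truth = sp.sympify(truth_str)
    return sp.simplify(model - truth) == 0


# Same formula written two ways, then two genuinely different formulas.
print(symbolically_equivalent("x*(x + 1)", "x**2 + x"))
print(symbolically_equivalent("x**2", "x**2 + 1"))
```

A check like this needs the model as a parseable string, which is why a method that cannot write out its final formula is hard to include.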
Hi @lacava , Thanks for taking the time to look at my contribution! To answer your questions about GSGP
> We decided to leave GSGP out because 1) it was hacked together in our first experiment (e.g., it doesn't have a way to predict without passing labels or have good error reporting), and I did not want to take responsibility for maintaining the codebase; 2) it did not perform well in our first benchmark and produces large models; 3) the developers were not responsive to questions/requests.
here is my answer: 1) this is actually the contribution I proposed, so it shouldn't be an issue anymore; 2) I had the same experience, sometimes it works well, but more often it doesn't...
As for the final model, I don't think it is possible to retrieve it as a readable text formula, as it grows exponentially... I tried, but my code crashed rather quickly! Apart from that, I'm not sure it will require a lot of maintenance after that, but yeah, I could do it if needed.
Best, AB
hi @aboisbunon , would you mind re-merging the upstream master? just fixed some versioning conflicts that popped up (#56)
Hi @lacava , no problem, it's done. Let me know if you need anything else!
Hello @lacava, thank you for the answer. I checked again: I am able to compute the complexity of GSGP models (in the example I tested it reached 8e39, so I'm not sure it is really worth it...), but I really cannot find a way to get the formula. It grows exponentially, resulting in a MemoryError, and using sympy to reduce the formula at each step, or at regular intervals, does not help either (it does not crash, but it is still nowhere near finished after 3 hours...). If you know of a tool other than sympy to simplify formulas, I would be happy to test it out.
I would suggest returning np.inf for the complexity (or the actual complexity) and np.nan for the formula. What do you think?
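To illustrate why the complexity can be computed even when the formula cannot be written out, here is a minimal sketch assuming a model stored as a graph in which shared subexpressions are referenced by id. The node layout and the `expanded_size` helper are hypothetical and are not taken from the GSGP code.

```python
from functools import lru_cache


def expanded_size(nodes, root):
    """Count the nodes the formula would have if fully written out as a tree.

    `nodes` maps an id to (operator, child_ids); leaves have no children.
    Memoization visits each stored node once, so the expanded formula is
    never materialized in memory.
    """
    @lru_cache(maxsize=None)
    def size(node_id):
        _, children = nodes[node_id]
        return 1 + sum(size(child) for child in children)

    return size(root)


# Toy model (an assumption): each generation combines the previous
# individual with itself, so the expanded size roughly doubles every time.
nodes = {0: ("x", ())}
for gen in range(1, 131):
    nodes[gen] = ("add", (gen - 1, gen - 1))

print(len(nodes))                 # stored nodes: 131
print(expanded_size(nodes, 130))  # expanded nodes: 2**131 - 1
```

In this toy case only 131 nodes are stored, while the written-out formula would have on the order of 1e39 nodes, which is consistent with a string representation running out of memory. The recursion here is only suitable for models shallower than Python's recursion limit.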
Hi @aboisbunon , unless you feel strongly about it, I suggest we go back to not including GSGP. If it produces infinite complexity and no formula, and in our previous tests underperforms linear methods, I don't think it's worth the hassle...
Hi @lacava , no problem, I don't have strong feelings about it :)
Hello! Here is a proposition to improve GSGP's C code and Python interface with the following modifications:
I ran a few tests which went well, I hope you'll find it interesting for the benchmark. BTW, I've seen GSGP has been removed in the new benchmark article, is it because of the unavailability of an explicit model with this method? Best regards, AB