ShuhuaGao / geppy

A framework for gene expression programming (an evolutionary algorithm) in Python
https://geppy.readthedocs.io/en/latest/
GNU Lesser General Public License v3.0

NaN, infinity or a value too large #23

Closed: xinyu-2020 closed this issue 4 years ago

xinyu-2020 commented 4 years ago

Hi Shuhua, when running the example GEP_RNC_for_ML_with_UCI_Power_Plant_dataset, I added a symbolic function x^y with pset.add_function(operator.pow, 2), and it threw the following error: ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). Please give me some advice. Thank you.

xinyu-2020 commented 4 years ago

The error is:

:1: RuntimeWarning: invalid value encountered in double_scalars
Traceback (most recent call last):
  File "C:/Users/xinyu/Desktop/CalData/GEP-RNC-UCI.py", line 94, in
    pop, log = gep.gep_simple(pop, toolbox, n_generations=n_gen, n_elites=1,stats=stats, hall_of_fame=hof, verbose=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\geppy\algorithms\basic.py", line 100, in gep_simple
    for ind, fit in zip(invalid_individuals, fitnesses):
  File "C:/Users/xinyu/Desktop/CalData/GEP-RNC-UCI.py", line 63, in evaluate
    return r2_score(Y, Yp),
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\metrics\regression.py", line 538, in r2_score
    y_true, y_pred, multioutput)
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\metrics\regression.py", line 79, in _check_reg_targets
    y_pred = check_array(y_pred, ensure_2d=False)
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 542, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Process finished with exit code 1

ShuhuaGao commented 4 years ago

Hi,

It is a little risky to use the power function x^y directly, because the value of x or y may become very large during evaluation, and the result then overflows, which causes the error you reported. If you really want a power function, please use a safe one: write your own function that handles overly large inputs internally.
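As an illustration of the "safe" power function suggested above, here is a minimal sketch. The name protected_pow, the fallback value 1.0, and the magnitude limit 1e10 are all assumptions chosen for this example, not part of geppy. It also assumes the function is called on scalars, one sample at a time.

```python
import math


def protected_pow(x, y, fallback=1.0, limit=1e10):
    """Return x**y, or `fallback` when the result is not a usable real number.

    `fallback` and `limit` are arbitrary choices for this sketch.
    """
    try:
        # math.pow raises ValueError for a negative base with a fractional
        # exponent (or 0 to a negative power) and OverflowError when the
        # result does not fit in a float.
        result = math.pow(x, y)
    except (OverflowError, ValueError):
        return fallback
    # NaN or infinite inputs pass through math.pow silently, so check the result.
    if not math.isfinite(result) or abs(result) > limit:
        return fallback
    return result
```

It could then be registered in place of operator.pow, following the call from the original post: pset.add_function(protected_pow, 2). Returning a constant for invalid cases is only one possible design. Another option is to give such individuals a very poor fitness inside the evaluation function.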

xinyu-2020 commented 4 years ago

OK, thank you. In my project, the fitness evaluation function is MSE, but I still need to know R^2 (the coefficient of determination). How can I get the R^2 of the best few individuals?

ShuhuaGao commented 4 years ago

Hi, xinyu,

I don't think your problem is related to geppy. You can simply evaluate the several best individuals and compute the R^2 according to its formula. An individual, i.e., an expression tree, is just like a function, which accepts your inputs during evaluation and gives the output.
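A minimal sketch of that idea, assuming each of the best individuals has already been turned into an ordinary Python callable (the lambdas below are hypothetical stand-ins for such compiled individuals, and the data is made up for illustration). R^2 is computed from its formula, 1 - SS_res / SS_tot.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def score_models(models, xs, ys):
    """Evaluate each callable on the inputs and return its R^2."""
    return [r_squared(ys, [model(x) for x in xs]) for model in models]


# Toy data and two stand-in "individuals" (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
candidates = [lambda x: 2 * x + 1, lambda x: 4.0]

scores = score_models(candidates, xs, ys)
```

Here the first candidate reproduces the data exactly, and the second always predicts the mean of ys. With real data, xs would hold the input rows and each callable would take one row's features as arguments.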

xinyu-2020 commented 4 years ago

In my program, the MSE of the best individual is still too large. Which parameters should I modify? My current settings are: n_pop = 1000, n_gen = 1000, h = 20, n_genes = 2, r = 10.

ShuhuaGao commented 4 years ago

Hi, xinyu,

Generally in evolutionary computation, a larger size of the population and more generations would lead to better results. You can play with these few parameters or refer to a GEP book for more guidance.