Closed lesshaste closed 7 years ago
We are working on a new API for scoring functions in TPOT related to issue #579. For now, could you please try putting 'loss' or 'error' into the scoring function's name, so that greater_is_better is set to False in the make_scorer function (from this line).
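For illustration, a minimal sketch of that naming workaround, assuming a scoring function with the (y_true, y_pred) signature that TPOT wraps with make_scorer; the function body here is only a placeholder based on the discussion below:

```python
from scipy.stats import pearsonr

# 'loss' in the name tells TPOT to pass greater_is_better=False to make_scorer,
# so the returned value is minimized (0 best, 1 worst) rather than maximized.
def pearson_loss(y_true, y_pred):
    r, _ = pearsonr(y_true, y_pred)
    return 1.0 - r ** 2
```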
I think the issue here is the following: you define your loss function to be between 0 and 1, where 0 is the best and 1 is the worst, for optimisation. Currently, TPOT assumes that any custom scoring function is to be maximized (i.e., 1 is best and 0 is worst) unless it has loss or error in its name. Thus, I would simply keep everything the same but change return 1-pearson_r**2 to return pearson_r**2.
As @weixuanfu mentioned, we have some scoring API changes in the works, but this issue can be resolved in the latest release without any TPOT code changes.
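A hedged sketch of that change (the function name here is an assumption; only the returned value matters):

```python
from scipy.stats import pearsonr

# With no 'loss' or 'error' in the name, TPOT maximizes the returned value,
# so returning r**2 directly (instead of 1 - r**2) gives the intended behaviour.
def pearson_r_squared(y_true, y_pred):
    r, _ = pearsonr(y_true, y_pred)
    return r ** 2
```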
Thanks so much! That indeed fixes it (I just changed the scoring function to return pearson_r**2). My next challenge is to get the score above 0.15.
I am experimenting with TPOT for regression with a custom loss function. To do this I have made a toy experiment to see how well it can estimate the permanent of a matrix under particular circumstances. My code samples lots of submatrices of a bigger matrix and trains on those. Unlike a normal regression problem, at test time I want to optimize the correlation coefficient between the predicted and true values. This is where the custom loss function comes in.
Context of the issue
I have specified a simple loss function which is based on scipy.stats.pearsonr. I only modified it to make sure the value is between 0 and 1 and to make it a minimization problem.
Process to reproduce the issue
If you run the following code you will see:
and so on. 1.0 is the worst possible loss score. In other words, it apparently does as badly as it could possibly do. You get something better if you replace the custom loss function with any of the standard built-in ones, so some optimization is possible for this problem.
In the code, the value 4000 should be increased to 40,000 or larger, but I have kept it small so that it doesn't take too long to run.
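The original script is not reproduced above; purely as a sketch of the kind of setup described (the matrix sizes, submatrix size, scorer name, and TPOT parameters are all assumptions, not the original code, and it assumes a TPOT version that accepts a plain (y_true, y_pred) callable for scoring), it might look roughly like this:

```python
import numpy as np
from itertools import permutations
from scipy.stats import pearsonr
from tpot import TPOTRegressor

def permanent(m):
    # Naive O(n! * n) permanent; fine for small submatrices.
    n = m.shape[0]
    return sum(np.prod([m[i, p[i]] for i in range(n)]) for p in permutations(range(n)))

def pearson_corr(y_true, y_pred):
    # No 'loss' or 'error' in the name, so TPOT maximizes this value,
    # which is what drives the internal CV score towards 1.0.
    r, _ = pearsonr(y_true, y_pred)
    return 1.0 - r ** 2

rng = np.random.RandomState(0)
big = rng.rand(20, 20)            # the "bigger matrix" (size is a guess)
k, n_samples = 4, 4000            # submatrix size is a guess; 4000 as in the issue

X, y = [], []
for _ in range(n_samples):
    rows = rng.choice(big.shape[0], k, replace=False)
    cols = rng.choice(big.shape[1], k, replace=False)
    sub = big[np.ix_(rows, cols)]
    X.append(sub.ravel())
    y.append(permanent(sub))
X, y = np.array(X), np.array(y)

tpot = TPOTRegressor(generations=5, population_size=20,
                     scoring=pearson_corr, verbosity=2, random_state=0)
tpot.fit(X, y)
```

With a setup like this, returning r**2 instead (or renaming the scorer to include 'loss'), as suggested in the comments above, should let the internal CV score move away from 1.0.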
Expected result
I expect the internal CV score to come down from 1.0.
Current result
The internal CV score is stuck at 1.0.
Possible fix
I feel I must be using a custom loss function incorrectly. How should I have done it?