Dlux804 / McQuade-Chem-ML

Development of easy to use and reproducible ML scripts for chemistry.

BayesSearchCV evaluating the same parameter continuously #45

Closed qle2 closed 4 years ago

qle2 commented 4 years ago

Describe the bug During hyperparameter tuning, BayesSearchCV repeatedly evaluates the same parameter set several times in a row before moving on to a different one.

To Reproduce Steps to reproduce the behavior:

  1. Go to main.py
  2. Run any machine learning set up with hyperparameter tuning

Screenshots (attached screenshot)

During this run, the problem continued for 4 more iterations before the search moved on to another parameter.

Additional context I've attempted the fix suggested here, but it produced further errors: https://github.com/scikit-optimize/scikit-optimize/issues/441

Looks like this has been an issue for others as well without a fix: https://github.com/scikit-optimize/scikit-optimize/issues/302

The same warning even appears in their documentation: https://scikit-optimize.github.io/stable/auto_examples/store-and-load-results.html

qle2 commented 4 years ago

Trying to determine if this problem is more prevalent in certain algorithms.

I've been using Random Forest (RF) and Support Vector Machine (SVM) a lot recently, and I've noticed this error pops up significantly more often when running RF than SVM.

Dlux804 commented 4 years ago

How often does this occur? Every run?

Dlux804 commented 4 years ago

(attached screenshot)

I just did a test run with RF and a small number of iterations -- no error.

It may be time to explore other optimization options, like hyperopt.

qle2 commented 4 years ago

I'm doing experiments with 10+ iterations and it occurs every run. I think that with SVR, since the numerical parameters are real numbers, the error occurs less often than with RF, whose numerical parameters are all integers.
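The hypothesis above can be sketched in plain Python (names are illustrative, not scikit-optimize internals): a Bayesian optimizer effectively proposes candidates in a continuous space, and integer dimensions are obtained by rounding, so over a narrow integer range many distinct continuous proposals collapse onto the same integer point, while real-valued dimensions (as in SVR's `C` or `gamma`) almost never collide exactly.

```python
import random

random.seed(0)

# Stand-in for continuous candidate points proposed by the acquisition
# function over an integer dimension with range [1, 5] (e.g. an RF depth).
proposals = [random.uniform(1, 5) for _ in range(20)]

# Integer dimensions are snapped to the grid by rounding, so distinct
# proposals can collapse onto the same integer value.
rounded = [round(p) for p in proposals]
duplicates = len(rounded) - len(set(rounded))

# With 20 proposals and at most 5 distinct integers, at least 15 of the
# evaluations repeat a point already tried (pigeonhole principle).
print(duplicates)

# A real-valued dimension, by contrast, essentially never repeats exactly.
real_duplicates = len(proposals) - len(set(proposals))
print(real_duplicates)  # 0
```

This is why widening integer ranges, or reducing the number of iterations relative to the size of the integer grid, tends to reduce how often the "objective has been evaluated at this point before" warning fires.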

Dlux804 commented 4 years ago

Have you tried changing the random_state in the BayesSearchCV initialization?

qle2 commented 4 years ago

I don't see this issue occurring while running GDB or SVR, with or without callbacks. I will close this issue for now since it only affects RF.

nionita commented 4 years ago

This still happens a lot. I use the library to optimize evaluation parameters for my chess engine, and one evaluation takes more than one hour, so evaluating the same point again is very costly:

Time reporting on: 17.09.2020 18:04:53
Since: 29211 seconds, per step: 4173 seconds
Remaining: 16692 seconds, ETA: 17.09.2020 22:43
Step: 19
C:\Users\nicu\Anaconda3\lib\site-packages\skopt\optimizer\optimizer.py:409: UserWarning: The objective has been evaluated at this point before.
  warnings.warn("The objective has been evaluated "
Params: [31, 0]
Play: starting 25 times with 20 games each, timeout = 642
Partial play result: 15 16 9    (414 seconds, remaining games: 24)

My objective function is very noisy too, so it depends on what the optimization algorithm does with the result: can it use the repeat to estimate the noise better? If so, the new run is not lost; it brings new information.
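When each evaluation takes over an hour, one pragmatic workaround is to wrap the objective in a caching layer so exact repeats are cheap. The sketch below (illustrative only, not part of scikit-optimize) returns the cached value for a deterministic objective, and for a noisy one folds the repeat into a running mean, so a re-evaluated point still contributes information:

```python
class CachingObjective:
    """Wrap an expensive objective so repeated points are cheap.

    Deterministic objective: a repeated point returns the cached value
    without rerunning. Noisy objective: repeats are re-evaluated and
    averaged into a running mean, so they are not wasted.
    """

    def __init__(self, fn, noisy=False):
        self.fn = fn
        self.noisy = noisy
        self.cache = {}  # point (as tuple) -> (running mean, count)

    def __call__(self, params):
        key = tuple(params)
        if key in self.cache and not self.noisy:
            return self.cache[key][0]      # skip the costly rerun
        value = self.fn(params)
        if key in self.cache:              # noisy: fold into the mean
            mean, n = self.cache[key]
            mean += (value - mean) / (n + 1)
            self.cache[key] = (mean, n + 1)
        else:
            self.cache[key] = (value, 1)
        return self.cache[key][0]


# Usage sketch: "expensive" stands in for an hour-long engine match.
calls = []

def expensive(params):
    calls.append(params)
    return sum(params)

obj = CachingObjective(expensive)
obj([31, 0]); obj([31, 0]); obj([31, 0])
print(len(calls))  # 1 -- the two duplicates hit the cache
```

Whether the underlying Gaussian process actually uses repeated noisy observations to refine its noise estimate depends on the optimizer's noise model, so averaging outside the optimizer like this is a conservative fallback rather than a replacement for that behavior.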