This is actually expected behavior. When you pass `beta_prior = ((a,b), c)`, what it does is wait until an arm has seen `c` cases both with and without reward before it starts using the fitted models for predictions, and in the meantime it generates random numbers ~ Beta(a + n_reward, b + n_no_reward).
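A rough sketch of that fallback for a single arm (the prior values and counts below are placeholders, not the library's internal defaults):

```python
import numpy as np

# Hypothetical values: the (a, b) part of beta_prior and one arm's counts so far.
a, b = 3.0, 7.0
n_reward, n_no_reward = 2, 5

# While the arm has fewer than c cases with/without reward, its score is a
# random draw from Beta(a + n_reward, b + n_no_reward) rather than a model output.
score = np.random.beta(a + n_reward, b + n_no_reward)
print(score)
```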
For more details, you can check the paper: https://arxiv.org/abs/1811.04383
If you don't like this behavior, you can pass `beta_prior = None`, but it will have a huge performance impact (see the plots at the end of the same paper, pages 21-24, for why this is the default). Methods like UCB don't work properly without it. Alternatively, you can pass `smoothing` instead, which might offset some of the performance impact.
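For reference, a minimal sketch of how these options could be passed to a policy constructor; the base classifier, number of arms, and the `smoothing` values are illustrative choices, not recommendations:

```python
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import BootstrappedUCB

base = LogisticRegression(solver="lbfgs")

# Default: a beta prior is used for each arm until it has enough data.
policy_default = BootstrappedUCB(base, nchoices=5)

# Disable the prior entirely (expect a performance hit, per the paper above).
policy_no_prior = BootstrappedUCB(base, nchoices=5, beta_prior=None)

# Use smoothing instead, which may offset some of that impact.
policy_smoothed = BootstrappedUCB(base, nchoices=5, beta_prior=None, smoothing=(1, 2))
```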
I agree, however, that these methods need a non-random predictor when using `exploit = True` in `predict`, so I'll fix that, but it won't change anything for `decision_function`.
Thanks, now I've caught the detail. A predictor with an exploit-only mode can be useful in some cases.
@david-cortes I have saved a model in the following way:
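(The original snippet is not shown here; roughly, the save step would look like fitting the policy and serializing it with `dill`. The training data, base classifier, and arm count below are placeholders.)

```python
import dill
import numpy as np
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import BootstrappedUCB

# Placeholder training data (the real data is not part of the thread).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))           # contexts
actions = rng.integers(0, 5, size=500)   # arms that were played
rewards = rng.integers(0, 2, size=500)   # observed 0/1 rewards

policy = BootstrappedUCB(LogisticRegression(solver="lbfgs"), nchoices=5)
policy.fit(X, actions, rewards)

# Serialize the fitted policy to disk.
with open("20190521.dill", "wb") as f:
    dill.dump(policy, f)
```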
But I got different prediction results every time for the same input; here is a simulation:
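A hedged reconstruction of such a simulation, assuming the policy was saved as sketched above (only the file name comes from the thread):

```python
import dill
import numpy as np

with open("20190521.dill", "rb") as f:
    policy = dill.load(f)

x = np.random.default_rng(1).normal(size=(1, 10))  # one arbitrary input row

# Repeated calls on the same row can return different arms, because arms
# still under the beta prior draw fresh random Beta samples each time.
print(policy.predict(x))
print(policy.predict(x))

# exploit=True is the mode discussed above; decision_function keeps the
# random component for arms that are still under the prior.
print(policy.predict(x, exploit=True))
print(policy.decision_function(x))
```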
20190521.dill is the model trained with BootstrappedUCB in the above way.