david-cortes / contextualbandits

Python implementations of contextual bandits algorithms
http://contextual-bandits.readthedocs.io
BSD 2-Clause "Simplified" License

Predictions vary every time with the loaded model #12

Closed wenruij closed 5 years ago

wenruij commented 5 years ago

@david-cortes I have saved a model in the following way:

from copy import deepcopy

import dill
from sklearn.linear_model import SGDClassifier
from contextualbandits.online import BootstrappedUCB

base_algorithm = SGDClassifier(random_state=123, loss='log')
beta_prior = ((3, 7), 2)
model = BootstrappedUCB(deepcopy(base_algorithm), nchoices=nchoices, batch_train=True, beta_prior=beta_prior)
for i in range(iters):  # for loop with several iterations
    # shape for X: [batch, 2626]
    # shape for a: [batch, 1]
    # shape for r: [batch, 1]
    model.partial_fit(X, a, r)
target_model = "20190521.dill"
dill.dump(model, open(target_model, "wb"))

But I got different prediction results every time for the same input; here is a simulation:

>>> model = dill.load(open("20190521.dill", "rb"))
>>> X = np.random.normal(size=(1, 2626))
>>> res01 = model.decision_function(X)
>>> res01[0][:5]
array([0.447249  , 0.27269542, 0.48439773, 0.26759085, 0.1235832 ])
>>> 
>>> res02 = model.decision_function(X)
>>> res02[0][:5]
array([0.1319437 , 0.21268724, 0.40948264, 0.13509549, 0.15605585])
>>> 
>>> 
>>> pred01 = model.predict(X)
>>> pred01
array([651])
>>> 
>>> model.predict(X)
array([210])
>>> model.predict(X)
array([1741])

20190521.dill is the model trained with BootstrappedUCB in the above way.

david-cortes commented 5 years ago

This is actually expected behavior.

When you pass beta_prior = ((a, b), c), the model waits until an arm has seen c cases each with and without reward before it starts using the fitted models for predictions; in the meantime it generates random numbers ~ Beta(a + n_reward, b + n_no_reward) for that arm.
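
A minimal NumPy sketch of that fallback logic (the counts, threshold check, and function name here are illustrative, not the library's internals):

```python
import numpy as np

rng = np.random.default_rng(0)

# beta_prior = ((a, b), c) from the snippet above
a, b, c = 3, 7, 2

def arm_score(n_reward, n_no_reward):
    # Until an arm has seen at least c cases with reward and c without,
    # its score is a fresh random draw from Beta(a + n_reward, b + n_no_reward),
    # so repeated calls on the same input give different numbers.
    if min(n_reward, n_no_reward) < c:
        return rng.beta(a + n_reward, b + n_no_reward)
    # Otherwise the fitted classifier's score would be used (omitted here).
    return None

print(arm_score(1, 4))  # a random draw, different on every call
```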

For more details, you can check the paper: https://arxiv.org/abs/1811.04383

If you don’t like this behavior, you can pass beta_prior = None, but it will have a huge performance impact (see the plots on pages 21-24 of the same paper for why this is the default). Methods like UCB don’t work properly without it. Alternatively, you can pass smoothing instead, which might offset some of the performance impact.
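
For intuition, here is a generic smoothing-style blend (a sketch only, not necessarily the library's exact formula): shrink an arm's fitted score toward a prior, weighted by how many samples the arm has seen.

```python
def smooth(p, n, a=1.0, b=2.0):
    # Blend fitted score p with prior a/b; with few samples n the prior
    # dominates, with many samples the fitted score dominates.
    return (p * n + a) / (n + b)

print(smooth(0.9, n=0))     # 0.5 -> pure prior a/b, no data yet
print(smooth(0.9, n=1000))  # ~0.899 -> almost the fitted score
```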

I agree, however, that these methods should have a non-random predictor when using exploit = True in predict, so I’ll fix that, but it won’t change anything for decision_function.
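
As a reader-side workaround sketch: if the library's fallback draws come from NumPy's legacy global RNG (an assumption worth verifying before relying on it), re-seeding before each call makes repeated scoring of the same input reproducible. Demonstrated here with a stand-in Beta draw rather than the library itself:

```python
import numpy as np

def score(seed=123):
    # Re-seed the legacy global RNG so the "random" fallback draw
    # (stand-in: Beta(3 + 1, 7 + 4)) is identical across calls.
    np.random.seed(seed)
    return np.random.beta(3 + 1, 7 + 4, size=5)

r1 = score()
r2 = score()
```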

wenruij commented 5 years ago

Thanks, I got the detail now. A predictor with an exploit-only mode could play a role in some cases.