david-cortes / contextualbandits

Python implementations of contextual bandits algorithms
http://contextual-bandits.readthedocs.io
BSD 2-Clause "Simplified" License

Possibly unexpected behaviour of decision function #60

Closed Yalikesifulei closed 2 years ago

Yalikesifulei commented 2 years ago

Hello @david-cortes, thanks for this Contextual Bandits package.

While using some of the online methods (BootstrappedTS, AdaptiveGreedy, and possibly others) from this package, I've run into some behaviour of decision_function and related functions like predict that was unexpected, at least to me.

Let's use some simple dummy data (it doesn't matter much) as an example:

import numpy as np
from contextualbandits.online import BootstrappedTS
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)  # seed the global RNG so the dummy data below is reproducible

# dummy bandit data: random actions, reward 1 whenever the action matches the true class
X, y = load_iris(return_X_y=True)
a = np.random.randint(3, size=len(y))
r = 1 * (y == a)

cb_model_1 = BootstrappedTS(
    base_algorithm=LogisticRegression(max_iter=10000), random_state=RANDOM_STATE, nchoices=3
)
cb_model_2 = BootstrappedTS(
    base_algorithm=LogisticRegression(max_iter=10000), random_state=RANDOM_STATE, nchoices=3
)
cb_model_1.fit(X, a, r)
cb_model_2.fit(X, a, r)

# score the same single observation twice with each model
print(cb_model_1.decision_function(X[0]))
print(cb_model_2.decision_function(X[0]))
print(cb_model_1.decision_function(X[0]))
print(cb_model_2.decision_function(X[0]))

The output I get is

[[0.96298824 0.11472752 0.00019669]]
[[0.96298824 0.11472752 0.00019669]]
[[0.97498834 0.22001592 0.00019669]]
[[0.97498834 0.22001592 0.00019669]]

Setting random_state makes the predictions of cb_model_1 and cb_model_2 equal, as it should, but it's unclear to me why calling decision_function a second time changes the output. Another way to see this behaviour is to compare two predictions from the same model:

pred_1 = cb_model_1.predict(X)
pred_2 = cb_model_1.predict(X)
print((pred_1 == pred_2).mean())

outputs 0.92, i.e. the two calls disagree on 8% of the observations.

But the most confusing case is when one needs both the per-arm scores from decision_function and the predicted action:

pred = cb_model_1.predict(X)
dec_func = cb_model_1.decision_function(X)
print((np.argmax(dec_func, axis=1) == pred).mean())

outputs 0.96.

So, is this type of behaviour expected? I think it may be related to how some of the methods work; e.g. the documentation for BootstrappedTS says:

Bootstrapped Thompson Sampling

Performs Thompson Sampling by fitting several models per class on bootstrapped samples, then makes predictions by taking one of them at random for each class.
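
To illustrate to myself how this could produce the behaviour above, here is a minimal sketch of per-call resampling (purely illustrative; the names and structure are my own, not the package's actual implementation):

import numpy as np

rng = np.random.default_rng(42)

# stand-in for "several models per arm, fit on bootstrapped samples":
# each "model" here is just a fixed score, 10 of them per arm
models_per_arm = {arm: rng.random(size=10) for arm in range(3)}

def decision_function_sketch():
    # each call draws one bootstrap "model" per arm at random, advancing
    # the RNG, so two consecutive calls can return different scores
    return np.array([rng.choice(models_per_arm[arm]) for arm in range(3)])

print(decision_function_sketch())  # first call
print(decision_function_sketch())  # second call: a different random draw

If something like this happens internally, it would explain why the first calls of both models above agree with each other (same seed, same RNG position) but differ from the second calls.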

But in my opinion, setting the random state should suppress this randomization, especially in decision_function, or there should be another parameter to disable it.

david-cortes commented 2 years ago

The key there is the parameter beta_prior. If you don't want any randomization in a call to predict, you can pass the parameter exploit=True. However, the whole point of BootstrappedTS is to use that randomization in making choices. If you want reproducible results across multiple calls on the same data, you can also try resetting the random number generators before each call (this is not documented and the generators are not part of the public attributes, so you'll have to look into the source code).
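
For example, with the models from the snippet above, exploit-only predictions should agree across repeated calls:

pred_a = cb_model_1.predict(X, exploit=True)
pred_b = cb_model_1.predict(X, exploit=True)
print((pred_a == pred_b).mean())  # no resampling happens, so this should print 1.0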