david-cortes / contextualbandits

Python implementations of contextual bandits algorithms
http://contextual-bandits.readthedocs.io
BSD 2-Clause "Simplified" License

Possibly unexpected behaviour of decision function #60

Closed Yalikesifulei closed 2 years ago

Yalikesifulei commented 2 years ago

Hello @david-cortes, thanks for this Contextual Bandits package.

While using some of the online methods (BootstrappedTS, AdaptiveGreedy, and possibly others) from this package, I've run into some behaviour of decision_function and related functions like predict that was unexpected, at least to me.

Let's use some simple dummy data (it doesn't matter much) as an example:

import numpy as np
from contextualbandits.online import BootstrappedTS
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)  # seed the global RNG so the dummy data below is reproducible

# dummy bandit data: random actions, reward 1 whenever the action matches the true class
X, y = load_iris(return_X_y=True)
a = np.random.randint(3, size=len(y))
r = 1 * (y == a)

cb_model_1 = BootstrappedTS(
    base_algorithm=LogisticRegression(max_iter=10000), random_state=RANDOM_STATE, nchoices=3
)
cb_model_2 = BootstrappedTS(
    base_algorithm=LogisticRegression(max_iter=10000), random_state=RANDOM_STATE, nchoices=3
)
cb_model_1.fit(X, a, r)
cb_model_2.fit(X, a, r)

# score the same single observation twice with each model
print(cb_model_1.decision_function(X[0]))
print(cb_model_2.decision_function(X[0]))
print(cb_model_1.decision_function(X[0]))
print(cb_model_2.decision_function(X[0]))

The output I get is

[[0.96298824 0.11472752 0.00019669]]
[[0.96298824 0.11472752 0.00019669]]
[[0.97498834 0.22001592 0.00019669]]
[[0.97498834 0.22001592 0.00019669]]

Setting random_state makes the predictions of cb_model_1 and cb_model_2 equal, as it should, but it's unclear to me why calling decision_function a second time changes the output. Another way to see this behaviour is to compare two predictions from the same model:

pred_1 = cb_model_1.predict(X)
pred_2 = cb_model_1.predict(X)
print((pred_1 == pred_2).mean())

outputs 0.92, i.e. the two calls disagree on 8% of the observations.

But the most confusing case is when one needs both the per-arm scores from decision_function and the predicted action:

pred = cb_model_1.predict(X)
dec_func = cb_model_1.decision_function(X)
print((np.argmax(dec_func, axis=1) == pred).mean())

outputs 0.96.

So, is this type of behaviour expected? I think it may be related to how some of the methods work; e.g. the documentation for BootstrappedTS says:

Bootstrapped Thompson Sampling

Performs Thompson Sampling by fitting several models per class on bootstrapped samples, then makes predictions by taking one of them at random for each class.
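
To illustrate to myself how this could produce the behaviour above, here is a minimal sketch of per-call resampling (purely illustrative; the names and structure are my own, not the package's actual implementation):

import numpy as np

rng = np.random.default_rng(42)

# stand-in for "several models per arm, fit on bootstrapped samples":
# each "model" here is just a fixed score, 10 of them per arm
models_per_arm = {arm: rng.random(size=10) for arm in range(3)}

def decision_function_sketch():
    # each call draws one bootstrap "model" per arm at random, advancing
    # the RNG, so two consecutive calls can return different scores
    return np.array([rng.choice(models_per_arm[arm]) for arm in range(3)])

print(decision_function_sketch())  # first call
print(decision_function_sketch())  # second call: a different random draw

If something like this happens internally, it would explain why the first calls of both models above agree with each other (same seed, same RNG position) but differ from the second calls.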

But in my opinion, setting the random state should suppress this randomization, especially in decision_function, or there should be another parameter to disable it.

david-cortes commented 2 years ago

The key there is the parameter beta_prior. If you don't want any randomization in a call to predict, you can pass the parameter exploit=True. However, the whole point of BootstrappedTS is to use that randomization in making choices. If you want reproducible results across multiple calls on the same data, you can also try resetting the random number generators before each call (this is not documented and the generators are not part of the public attributes, so you'll have to look into the source code).
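
For example, with the models from the snippet above, exploit-only predictions should agree across repeated calls:

pred_a = cb_model_1.predict(X, exploit=True)
pred_b = cb_model_1.predict(X, exploit=True)
print((pred_a == pred_b).mean())  # no resampling happens, so this should print 1.0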