athammad closed this issue 1 year ago
Not really. We treat every arm independently, as is standard in the MAB literature, so there is no causal dependency or relationship between arms. Note that we only observe a reward for the selected arm; no reward is observed for any arm that was not selected.
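To make the point about feedback concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the ArmStats class, the arm names, and the logged rewards are all hypothetical, made up for illustration. It only shows that each arm keeps its own statistics and that a time step updates nothing but the arm that was actually selected.

```python
class ArmStats:
    """Hypothetical per-arm running mean; each arm is tracked independently."""

    def __init__(self, arms):
        self.counts = {arm: 0 for arm in arms}
        self.means = {arm: 0.0 for arm in arms}

    def update(self, arm, reward):
        # Only the selected arm receives a reward, so only its
        # statistics change. The other arms learn nothing at this step.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Illustrative log of (selected arm, observed reward) pairs.
log = [("A", 1.0), ("B", 0.0), ("A", 0.0), ("A", 1.0)]

stats = ArmStats(["A", "B", "C"])
for arm, reward in log:
    stats.update(arm, reward)

print(stats.counts)
print(stats.means)
```

Arm C is never selected in this log, so its estimate is still the initial value and says nothing about what would have happened had C been chosen.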
predict_expectations is analogous to predict_proba in sklearn.
The idea is to enable different arm selection strategies, e.g. selecting the top-k arms.
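As a rough sketch of the top-k idea, assuming the expectations for one decision are available as a dictionary mapping each arm to its expected reward (the values below are invented for illustration, and top_k_arms is a hypothetical helper, not part of the library):

```python
def top_k_arms(expectations, k):
    """Return the k arms with the highest expected reward, best first."""
    return sorted(expectations, key=expectations.get, reverse=True)[:k]


# Assumed shape of the per-arm expectations for a single decision.
expectations = {"A": 0.42, "B": 0.77, "C": 0.13, "D": 0.58}

best_two = top_k_arms(expectations, 2)
print(best_two)  # ['B', 'D']
```

The same dictionary could feed other selection rules as well, such as thresholding or sampling in proportion to the expectations.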
Hi there,
I have been exploring your amazing library, and I am wondering if it would make sense to interpret the output of predict_expectations as the counterfactual rewards for each time step. In other words, predict_expectations represents what would have happened if a different action had been chosen at each time t. Would it make sense?