fidelity / mabwiser

[IJAIT 2021] MABWiser: Contextual Multi-Armed Bandits Library
https://fidelity.github.io/mabwiser/
Apache License 2.0

interpreting `predict_expectations` #76

Closed athammad closed 1 year ago

athammad commented 1 year ago

Hi there,

I have been exploring your amazing library, and I am wondering whether it makes sense to interpret the output of `predict_expectations` as the counterfactual rewards for each time step. In other words, would `predict_expectations` represent what would have happened if a different action had been chosen at each time t?

skadio commented 1 year ago

Not really. We treat every arm independently, as is standard in the MAB literature, so there is no causal dependency or relationship between arms. Note that we only observe a reward for the selected arm; no reward is observed for any arm that is not selected.
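To make the point about independent arms concrete, here is a minimal plain-Python sketch. It is not MABWiser's implementation; the `history` log and the `per_arm_mean` helper are hypothetical names, and a simple mean-reward estimate is assumed. Each arm's estimate is built only from the rounds in which that arm was actually selected, so it summarizes the arm rather than describing what would have happened at any particular time step.

```python
from collections import defaultdict

# Hypothetical log: one (selected arm, observed reward) pair per time step.
# Arms that were not selected at a step have no reward recorded for it.
history = [("A", 1.0), ("B", 0.0), ("A", 0.0), ("C", 1.0), ("A", 1.0), ("B", 1.0)]


def per_arm_mean(history):
    """Estimate each arm's expected reward from its own observations only."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for arm, reward in history:
        totals[arm] += reward
        counts[arm] += 1
    return {arm: totals[arm] / counts[arm] for arm in totals}


# One number per arm, not one number per arm per time step.
expectations = per_arm_mean(history)
```

The result has a single entry per arm, which is why it cannot be read as a per-time-step counterfactual reward.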

`predict_expectations` is analogous to `predict_proba` in sklearn.

The idea is to enable different arm selection strategies, e.g., selecting the top-k arms.
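A top-k strategy on top of such expectations could be sketched as follows. This assumes the expectations are available as a dictionary mapping each arm to its expected reward; the values below are made up for illustration, and `top_k_arms` is a hypothetical helper, not part of the library.

```python
def top_k_arms(expectations, k):
    """Return the k arms with the highest expected reward, best first."""
    ranked = sorted(expectations.items(), key=lambda item: item[1], reverse=True)
    return [arm for arm, _ in ranked[:k]]


# Made-up expectations standing in for the output of a fitted bandit.
expectations = {"A": 0.42, "B": 0.77, "C": 0.13, "D": 0.58}

top_two = top_k_arms(expectations, 2)
```

The same dictionary could feed other strategies as well, such as thresholding on a minimum expected reward.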