Hi @theKnack
`--cb` does not do any exploration and just trains the model on the given input, so the output will be whatever the model (as trained so far) predicts.
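For concreteness, here is a minimal sketch of plain `--cb` usage via the `vowpalwabbit` Python package (the feature names, costs, and action count are made-up assumptions, not from this thread):

```python
# Minimal sketch: plain --cb with 4 actions. Each label has the form
# action:cost:probability for the action that was actually taken.
import vowpalwabbit

vw = vowpalwabbit.Workspace("--cb 4 --quiet")
vw.learn("1:2.0:0.4 | user_age=25 hour=evening")
vw.learn("3:0.5:0.3 | user_age=25 hour=evening")

# With --cb the prediction is just the single best action,
# so there is no ranking to read off.
print(vw.predict("| user_age=25 hour=evening"))
vw.finish()
```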
`--cb_explore` includes exploration, using epsilon-greedy by default if nothing else is specified. You can take a look at all the available exploration methods here. `cb_explore`'s output is the PMF given by the exploration strategy (see here for more info).
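As an illustration (same made-up features as above), under `--cb_explore` the prediction is a list of per-action probabilities rather than a single action:

```python
# Minimal sketch: --cb_explore with 4 actions and the default
# epsilon-greedy exploration.
import vowpalwabbit

vw = vowpalwabbit.Workspace("--cb_explore 4 --quiet")
vw.learn("1:2.0:0.4 | user_age=25 hour=evening")

# The prediction is the exploration PMF over the 4 actions,
# e.g. something like [0.9625, 0.0125, 0.0125, 0.0125].
print(vw.predict("| user_age=25 hour=evening"))
vw.finish()
```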
Epsilon-greedy will choose, with probability e, an action at random from a uniform distribution (exploration), and with probability 1-e it will use the so-far-trained model to predict the best action (exploitation). So the output will be the PMF over the actions (probability 1-e or e for the chosen action), with the remaining probability split equally between the remaining actions. Therefore `cb_explore` will not provide you with a ranking.
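A minimal pure-Python sketch of that PMF construction, following the description above (the greedy action keeps 1-e and e is split over the others; note that VW's own implementation may instead spread e over all actions, giving the greedy action 1-e+e/k):

```python
# Hypothetical helper, not VW code: build the epsilon-greedy PMF
# described above for k actions.
def epsilon_greedy_pmf(num_actions: int, greedy_action: int, eps: float = 0.05):
    # Each non-greedy action gets an equal share of the exploration mass.
    pmf = [eps / (num_actions - 1)] * num_actions
    # The greedy (exploited) action keeps the rest.
    pmf[greedy_action] = 1.0 - eps
    return pmf

print(epsilon_greedy_pmf(4, greedy_action=0))
# [0.95, 0.0166..., 0.0166..., 0.0166...]
```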
One option for ranking would be to use CCB (Conditional Contextual Bandit). Then you get a ranking and can provide feedback on any slot, but it is more computationally expensive: CCB runs CB for each slot, and the effect is a ranking since each slot draws from the overall pool of actions.
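Here is a minimal sketch of the CCB input format (the feature names, costs, and two-slot setup are illustrative assumptions):

```python
# Minimal sketch: CCB with 3 actions and 2 slots. Each slot's label has
# the form chosen_action:cost:probability.
import vowpalwabbit

vw = vowpalwabbit.Workspace("--ccb_explore_adf --quiet")

vw.learn([
    "ccb shared |User user_age=25",
    "ccb action |Action article=sports",
    "ccb action |Action article=politics",
    "ccb action |Action article=music",
    "ccb slot 0:-1.0:0.9 |Slot position=top",
    "ccb slot 2:0.0:0.5 |Slot position=bottom",
])

# The prediction gives, per slot, (action_index, probability) pairs;
# reading off the top action of each slot yields the ranking.
print(vw.predict([
    "ccb shared |User user_age=25",
    "ccb action |Action article=sports",
    "ccb action |Action article=politics",
    "ccb action |Action article=music",
    "ccb slot |Slot position=top",
    "ccb slot |Slot position=bottom",
]))
vw.finish()
```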
@jackgerrits please correct me if I am missing something :)
I think CCB is a good option if computational limits allow. I'd just like to add that if you do `cb_explore` or `cb_explore_adf` then the resulting PMF should be sorted by score, so it is a ranking of sorts. However, it's worth verifying that the ordering is in fact sorted by scores (`--audit` will help here), as I don't know if there is a test covering this.
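If you would rather not depend on the output ordering, you can sort the PMF yourself. A minimal sketch with `--cb_explore_adf` (made-up features again):

```python
# Minimal sketch: derive an explicit ranking from the --cb_explore_adf
# PMF instead of relying on the printed ordering.
import vowpalwabbit

vw = vowpalwabbit.Workspace("--cb_explore_adf --quiet")

shared = "shared |User user_age=25"
actions = ["|Action article=sports", "|Action article=politics", "|Action article=music"]

# Train on one observed interaction: the first action was shown with
# probability 0.5 and incurred cost 2.0.
vw.learn([shared, "0:2.0:0.5 " + actions[0], actions[1], actions[2]])

# predict() returns the PMF (one probability per action); sorting it
# explicitly gives a ranking regardless of output order.
pmf = vw.predict([shared] + actions)
ranking = sorted(enumerate(pmf), key=lambda ap: ap[1], reverse=True)
print(ranking)  # [(action_index, probability), ...] highest first
vw.finish()
```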
Thanks a lot @olgavrou and @jackgerrits for the detailed reply!
I am using Vowpal Wabbit's contextual bandit to rank various actions given a context. The expected ranking of the actions should go from least loss to most loss.
Using `--cb` just returns the single most optimal action, and using `--cb_explore` returns a PMF over the actions to be explored, but it doesn't seem to help in ranking. Is there any other way of using VW's contextual bandit for ranking?