Closed saurabh3949 closed 3 years ago
@marco-rossi29 could you comment?
If you have a fixed set of actions, you can assign each action a feature that uniquely identifies it and then interact that feature with the context features to get the equivalent of --cb. The following example encodes a 3-dimensional feature vector as the observation and a set of 5 actions, where the 4th action was played by the logging policy with probability 1/7 and the observed cost was -1.
shared |f x1:0.4296374229387013 x2:0.2352946887020415 x3:0.5820648778822854
|a id_0
|a id_1
|a id_2
0:-1:0.14285714285714285 |a id_3
|a id_4
You would use this with a command line like --cb_adf --ignore_linear f -q fa --cubic ffa. Assuming the action set is fixed, you would supply the same 5 actions with each example, with the only difference being where the annotation from the logging policy is placed.
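As a rough illustration of the point above, here is a minimal Python sketch that writes one such multi-line example for a fixed action set. The helper name format_adf_example and its parameters are hypothetical and not part of VW; the only thing taken from the thread is the text layout (a shared line in namespace f, one line per action in namespace a, and the cost/probability label on the action the logging policy played).

```python
def format_adf_example(context, n_actions, chosen=None, cost=None, prob=None):
    """Build one multi-line cb_adf example as text (hypothetical helper).

    context   -- list of floats, written as x1, x2, ... in namespace f
    n_actions -- size of the fixed action set; actions get ids id_0, id_1, ...
    chosen    -- zero-based index of the action the logging policy played
    cost/prob -- observed cost and logging probability for that action
    """
    shared = "shared |f " + " ".join(
        f"x{i}:{v}" for i, v in enumerate(context, start=1)
    )
    lines = [shared]
    for a in range(n_actions):
        line = f"|a id_{a}"
        if chosen is not None and a == chosen:
            # The label is attached only to the line of the played action.
            line = f"0:{cost}:{prob} " + line
        lines.append(line)
    return "\n".join(lines)


example = format_adf_example(
    [0.4296374229387013, 0.2352946887020415, 0.5820648778822854],
    n_actions=5,
    chosen=3,
    cost=-1,
    prob=1 / 7,
)
print(example)
```

Every example lists the same 5 action lines; only the position of the labeled line changes. When writing several examples to one file, separate them with a blank line.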
This gives you some insight as to why --cb_adf is superior: --cb assumes you know nothing about your actions except a unique identifier, whereas --cb_adf allows you to leverage any additional per-action information.
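To make the contrast concrete, the sketch below builds action lines in both styles. It is only an illustration: the helper action_line, the namespace d, and the price feature are made-up names, not something from this thread or from VW itself. With extra namespaces like this, the interaction flags on the command line would presumably need to be extended to cover them.

```python
def action_line(action_id, extra_features=None):
    """Return the text of one action line (hypothetical helper).

    With no extra features, the action carries only its unique id,
    which mirrors what --cb knows about an action. With extra features,
    the line also carries per-action information in a second namespace.
    """
    parts = [f"|a id_{action_id}"]
    if extra_features:
        feats = " ".join(f"{k}:{v}" for k, v in extra_features.items())
        parts.append(f"|d {feats}")  # namespace d is an assumed name
    return " ".join(parts)


# Identifier-only actions: the --cb equivalent.
id_only = [action_line(a) for a in range(3)]

# Actions enriched with a made-up per-action feature.
enriched = [action_line(a, {"price": p}) for a, p in enumerate([0.5, 1.0, 2.5])]

print("\n".join(id_only))
print("\n".join(enriched))
```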
Looks like the question was answered. Please go ahead and reopen if you need more info.
Hi, I just wanted to understand how cbify works with cb_explore_adf for supervised datasets that have a fixed number of actions/labels. How does VW create action-dependent features?
If cbify works with ADF for discrete actions with no features, how can I use cb_explore_adf in general for discrete actions with no features? I want to use regcb for an online bandit problem, but I can't because I don't have action-dependent features.