VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

cbify with cb_explore_adf for supervised datasets #2024

Closed saurabh3949 closed 3 years ago

saurabh3949 commented 5 years ago

Hi, I wanted to understand how cbify works with cb_explore_adf for supervised datasets that have a fixed number of actions/labels. How does VW create action-dependent features?

If cbify works with ADF for discrete actions that have no features, how can I use cb_explore_adf in general for discrete actions with no features? I want to use regcb for an online bandit problem, but I can't because I don't have action-dependent features.

jackgerrits commented 5 years ago

@marco-rossi29 could you comment?

pmineiro commented 4 years ago

If you have a fixed set of actions, you can assign each action a feature which uniquely identifies it and then interact it with the shared features to get the equivalent of --cb. The following example encodes a 3-dimensional feature vector as the observation and a set of 5 actions, where the 4th action was played by the logging policy with probability 1/7 and the observed cost was -1.

 shared |f x1:0.4296374229387013 x2:0.2352946887020415 x3:0.5820648778822854 
 |a id_0
 |a id_1
 |a id_2
 0:-1:0.14285714285714285 |a id_3
 |a id_4

You would use this with a command line like --cb_adf --ignore_linear f -q fa --cubic ffa. Assuming the action set is fixed, you would supply the same 5 actions with each example; the only difference is which action line carries the label from the logging policy.
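As a rough illustration of that recipe, here is a minimal Python sketch that writes out the multi-line text for a fixed action set. The helper name to_adf_example and its parameters are hypothetical (they are not part of VW); it only builds the text shown above and does not call VW itself.

```python
from typing import Optional, Sequence


def to_adf_example(
    shared: Sequence[float],
    n_actions: int,
    chosen: Optional[int] = None,
    cost: Optional[float] = None,
    prob: Optional[float] = None,
) -> str:
    """Format one multi-line example with a fixed set of id-only actions.

    `chosen` is the 0-based index of the action played by the logging
    policy; leave it as None for an unlabeled (prediction-time) example.
    """
    if chosen is not None and (cost is None or prob is None):
        raise ValueError("a labeled example needs both cost and prob")

    feats = " ".join(f"x{i + 1}:{v}" for i, v in enumerate(shared))
    lines = [f"shared |f {feats}"]
    for a in range(n_actions):
        # Only the logged action carries a label, written as
        # 0:cost:probability exactly as in the example above.
        label = f"0:{cost}:{prob} " if a == chosen else ""
        lines.append(f"{label}|a id_{a}")
    return "\n".join(lines)


# Same situation as the example above: 5 actions, the 4th one (index 3)
# was played with probability 1/7 and cost -1.
example = to_adf_example([0.5, 0.25, 1.0], 5, chosen=3, cost=-1, prob=1 / 7)
print(example)
```

Calling the helper with chosen=None gives the unlabeled form of the same example, which is what you would send at prediction time.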

This gives you some insight as to why --cb_adf is superior: --cb assumes you know nothing about your actions except a unique identifier, whereas --cb_adf allows you to leverage any additional per-action information.
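To see why crossing the id feature with the shared features mimics --cb, here is a small conceptual Python sketch of what -q fa produces for each action. It is an illustration only: the "x1*id_3" naming is made up, and VW hashes feature names into a weight table rather than storing them in a dictionary like this.

```python
def quadratic_fa(shared: dict, action_id: str) -> dict:
    """Cross every shared feature with an action-id indicator of value 1."""
    return {f"{name}*{action_id}": value * 1.0 for name, value in shared.items()}


shared = {"x1": 0.5, "x2": 0.25, "x3": 1.0}
per_action = {f"id_{a}": quadratic_fa(shared, f"id_{a}") for a in range(5)}

# Every action ends up with its own copy of the shared features, so each
# action effectively gets its own weight vector over x1..x3.
all_names = [name for feats in per_action.values() for name in feats]
print(per_action["id_3"])
print(len(all_names), len(set(all_names)))
```

With only an id per action, no crossed feature is shared between two actions, so nothing learned about one action transfers to another. Adding real per-action features to the a namespace is what lets --cb_adf generalize across actions.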

jackgerrits commented 3 years ago

Looks like this was answered. Please go ahead and reopen if you need more info.