VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

progressive validation with Contextual Bandits #1542

Open matanox opened 6 years ago

matanox commented 6 years ago

Hi,

I have several usage questions about Contextual Bandits that I'd be happy to incorporate into the Wiki and Stack Overflow.

  1. Is progressive validation applied when training with IPS? I'm not entirely sure from the ICML tutorial slides whether progressive validation and IPS are mutually exclusive, or when progressive validation is applied by VW commands.

  2. --cb vs. --cb_explore: we get a smaller training loss when training with --cb_explore. If you were to train a new model from logged data, would you really choose --cb, and if so, why? Is there something methodologically flawed in training a brand-new model from logged data with --cb_explore?

  3. How does --eval interact with any of the above?

Your comments may be incorporated into the SO question for reuse and, if you like, into the Wiki as well.

Thanks!

matanox commented 6 years ago

The questions above remain after watching the 2017 ICML Tutorial (which centers more on the ADF scenario; we don't seem to need ADF yet).

JohnLangford commented 5 years ago

  1. Progressive validation is the default in VW, so yes.

  2. --cb_explore is generally preferred because it incorporates exploration, which allows you to generate further data. It's odd that it works better than --cb; it sounds like you got stochastically lucky?

  3. --eval is a method for evaluating any policy via a modified "label" that includes the policy's chosen action.
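
To make the first point concrete, here is a minimal Python sketch of how progressive validation and IPS can coexist: each logged example is scored with the model's prediction made before the model learns from that example, and the score itself is an IPS cost estimate. This is a conceptual illustration only, not VW's implementation. `TinyPolicy`, `ips_cost_estimates`, and `progressive_validation` are hypothetical names, and the learner ignores context features to keep the example short.

```python
def ips_cost_estimates(num_actions, logged_action, cost, prob):
    """IPS estimate of every action's cost: cost/prob for the logged action, 0 elsewhere."""
    est = [0.0] * num_actions
    est[logged_action] = cost / prob
    return est


class TinyPolicy:
    """Hypothetical context-free learner: running mean of IPS cost estimates per action."""

    def __init__(self, num_actions):
        self.avg = [0.0] * num_actions
        self.n = 0

    def predict(self):
        # Pick the action with the lowest estimated cost (ties go to the lowest index).
        return min(range(len(self.avg)), key=lambda a: self.avg[a])

    def learn(self, est):
        self.n += 1
        for a, e in enumerate(est):
            self.avg[a] += (e - self.avg[a]) / self.n


def progressive_validation(log, num_actions):
    """Average IPS loss where each example is scored BEFORE the model trains on it."""
    policy = TinyPolicy(num_actions)
    total = 0.0
    for logged_action, cost, prob in log:
        chosen = policy.predict()          # 1. predict with the current model
        est = ips_cost_estimates(num_actions, logged_action, cost, prob)
        total += est[chosen]               # 2. score that prediction with IPS
        policy.learn(est)                  # 3. only then update the model
    return total / len(log), policy


# Logged data as (action, cost, probability) triples; values are made up.
log = [(0, 1.0, 0.5), (1, 0.0, 0.5), (0, 1.0, 0.5), (1, 0.0, 0.5)]
pv_loss, policy = progressive_validation(log, num_actions=2)
print(pv_loss, policy.predict())
```

The key design point is the order of steps 1 to 3: because every example is scored before it is used for learning, the running average behaves like a held-out estimate even though only one pass over the data is made.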

Go ahead and update SO/Wiki as seems appropriate.
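
For anyone writing this up for SO or the Wiki, the difference between the second and third answers can be sketched in a few lines of Python. A --cb style policy outputs a single action, an exploring policy outputs a probability distribution over actions, and evaluating a fixed policy on logged data only needs the policy's chosen action per example. The sketch assumes epsilon-greedy exploration and a plain IPS estimator; `epsilon_greedy_pmf` and `ips_value` are hypothetical helper names, not VW functions, and the exact label format that --eval expects is not shown here.

```python
def epsilon_greedy_pmf(greedy_action, num_actions, epsilon):
    """Spread epsilon uniformly over all actions, put the rest on the greedy action."""
    pmf = [epsilon / num_actions] * num_actions
    pmf[greedy_action] += 1.0 - epsilon
    return pmf


def ips_value(log, policy_actions):
    """IPS estimate of a fixed policy's average cost on logged data.

    log: (logged_action, cost, probability) triples.
    policy_actions: the action the evaluated policy would choose for each example.
    """
    total = 0.0
    for (logged_action, cost, prob), chosen in zip(log, policy_actions):
        if chosen == logged_action:
            # Only examples where the policy agrees with the log contribute.
            total += cost / prob
    return total / len(log)


# Made-up values for illustration.
pmf = epsilon_greedy_pmf(greedy_action=1, num_actions=4, epsilon=0.2)
log = [(0, 1.0, 0.5), (1, 0.0, 0.5), (0, 1.0, 0.25)]
value = ips_value(log, policy_actions=[0, 0, 1])
print(pmf, value)
```

Because the exploring policy records the probability of each action it takes, the data it generates can later be reused for exactly this kind of IPS evaluation.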