VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

L2S sub-system: Using search_rollin and search_rollout #3481

Open vedantbhatia opened 3 years ago

vedantbhatia commented 3 years ago

Description

A brief description of the error, missing documentation or what you would like added

Can I please have an example of how to use the search_rollin and search_rollout parameters, especially for LOLS? Where and how do I supply these policies?

Link to Documentation Page

Where is the documentation in question? https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Learning-to-Search-Sub-System

lokitoth commented 3 years ago

Hi @vedantbhatia, thank you for your question.

These parameters do not take new policies. Whichever values you specify, the search uses one of the two existing policies: the "reference" policy and the current "learning" policy, which is based on the model in VW. How the "reference" policy is generated depends on how the search_task is configured, but you should think of it as the oracle or trajectory policy. If you are training on logged data, it is the implicit policy of the logs; if you are using a custom task via the hook task mechanism, you specify the oracle action directly at each state.

The parameters search_rollin and search_rollout specify how to choose between the reference/oracle policy and the learned policy, or some mixture of the two:

| Roll* Input | Equivalents | Description |
|---|---|---|
| ref | oracle | Use actions provided by the reference (logged or imitation-target) policy |
| learned | policy | Use actions provided by the model |
| mix_per_roll | mix | Choose a policy randomly at the beginning of each trajectory considered |
| mix_per_state | | Choose a policy randomly every time an action is required |
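
To make the table concrete, here is a small, self-contained Python sketch of how the four modes differ. It is a conceptual simulation only, not VW's implementation: the helper names (`make_chooser`, `run_trajectory`, `vw_args`), the `beta` mixing probability, and the single `switch_at` point between roll-in and roll-out are all assumptions made for illustration. As I understand LOLS, it rolls in with the learned policy and rolls out with a mixture of the two, which would correspond to something like `learned` for roll-in and `mix_per_roll` for roll-out.

```python
import random

# Input names from the table above, mapped to one canonical name each.
ALIASES = {
    "ref": "oracle", "oracle": "oracle",
    "learned": "policy", "policy": "policy",
    "mix_per_roll": "mix_per_roll", "mix": "mix_per_roll",
    "mix_per_state": "mix_per_state",
}

def make_chooser(mode, beta=0.5, rng=None):
    """Return a function that names the policy acting at the next state.

    beta is a hypothetical probability of picking the reference policy
    whenever a random choice has to be made.
    """
    rng = rng or random.Random()
    mode = ALIASES[mode]
    if mode in ("oracle", "policy"):
        return lambda: mode  # always the same policy

    def draw():
        return "oracle" if rng.random() < beta else "policy"

    if mode == "mix_per_roll":
        fixed = draw()  # decided once, at the start of the trajectory
        return lambda: fixed
    return draw  # mix_per_state: fresh draw every time an action is needed

def run_trajectory(rollin, rollout, switch_at, length, beta=0.5, seed=0):
    """Which policy acts at each step: roll-in before switch_at, roll-out after."""
    rng = random.Random(seed)
    choose_in = make_chooser(rollin, beta, rng)
    choose_out = make_chooser(rollout, beta, rng)
    return [choose_in() if t < switch_at else choose_out() for t in range(length)]

def vw_args(rollin, rollout):
    """Build only the two roll flags; the task and other options are up to you."""
    return f"--search_rollin {ALIASES[rollin]} --search_rollout {ALIASES[rollout]}"

if __name__ == "__main__":
    # LOLS-style choice: learned roll-in, per-trajectory mixture for roll-out.
    print(run_trajectory("learned", "mix", switch_at=3, length=6))
    print(vw_args("learned", "mix"))
```

The string returned by `vw_args` still has to be combined with the rest of your search options (for example the search task you are using) when you invoke VW.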
Aut0R3V commented 1 year ago

@olgavrou I would like to work on this. Could you please guide me on how to move forward with this?