vedantbhatia opened this issue 3 years ago

**Description**

Can I please have an example of how to use the `search_rollin` and `search_rollout` parameters, especially for LOLS? Where and how do I supply these policies?
Hi @vedantbhatia, thank you for your question.

When you set these parameters, VW chooses between two policies that already exist: the "reference" policy and the current "learned" policy backed by the model. The reference policy is determined by how the `search_task` is configured; think of it as the oracle, i.e. the policy that supplies the reference trajectories. If you are training on logged data, it is the implicit policy behind the logs; if you are writing a custom task via the hook-task mechanism, you specify the oracle action directly at each state.
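For the hook-task case, here is a minimal sketch of where that oracle action goes, based on the `pyvw` `SearchTask` API from the Python learning-to-search tutorial. The POS-tagging data, label set, and conditioning scheme are purely illustrative, and note that newer releases of the `vowpalwabbit` package rename the `pyvw.vw` entry point to `Workspace`:

```python
from vowpalwabbit import pyvw

# Illustrative POS tags; search actions in VW are 1-based.
DET, NOUN, VERB, ADJ = 1, 2, 3, 4
my_dataset = [
    [(DET, "the"), (NOUN, "monster"), (VERB, "ate"),
     (DET, "a"), (ADJ, "big"), (NOUN, "sandwich")],
    [(DET, "the"), (NOUN, "sandwich"), (VERB, "was"), (ADJ, "tasty")],
]

class SequenceLabeler(pyvw.SearchTask):
    def __init__(self, vw, sch, num_actions):
        pyvw.SearchTask.__init__(self, vw, sch, num_actions)
        # auto-compute Hamming loss; handle conditioning features automatically
        sch.set_options(sch.AUTO_HAMMING_LOSS | sch.AUTO_CONDITION_FEATURES)

    def _run(self, sentence):
        output = []
        for n in range(len(sentence)):
            tag, word = sentence[n]
            with self.vw.example({"w": [word]}) as ex:
                # `oracle=tag` is the hook through which the reference
                # policy's action is supplied at this state.
                pred = self.sch.predict(examples=ex, my_tag=n + 1, oracle=tag,
                                        condition=[(n, "p"), (n - 1, "q")])
                output.append(pred)
        return output

vw = pyvw.vw("--search 4 --quiet --search_task hook")
task = vw.init_search_task(SequenceLabeler)
for _ in range(10):  # several passes over the tiny dataset
    task.learn(my_dataset)
print(task.predict([(DET, w) for w in "the monster was tasty".split()]))
```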
The parameters `search_rollin` and `search_rollout` specify how to choose between the reference/oracle policy and the learned policy, or some mixture of the two:
| Roll* Input | Equivalents | Description |
|---|---|---|
| `ref` | `oracle` | Use actions provided by the reference (logged or imitation-target) policy |
| `learned` | `policy` | Use actions provided by the model |
| `mix_per_roll` | `mix` | Choose a policy randomly at the beginning of each trajectory considered |
| `mix_per_state` | | Choose a policy randomly every time an action is required |
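For LOLS specifically, the algorithm rolls in with the learned policy and rolls out with a per-trajectory mixture of the reference and learned policies (Chang et al., 2015), so under the table above the configuration would look roughly like the sketch below. I use the `policy` spelling since the table lists it as equivalent to `learned`; `SequenceLabeler` and `my_dataset` refer to the hook-task sketch earlier in this thread:

```python
from vowpalwabbit import pyvw

# LOLS-style configuration: roll in with the learned policy,
# roll out with a per-rollout mixture of reference and learned.
vw = pyvw.vw("--search 4 --quiet --search_task hook "
             "--search_rollin policy --search_rollout mix_per_roll")

task = vw.init_search_task(SequenceLabeler)  # hook task from the sketch above
task.learn(my_dataset)
```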
@olgavrou I would like to work on this. Could you please guide me on how to move forward with it?