VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

use IPS estimator for average loss of --cb_type dr #2222

Open marco-rossi29 opened 4 years ago

marco-rossi29 commented 4 years ago

See #1696 for more details

--cb_type dr selects a mode of learning, but it should also be possible to use the IPS estimator to compute the reported average loss.
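For readers unfamiliar with the estimator, here is a minimal Python sketch of an IPS (inverse propensity score) average loss over logged bandit data. It is illustrative only and is not VW's implementation: the function name `ips_average_loss`, the tuple layout, and the deterministic `policy` callable are all assumptions made for this example.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical record layout for one logged interaction:
# (context, logged_action, observed_cost, logging_probability)
LoggedExample = Tuple[dict, int, float, float]


def ips_average_loss(
    examples: Sequence[LoggedExample],
    policy: Callable[[dict], int],
) -> float:
    """IPS estimate of the average cost of a deterministic policy."""
    if not examples:
        raise ValueError("need at least one logged example")
    total = 0.0
    for context, action, cost, prob in examples:
        # Only examples where the policy agrees with the logged action
        # contribute; each is reweighted by 1 / logging probability.
        if policy(context) == action:
            total += cost / prob
    return total / len(examples)


# Toy data: three logged interactions, evaluated for "always pick action 1".
toy_log = [({}, 1, 1.0, 0.5), ({}, 2, 0.5, 0.5), ({}, 1, 0.0, 0.25)]
toy_estimate = ips_average_loss(toy_log, lambda ctx: 1)
```

The point of the request is that an estimate like this could be reported regardless of which cb_type is used to update the model.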

maxpagels commented 3 years ago

I'm working on OPE docs for vowpalwabbit.org (https://github.com/VowpalWabbit/vowpalwabbit.github.io/pull/193). I don't know the state of this issue, but being able to learn with one cb_type and report loss with a different estimator, such as IPS, would be very useful. I suspect many people just grid-search everything and wonder why it isn't working.

rangi513 commented 3 years ago

Should this feature request also include the ability to use the IPS/DR estimator for evaluating the average PV loss of a policy (offline) using --cb_type mtr in the learning algorithm?
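As a point of comparison, a doubly robust (DR) offline estimate can be sketched in a few lines of Python. This is a hedged illustration rather than VW code: `dr_average_loss`, the record layout, and the `cost_model` callable (a learned estimate of the cost of an action in a context) are hypothetical names introduced here.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical record layout for one logged interaction:
# (context, logged_action, observed_cost, logging_probability)
LoggedExample = Tuple[dict, int, float, float]


def dr_average_loss(
    examples: Sequence[LoggedExample],
    policy: Callable[[dict], int],
    cost_model: Callable[[dict, int], float],
) -> float:
    """Doubly robust estimate of the average cost of a deterministic policy."""
    if not examples:
        raise ValueError("need at least one logged example")
    total = 0.0
    for context, action, cost, prob in examples:
        chosen = policy(context)
        # Start from the model's predicted cost for the policy's action.
        estimate = cost_model(context, chosen)
        # When the policy matches the logged action, add an IPS-weighted
        # correction for the model's error on the observed cost.
        if chosen == action:
            estimate += (cost - cost_model(context, action)) / prob
        total += estimate
    return total / len(examples)


# Toy data with a constant cost model, evaluated for "always pick action 1".
toy_log = [({}, 1, 1.0, 0.5), ({}, 2, 0.5, 0.5), ({}, 1, 0.0, 0.25)]
toy_dr = dr_average_loss(toy_log, lambda ctx: 1, lambda ctx, a: 0.5)
```

With a cost model that always predicts zero, this reduces to the plain IPS estimate, which is one way to see how the two estimators relate.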

jackgerrits commented 1 year ago

To address this we can: