VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

Inconsistent and unclear loss calculation for contextual bandits #4495

Open jackgerrits opened 1 year ago

jackgerrits commented 1 year ago

The loss calculation for CB reductions is not consistent and not well documented. The current situation is:

The proposed solution is to unify these implementations so that they all use IPS (inverse propensity scoring), for clarity. Longer term, we want to be able to use the various estimator implementations that have since been added.
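As a rough illustration of what an IPS loss estimate computes, here is a minimal Python sketch. It is not VW's implementation. It assumes that each logged example records the logged action, its observed cost, and the logging policy's probability for that action, and that the policy being evaluated picks exactly one action per example. The names `LoggedEvent` and `ips_loss` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LoggedEvent:
    """One logged contextual bandit interaction (hypothetical record type)."""
    action: int         # action chosen by the logging policy
    cost: float         # observed cost of that action
    probability: float  # probability the logging policy assigned to that action


def ips_loss(events: Sequence[LoggedEvent], chosen_actions: Sequence[int]) -> float:
    """Average IPS estimate of the cost of a policy that picks chosen_actions[i] on event i."""
    if len(events) != len(chosen_actions):
        raise ValueError("need exactly one chosen action per logged event")
    if not events:
        raise ValueError("need at least one logged event")
    total = 0.0
    for event, chosen in zip(events, chosen_actions):
        # Only events where the evaluated policy agrees with the log contribute;
        # dividing by the logging probability corrects for how often that action was logged.
        if chosen == event.action:
            total += event.cost / event.probability
    return total / len(events)


events = [
    LoggedEvent(action=0, cost=1.0, probability=0.5),
    LoggedEvent(action=1, cost=0.0, probability=0.25),
    LoggedEvent(action=2, cost=2.0, probability=0.5),
]
print(ips_loss(events, [0, 1, 1]))  # (1.0/0.5 + 0.0/0.25 + 0) / 3
```

Examples where the evaluated policy disagrees with the logged action contribute zero, which is what makes IPS simple to state and document, at the cost of higher variance when logging probabilities are small.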

Since this may be a surprising change for anyone measuring model performance with the DR (doubly robust) estimate, we will first introduce it as a deprecation, with a flag that forces IPS to be used in the cases where DR was used before. Then, in VW 10, we will switch the default to IPS.
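To illustrate why switching from DR to IPS can shift reported numbers, the following hypothetical sketch computes both estimates on the same toy log. It uses the textbook form of the doubly robust estimator and assumes a cost model that predicts a cost for every action (`predicted_costs`); it is not necessarily the exact formula VW uses internally, and all names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical log format: (logged_action, observed_cost, logging_probability)
Event = Tuple[int, float, float]


def ips_estimate(events: Sequence[Event], chosen: Sequence[int]) -> float:
    """Inverse propensity scoring: reweight the cost only when the policy matches the log."""
    if not events or len(events) != len(chosen):
        raise ValueError("need one chosen action per logged event")
    total = 0.0
    for (action, cost, prob), pick in zip(events, chosen):
        if pick == action:
            total += cost / prob
    return total / len(events)


def dr_estimate(
    events: Sequence[Event],
    chosen: Sequence[int],
    predicted_costs: Sequence[Dict[int, float]],
) -> float:
    """Doubly robust: model prediction for the chosen action plus an IPS-weighted correction."""
    if not events or not (len(events) == len(chosen) == len(predicted_costs)):
        raise ValueError("need one chosen action and one prediction map per logged event")
    total = 0.0
    for (action, cost, prob), pick, preds in zip(events, chosen, predicted_costs):
        estimate = preds[pick]  # direct-method term from the (hypothetical) cost model
        if pick == action:
            # Correct the model's error on the logged action, reweighted by 1/prob.
            estimate += (cost - preds[action]) / prob
        total += estimate
    return total / len(events)


events: List[Event] = [(0, 1.0, 0.5), (1, 0.0, 0.25), (2, 2.0, 0.5)]
chosen = [0, 1, 1]
predicted_costs = [
    {0: 0.8, 1: 0.3, 2: 1.5},
    {0: 0.6, 1: 0.2, 2: 1.0},
    {0: 0.9, 1: 0.4, 2: 1.8},
]
print("IPS:", ips_estimate(events, chosen))
print("DR: ", dr_estimate(events, chosen, predicted_costs))
```

On this toy data IPS comes out around 0.67 while DR comes out around 0.33, so a dashboard tracking one estimate would see a visible jump after switching to the other, even though the model itself is unchanged.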

The estimators can already be used from Python via the vw-estimators library. Because this integration is so important, we need to add documentation on how these packages work together.

lalo commented 1 year ago

We also have these other estimator implementations (not part of cb_adf): https://github.com/VowpalWabbit/vowpal_wabbit/tree/master/vowpalwabbit/core/include/vw/core/estimators