VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

vw --cbify does not properly handle multi-label dataset #2262

Closed haltux closed 2 years ago

haltux commented 4 years ago

Describe the bug

Multi-label datasets for CB experimentation seem highly relevant: in practice, in real-life CB problems, several actions often lead to success, which, unless I am missing something, is not the case in experiments based on multi-class problems.

It seems that multi-label datasets can be generated for use by Vowpal Wabbit in CB experiments. I have not found this documented anywhere; however, such datasets are specifically generated for vw in this project: https://github.com/mwydmuch/datasets4vw

Additionally, the paper "A Contextual Bandit Bake-off" cites experiments on multi-label data with Vowpal Wabbit, even though unfortunately it seems the authors released code only for single-label data.

The format is (from the previously cited project):

1,2 | x:0 y:1
1,3 | x:1 y:1

This is parsed by vw without error. Unfortunately, I could not find a way to make proper use of it.

To Reproduce

Steps to reproduce the behavior:

Create this dataset and store it in /tmp/data.vw:

1,2,3 | x:0 y:0

Run:

vw --cbify 3 --cb_explore_adf /tmp/data.vw

Expected behavior

As all labels are associated with the single data point, the loss should be 0.

Observed Behavior

average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.000000 1.000000            1            1.0        1        2        1

The current label is 1, as if the other labels were completely ignored. So when the cb algorithm predicts 2, it gets a loss of 1, which is wrong.

If this is really the expected behavior, please tell me how it is actually possible to run vw on multi-label datasets.

Environment

8.8.0 on linux / command line.

Additional context

jackgerrits commented 4 years ago

Hi @haltux,

So there are two things here. --cbify takes multiclass labels, not multilabels. As you correctly noticed, the parser is actually extracting the first number and using that as the label here. This is clearly a problem; the parser should have given you an error instead. I've created issue #2264 to track this.

The second thing is how you can achieve the behavior you're looking for. I briefly discussed this with @pmineiro, I'll let him chime in here as I think he can do a better job explaining.

pmineiro commented 4 years ago

Your question unfortunately requires a verbose response.

Supervised Multi-class and Multi-label

Consider a multi-class problem with 4 classes. For each input there is one good class and 3 bad classes, so the possible cost vectors have a single zero with ones everywhere else: (0, 1, 1, 1), (1, 0, 1, 1), etc. Because there is only "one good class" per example, it is common to consider models that output a single class per example and associated metrics like accuracy. However, sometimes models that output multiple classes per example are considered, with associated metrics like precision-at-2.

In a multi-label problem with 4 classes, the possible cost vectors are corners of the hypercube like (0, 0, 1, 0), (1, 1, 1, 0), etc. Sometimes models that output a single class are considered in this context, in which case metrics like accuracy (precision-at-1) still make sense. It is very common in this scenario, however, to consider models that output multiple classes per example.
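To make the distinction concrete, here is a small sketch (my own illustration, not VW code; NumPy cost vectors and a hypothetical precision_at_k helper) of the two kinds of cost vectors and a precision-at-k metric:

import numpy as np

# Multi-class: exactly one zero-cost (good) class per example.
multiclass_costs = np.array([0, 1, 1, 1])   # class 1 is the only good class

# Multi-label: any corner of the hypercube; several classes can have cost 0.
multilabel_costs = np.array([0, 1, 0, 1])   # classes 1 and 3 are both good

def precision_at_k(costs, predicted_classes, k):
    """Fraction of the top-k predicted classes (0-indexed) that have cost 0."""
    top_k = predicted_classes[:k]
    return sum(costs[c] == 0 for c in top_k) / k

# A model predicting classes 3 and 2 (0-indexed: 2 and 1) on the multi-label example:
print(precision_at_k(multilabel_costs, [2, 1], k=2))  # 0.5: class 3 is good, class 2 is not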

Contextual Bandit Learning in VW

First, VW cb models always output a single action. Thus representing each class as an action only makes sense when considering the analog to supervised learning models that output one class per example. If the model is required to output multiple classes per example then a different action representation is required, which is outside the scope of this answer. The multi-label models in the contextual bandit experiments referred to in the question only output a single class per example.

Second, the basic idea when converting a supervised learning dataset into a contextual bandit dataset is to only reveal the component of the cost vector associated with the action taken by the logging policy. The fact that there might be more than a single zero-valued component of the cost vector is not relevant, and the procedure is the same for multi-class and multi-label cost vectors.
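A minimal sketch of that conversion (my own illustration, assuming a uniform-random logging policy; this is not VW code):

import random

def to_bandit_feedback(cost_vector, rng=random.Random(0)):
    """Simulate a logging policy over one supervised example.

    cost_vector: full costs for all actions (multi-class or multi-label,
                 the procedure is identical).
    Returns (chosen_action, observed_cost, logging_probability); the costs
    of the other actions are discarded, exactly as in a real bandit log.
    """
    n_actions = len(cost_vector)
    action = rng.randrange(n_actions)      # uniform-random logging policy
    probability = 1.0 / n_actions
    return action + 1, cost_vector[action], probability  # 1-indexed like VW

# Multi-label example with good labels 1 and 3 out of 4 actions:
print(to_bandit_feedback([0, 1, 0, 1]))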

Third, to improve the statistical efficiency of policy evaluation the cb_label format allows the user to specify costs associated with actions not taken by the logging policy if they are known for some reason (see the paragraph here that starts with "Additionally one can specify the costs of all actions if they are known..."). For multi-label problems, leveraging this is a good idea.

How to do it

Assuming you want to output a single class per example, for a 4-class multi-label problem you could take an example with good labels 1 and 3, for which the historical policy chose label 2 with probability 0.25, and represent it as

 1:0 2:1:0.25 3:0 4:1 | features...
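For example, a small helper (my own sketch, not part of VW; it assumes 1-indexed actions and a known logging probability for the chosen action) that writes such lines from a multi-label example could look like:

def cb_line_with_full_costs(good_labels, chosen_action, probability, n_actions, features):
    """Write one VW cb example where the costs of all actions are known.

    good_labels: set of 1-indexed labels with cost 0; all others get cost 1.
    chosen_action: the action the logging policy actually took; only this
                   entry carries the logging probability.
    """
    parts = []
    for action in range(1, n_actions + 1):
        cost = 0 if action in good_labels else 1
        if action == chosen_action:
            parts.append(f"{action}:{cost}:{probability}")
        else:
            parts.append(f"{action}:{cost}")
    return " ".join(parts) + " | " + features

# Good labels 1 and 3, logging policy chose label 2 with probability 0.25:
print(cb_line_with_full_costs({1, 3}, 2, 0.25, 4, "x:1 y:1"))
# -> 1:0 2:1:0.25 3:0 4:1 | x:1 y:1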
jackgerrits commented 2 years ago

I think we've answered your question so I am going to go ahead and close this. If you would like to discuss more or feel it isn't answered then please feel free to open a new issue or reopen this one.