VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

online learning and prediction with conditional contextual bandit #4154

Closed umuur closed 2 years ago

umuur commented 2 years ago

Describe the bug

Are online learning and real-time prediction currently possible with ccb? I know they are possible with the classic contextual bandit, but I couldn't find any ccb resources for Python.

How to reproduce

Version

9.3.0

OS

macOS

Language

Python

Additional context

No response

bassmang commented 2 years ago

Hi @umuur

The issue here is that learn and predict must be called on the entire multiline ccb example, and that example needs at least as many actions as slots. That is why ccb slot 1:0:0.5 0,1 | is invalid. I would recommend something more like this for your use case:

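import vowpalwabbit

# Conditional contextual bandit with action-dependent features.
vw = vowpalwabbit.Workspace("--ccb_explore_adf", quiet=True)

# One multiline ccb example: shared context, two actions, two slots.
# Each slot label is chosen_action:cost:probability, followed by the
# ids of the actions that slot may choose from.
ccb_ex = """
    ccb shared | age feature1
    ccb action |
    ccb action |
    ccb slot 0:1:0.6 0,1 |
    ccb slot 1:0:0.5 0,1 |
    """

# learn and predict both take the whole multiline example.
vw.learn(ccb_ex)
print(vw.predict(ccb_ex))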

Calling predict here will return [[(0, 0.5), (1, 0.5)], [(1, 1.0)]], with one inner list of (action, score) pairs per slot: [(0, 0.5), (1, 0.5)] are the decision scores for the first slot and [(1, 1.0)] are the scores for the second. Let me know if you have any questions.
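For an online loop, it may help to generate the multiline example text programmatically, once without labels for prediction and once with labels for learning. The following is only a sketch: build_ccb_example is a hypothetical helper, not part of the vowpalwabbit package, and it assumes that a slot line may omit both the label and the list of included action ids. It also enforces the rule above that an example needs at least as many actions as slots.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical helper (not part of vowpalwabbit): builds one multiline
# ccb example in VW text format from plain Python values.
SlotLabel = Tuple[int, float, float]  # (chosen_action, cost, probability)


def build_ccb_example(
    shared: str,
    actions: Sequence[str],
    slots: Sequence[str],
    labels: Optional[Sequence[Optional[SlotLabel]]] = None,
) -> str:
    """Return the text of one ccb example.

    shared, actions[i] and slots[i] are feature strings (may be empty).
    labels[i] is the label for slot i, or None to leave it unlabeled.
    """
    if len(actions) < len(slots):
        raise ValueError("a ccb example needs at least as many actions as slots")
    if labels is not None and len(labels) != len(slots):
        raise ValueError("labels must have one entry per slot")

    lines = [f"ccb shared | {shared}".rstrip()]
    lines += [f"ccb action | {features}".rstrip() for features in actions]
    for i, features in enumerate(slots):
        prefix = "ccb slot"
        label = labels[i] if labels is not None else None
        if label is not None:
            action, cost, prob = label
            prefix += f" {action}:{cost}:{prob}"
        lines.append(f"{prefix} | {features}".rstrip())
    return "\n".join(lines)


# Unlabeled text for prediction, labeled text for learning.
unlabeled = build_ccb_example("age feature1", ["", ""], ["", ""])
labeled = build_ccb_example(
    "age feature1", ["", ""], ["", ""], labels=[(0, 1, 0.6), (1, 0, 0.5)]
)
print(labeled)
```

The unlabeled string would then go to vw.predict and the labeled one to vw.learn, in the same way as ccb_ex above.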

umuur commented 2 years ago

Hi @bassmang

Thank you! I think I misunderstood the multiline concept before. Now it makes more sense!

olgavrou commented 2 years ago

Closing this issue. Please feel free to reopen it if you have more questions.