VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

Schema/docs for contextual bandit json format #2414

Closed travisbrady closed 4 years ago

travisbrady commented 4 years ago

Description

Currently I'm not able to find one canonical source of the JSON input format for contextual bandits. For example, I'd like to attempt this example (https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/python/examples/Contextual_Bandit_Example_with_VW_Python_Wrapper.ipynb) using the json format but I'm not able to tell what the field names should be translated to.

Is it "action" => "_action", "cost" => "_cost" and "probability" => "_probability"?

Link to Documentation Page

https://github.com/VowpalWabbit/vowpal_wabbit/wiki/JSON

https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Logged-Contextual-Bandit-Example

jackgerrits commented 4 years ago

The docs are definitely lacking clarity when it comes to JSON input, especially when it comes to CB.

On the JSON page linked the best example there is:

{
    "UserAge": 15,
    "_multi": [
        {
            "_text": "elections maine",
            "Source": "TV"
        },
        {
            "Source": "www",
            "topic": 4,
            "_label": "2:3:.3"
        }
    ]
}

Which is equivalent to:

shared | UserAge:15
| elections maine SourceTV
2:3:.3 | Sourcewww topic:4

Features before the _multi key form the shared example, and each object in the _multi array corresponds to one action. The label can be supplied as a string in the form the VW text format understands, or as separate fields:

{
    "UserAge": 15,
    "_multi": [
        {
            "_text": "elections maine",
            "Source": "TV"
        },
        {
            "Source": "www",
            "topic": 4,
            "_label_Action": 2,
            "_label_Cost": 3,
            "_label_Probability": 0.3
        }
    ]
}
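
To make the mapping above concrete, here is a minimal Python sketch that renders a JSON example as VW text lines. It only covers the cases shown in this thread (numeric features, string features, _text, and the two label forms). json_to_vw_lines is a hypothetical helper written for illustration, not part of VW, and VW's real JSON parser handles more than this (nested namespaces, for example).

```python
import json


def json_to_vw_lines(doc):
    """Render a VW-JSON-style dict as VW text lines (illustrative only)."""

    def features(obj):
        parts = []
        for key, value in obj.items():
            if key == "_text":
                parts.extend(value.split())    # free text: one feature per word
            elif key.startswith("_"):
                continue                       # _multi and _label* are not features
            elif isinstance(value, str):
                parts.append(f"{key}{value}")  # string: name and value concatenated
            else:
                parts.append(f"{key}:{value}") # number: name:value
        return " ".join(parts)

    def label(obj):
        if "_label" in obj:                    # label already in VW text form
            return obj["_label"]
        if "_label_Action" in obj:             # label given as separate fields
            return "{}:{}:{}".format(
                obj["_label_Action"], obj["_label_Cost"], obj["_label_Probability"]
            )
        return ""

    # Top-level features form the shared example; each _multi entry is one action.
    lines = ["shared | " + features(doc)]
    for action in doc.get("_multi", []):
        lines.append(f"{label(action)} | {features(action)}".lstrip())
    return lines


doc = json.loads("""
{
    "UserAge": 15,
    "_multi": [
        {"_text": "elections maine", "Source": "TV"},
        {"Source": "www", "topic": 4, "_label": "2:3:.3"}
    ]
}
""")
print("\n".join(json_to_vw_lines(doc)))
```

Run on the first JSON document above, this prints the same three text lines given as its equivalent.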

travisbrady commented 4 years ago

@jackgerrits thank you so much for the reply.

So what would be the translation of this simple example from the Logged-CB-Example wiki page?

1:2:0.4 | a c

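Also, I have a handful more questions if you don't mind:

- When is multi necessary? Can it be ignored for simple CBs like in the notebook linked above?
- What is the role of --dsjson? Is it preferable to --json?
- Does the result of a call to VW_Learn tell you if the input data was acceptable?
- Is --json available via the C API?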

I ask all of this because I'm writing vw bindings in OCaml (travisbrady/ocaml-vw) and the json format is easier to work with than doing text munging to match the vw text format.

VW_Learn call example

In my ocaml bindings I have the following code, but I can't tell if vw is accepting my input and learning from it. Is there a way to validate that this worked?

$ let vw = Vw.initialize "--cb 4 --json";;
$ Vw.learn_string vw "{\"_label_Action\": 1, \"_label_Cost\": 2, \"_label_Probability\": 0.4, \"f1\": \"a\", \"f2\": \"c\", \"f3\": \"\", \"_label_Index\": 1}";;
- : float = 4.49393792223418131e-06

jackgerrits commented 4 years ago

I think that would be:

{
    "_multi": [
        {
            "a": 1,
            "c": 1,
            "_label_Action": 1,
            "_label_Cost": 2,
            "_label_Probability": 0.4
        }
    ]
}
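
The same translation can be sketched in Python. This assumes the JSON shape suggested above (a single object inside _multi) and only handles bare feature names such as a and c. cb_line_to_json is a hypothetical helper for illustration, not a VW function.

```python
import json


def cb_line_to_json(line):
    """Turn 'action:cost:prob | f1 f2 ...' into a single-action _multi document."""
    label, _, feature_text = line.partition("|")
    action, cost, probability = label.strip().split(":")
    # Bare feature names carry an implicit value of 1.
    example = {name: 1 for name in feature_text.split()}
    example["_label_Action"] = int(action)
    example["_label_Cost"] = float(cost)
    example["_label_Probability"] = float(probability)
    return json.dumps({"_multi": [example]})


print(cb_line_to_json("1:2:0.4 | a c"))
```

Building the string with json.dumps rather than by hand avoids syntax slips such as trailing commas, which strict JSON parsers reject.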

When is multi necessary? Can it be ignored for simple CBs like in the notebook linked above?

_multi is necessary to describe actions in multi_ex situations. For CB, each action would likely need to be its own object in _multi.

What is the role of --dsjson? Is it preferable to --json?

DSJSON is an extension on top of JSON that allows for more logged information; the two are separate parsing modes. DS stands for Decision Service, a project from John and others that has now essentially become Azure Personalizer. Because of this, DSJSON has always focused on contextual bandits with action-dependent features and sees more support than the plain JSON format. In VW, JSON has always been somewhat secondary to the VW text format, in the sense that everything should work in the text format but may be ill-specified in the JSON format. I know that's not a great answer, but it is something we are working on improving through a schematized binary format and better example-building APIs.

Does the result of a call to VW_Learn tell you if the input data was acceptable?

Generally, learn expects the data to be valid. You would need to use VW_ReadExampleA to get from text to the example.
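
Since learn will not report bad input, one option is to sanity-check the JSON on the caller's side before handing it over. The sketch below is only that: check_cb_json is a hypothetical helper, and the rules it applies (probability in (0, 1], action index an integer of at least 1, labels sitting on the objects inside _multi) are assumptions about CB labels rather than anything VW itself enforces through this path.

```python
import json


def check_cb_json(text):
    """Return a list of problems found in a JSON CB example (empty list: none found)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]

    # Assumption: labels live on the objects inside _multi, or on the
    # top-level object when there is no _multi key.
    actions = doc.get("_multi", [doc])
    if not isinstance(actions, list):
        return ["_multi must be an array"]

    problems = []
    for i, action in enumerate(actions):
        if not isinstance(action, dict):
            problems.append(f"action {i}: must be a JSON object")
            continue
        prob = action.get("_label_Probability")
        if prob is not None and not (isinstance(prob, (int, float)) and 0 < prob <= 1):
            problems.append(f"action {i}: _label_Probability must be in (0, 1]")
        chosen = action.get("_label_Action")
        if chosen is not None and not (isinstance(chosen, int) and chosen >= 1):
            problems.append(f"action {i}: _label_Action must be an integer >= 1")
    return problems
```

An empty list only means none of these checks failed; it does not show that VW parsed the example the way you intended.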

Is --json available via the C API?

I just did a little digging and it seems like it is not... You may have noticed that the C API is a little bit incomplete and hard to use right now. We are very aware and are actively working on overhauling it to make sure the right functionality is exposed and error handling is fixed. Specific suggestions about requirements of the API are helpful.

That's awesome that you're creating bindings in OCaml! I agree that JSON would be more ergonomic, but for the time being there is better support for the text format. Sorry that things are a little trickier than they should be right now. Rest assured we are working hard to make the C bindings more usable, so that bindings like these are much easier to create.

travisbrady commented 4 years ago

Thank you, @jackgerrits! This is already tremendously helpful.

One more question: is there a way in the C API to create an example directly without needing the parser? Say by passing a struct?

Also, I'd love to help add support for (DS)JSON input via the C API if you don't already have someone on deck to handle that. Just let me know.

jackgerrits commented 4 years ago

Yeah there is support for constructing an example without parsing. See this test for an example: https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/test/unit_test/vwdll_test.cc

Thanks for the offer, will let you know if there's a task that makes sense!

travisbrady commented 4 years ago

Great. Thank you.