Closed travisbrady closed 4 years ago
The docs definitely lack clarity on JSON input, especially for CB.
On the linked JSON page, the best example is:
{
  "UserAge": 15,
  "_multi": [
    {
      "_text": "elections maine",
      "Source": "TV"
    },
    {
      "Source": "www",
      "topic": 4,
      "_label": "2:3:.3"
    }
  ]
}
Which is equivalent to:
shared | UserAge:15
| elections maine SourceTV
2:3:.3 | Sourcewww topic:4
Features before the _multi key form the shared example, then each object in the _multi array corresponds to one action. The label can be supplied in the form the VW text format understands, or you could supply it as:
{
  "UserAge": 15,
  "_multi": [
    {
      "_text": "elections maine",
      "Source": "TV"
    },
    {
      "Source": "www",
      "topic": 4,
      "_label_Action": 2,
      "_label_Cost": 3,
      "_label_Probability": 0.3
    }
  ]
}
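To make the mapping concrete, here is a minimal Python sketch that turns a parsed JSON example into the equivalent text lines. It only encodes the rules visible in the example above (string values are appended to the key, numbers become key:value, _text is split into words, other underscore keys are treated as metadata). The function names are made up for illustration, and this is not how VW's own parser is implemented.

```python
import json

def label_of(obj):
    """Return the VW text label for one action object, or '' if unlabeled."""
    if "_label" in obj:
        return str(obj["_label"])
    keys = ("_label_Action", "_label_Cost", "_label_Probability")
    if all(k in obj for k in keys):
        return ":".join(str(obj[k]) for k in keys)
    return ""

def features_of(obj):
    """Render regular keys (and the words of _text) as VW text features."""
    parts = []
    for key, value in obj.items():
        if key == "_text":
            parts.extend(value.split())      # each word becomes a feature
        elif key.startswith("_"):
            continue                         # labels, _multi, other metadata
        elif isinstance(value, str):
            parts.append(key + value)        # "Source": "TV" -> SourceTV
        else:
            parts.append(f"{key}:{value}")   # "topic": 4 -> topic:4
    return " ".join(parts)

def json_to_text(doc):
    """Convert a parsed JSON example with _multi into VW text lines."""
    lines = ["shared | " + features_of(doc)]
    for action in doc.get("_multi", []):
        label = label_of(action)
        prefix = label + " " if label else ""
        lines.append(prefix + "| " + features_of(action))
    return lines

example = """
{"UserAge": 15,
 "_multi": [{"_text": "elections maine", "Source": "TV"},
            {"Source": "www", "topic": 4, "_label": "2:3:.3"}]}
"""
for line in json_to_text(json.loads(example)):
    print(line)
```

Run on the first example, this prints the same three lines as the text version. With the split _label_Action / _label_Cost / _label_Probability keys, the label comes out as 2:3:0.3 instead of 2:3:.3, which is the same label written differently.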
@jackgerrits thank you so much for the reply.
So what would be the translation of this simple example from the Logged-CB-Example wiki page?
1:2:0.4 | a c
Also, I have a handful more questions if you don't mind:
When is _multi necessary? Can it be ignored for simple CBs like in the notebook linked above?
What is the role of --dsjson? Is it preferable to --json?
Does the result of a call to VW_Learn tell you if the input data was acceptable?
Is --json available via the C API, i.e. VW_InitializeA("--json")?
I ask all of this because I'm writing vw bindings in OCaml (travisbrady/ocaml-vw) and the json format is easier to work with than doing text munging to match the vw text format.
In my OCaml bindings I have the following code, but I can't tell whether vw is accepting my input and learning from it. Is there a way to validate that this worked?
$ let vw = Vw.initialize "--cb 4 --json";;
$ Vw.learn_string vw "{\"_label_Action\": 1, \"_label_Cost\": 2, \"_label_Probability\": 0.4, \"f1\": \"a\", \"f2\": \"c\", \"f3\": \"\", \"_label_Index\": 1}";;
- : float = 4.49393792223418131e-06
I think that would be:
{
  "_multi": [
    {
      "a": 1,
      "c": 1,
      "_label_Action": 1,
      "_label_Cost": 2,
      "_label_Probability": 0.4
    }
  ]
}
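If you are generating these from existing text-format data, the translation can be scripted. Below is a minimal Python sketch that follows the structure suggested above. The function name is hypothetical, and it assumes a single labeled line with bare feature names only (no namespaces, no name:value features). Whether VW reads the result as intended is still the open question here.

```python
import json

def cb_text_to_json(line):
    """Translate 'action:cost:probability | f1 f2 ...' into a JSON string."""
    label, _, feature_part = line.partition("|")
    action, cost, probability = label.strip().split(":")
    # A bare feature name in the text format carries an implicit value of 1.
    example = {name: 1 for name in feature_part.split()}
    example["_label_Action"] = int(action)
    example["_label_Cost"] = float(cost)
    example["_label_Probability"] = float(probability)
    # json.dumps emits strict JSON, so no trailing commas can slip in.
    return json.dumps({"_multi": [example]})

print(cb_text_to_json("1:2:0.4 | a c"))
```

Building the string with a JSON serializer rather than by hand also avoids quoting and escaping mistakes when the example is passed on to a binding.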
When is _multi necessary? Can it be ignored for simple CBs like in the notebook linked above?
_multi is necessary to describe actions in multi_ex situations. For CB, it is possible that each action needs to be an object in _multi.
What is the role of --dsjson? Is it preferable to --json?
DSJSON is an extension on top of JSON that allows for more logged information; the two flags select two different parsing modes. The DS stands for Decision Service, a project from John and others that has now essentially become Azure Personalizer. Because of this, DSJSON has always focused on contextual bandits with action-dependent features, and it sees more support than the other JSON format. In VW, JSON has always been somewhat secondary to the VW text format, in the sense that everything should work in the text format but may be ill-specified in the JSON format. I know that's not a great answer, but it is something we are working on improving through a schematized binary format and better example-building APIs.
Does the result of a call to VW_Learn tell you if the input data was acceptable?
Generally, learn expects the data to be valid. You would need to use VW_ReadExampleA to get from text to the example.
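Since learn does not report malformed input, a check on the caller's side can at least catch the obvious mistakes before the string reaches VW. The following is a minimal Python sketch, not VW's own validation. The function name and the rules are assumptions based on this thread: the _label_* key names, CB actions living in _multi, and a logged probability in (0, 1].

```python
import json

LABEL_KEYS = ("_label_Action", "_label_Cost", "_label_Probability")

def check_cb_json(text):
    """Return a list of problems found in a JSON CB example (empty if none)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(doc, dict):
        return ["top-level value must be an object"]
    problems = []
    actions = doc.get("_multi")
    if not isinstance(actions, list) or not actions:
        problems.append("missing or empty _multi array")
        actions = []
    for i, action in enumerate(actions):
        if not isinstance(action, dict):
            problems.append(f"action {i}: not an object")
            continue
        present = [k for k in LABEL_KEYS if k in action]
        if present and len(present) < len(LABEL_KEYS):
            problems.append(f"action {i}: incomplete label {present}")
        p = action.get("_label_Probability")
        if isinstance(p, (int, float)) and not 0 < p <= 1:
            problems.append(f"action {i}: probability {p} not in (0, 1]")
    return problems
```

An empty list only means the input passed these checks. It says nothing about whether VW interpreted the features the way you intended.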
Is --json available via the C API?
I just did a little digging and it seems like it is not... You may have noticed that the C API is a little bit incomplete and hard to use right now. We are very aware and are actively working on overhauling it to make sure the right functionality is exposed and error handling is fixed. Specific suggestions about requirements of the API are helpful.
That's awesome that you're creating bindings in OCaml! I agree that JSON would be more ergonomic, but for the time being there is better support for the text format. Sorry that things are a little trickier than they should be right now. Rest assured we are working hard to make the C bindings more usable, so that bindings like these become much easier to create.
Thank you, @jackgerrits! This is already tremendously helpful.
One more question: is there a way in the C API to create an example directly without needing the parser? Say by passing a struct?
Also, I'd love to help add support for (DS)JSON input via the C API if you don't already have someone on deck to handle that. Just let me know.
Yeah there is support for constructing an example without parsing. See this test for an example: https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/test/unit_test/vwdll_test.cc
Thanks for the offer, will let you know if there's a task that makes sense!
Great. Thank you.
Description
Currently I'm not able to find one canonical source of the JSON input format for contextual bandits. For example, I'd like to attempt this example (https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/python/examples/Contextual_Bandit_Example_with_VW_Python_Wrapper.ipynb) using the json format but I'm not able to tell what the field names should be translated to.
Is it "action" => "_action", "cost" => "_cost" and "probability" => "_probability"?
Link to Documentation Page
https://github.com/VowpalWabbit/vowpal_wabbit/wiki/JSON
https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Logged-Contextual-Bandit-Example