VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

vowpal wabbit java: get raw predictions #3777

Open onlynishant opened 2 years ago

onlynishant commented 2 years ago

I am using the Java API of Vowpal Wabbit to get predictions. I need the raw predictions (the same output as -r output.txt), but I couldn't find any such method in the VWMulticlassLearner class. I train my model via the command line (called from Python) with the following arguments:

vw -f model_filepath -c --cache_file cache_filepath -k --csoaa 40 -b 24 -q cd -q .... -q n: --ignore a --ignore x

And we use the following code in Java to get predictions:

VWMulticlassLearner learner = VWLearners.create("-i ./data/train.model -t --quiet");
VWProbLearner probLearner = VWLearners.create("-i ./data/train.model -t --quiet --csoaa_ldf=mc --loss_function=logistic --probabilities");

Neither class has a method that returns the raw predictions.

I want the same predictions that the following command produces:

$ echo ' .. sample string .. ' | vw -i data/train.model -t -r test -p /dev/stdout
creating quadratic features for pairs: cd ce cu cw de du dw eu ew uw n:
ignoring namespaces beginning with: a x
only testing
predictions = /dev/stdout
raw predictions = test
Num weight bits = 24
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile =
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
39
0.000000 0.000000            1            1.0    known       39      171

finished run
number of examples per pass = 1
passes used = 1
weighted example sum = 1.000000
weighted label sum = 0.000000
average loss = 0.000000
total feature number = 171

$ cat test
0:1.05645 1:0.83437 2:-0.210798 3:-2.81048 4:-4.47558 5:-4.45883 6:-3.65177 7:-3.71191 8:-2.96008 9:-2.82846 10:-2.31816 11:0.925984 12:3.28547 13:5.20375 14:6.34244 15:6.13525 16:1.65726 17:1.22801 18:1.35034 19:3.27091 20:2.94066 21:-0.0276409 22:0.391437 23:1.267 24:-0.689573 25:0.0171876 26:3.12935 27:3.95045 28:3.86978 29:1.18468 30:0.0921049 31:0.436564 32:0.98946 33:1.00963 34:-0.265355 35:-3.02128 36:-2.52846 37:-2.8066 38:-3.50639 39:-4.6184

How can I get the values in the file test as a method return value in Java? I don't want to read the file to get them in Java, because that would be slow.
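Whatever way the raw output ends up in Java (for example, text captured from the command line), each line has the form label:score label:score ..., as in the test file above. The following is a minimal sketch, assuming exactly that whitespace-separated format with integer labels, of parsing one such line into per-label scores. The class and method names (RawPredictionParser, parse, argMin) are hypothetical helpers and not part of the Vowpal Wabbit Java API. With --csoaa, each raw score is a predicted cost, so the lowest score should correspond to the predicted label. In the sample above, label 39 has the lowest score (-4.6184) and is also the reported prediction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the VW Java bindings.
public class RawPredictionParser {

    // Parses one raw-prediction line such as "0:1.05645 1:0.83437 ..."
    // into a map from label to raw score, preserving the original order.
    // Assumes whitespace-separated "label:score" tokens with integer labels.
    public static Map<Integer, Double> parse(String line) {
        Map<Integer, Double> scores = new LinkedHashMap<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return scores;
        }
        for (String token : trimmed.split("\\s+")) {
            int sep = token.lastIndexOf(':');
            if (sep <= 0 || sep == token.length() - 1) {
                throw new IllegalArgumentException("Malformed token: " + token);
            }
            int label = Integer.parseInt(token.substring(0, sep));
            double score = Double.parseDouble(token.substring(sep + 1));
            scores.put(label, score);
        }
        return scores;
    }

    // Returns the label with the lowest raw score (the lowest predicted cost
    // for --csoaa). On ties, the first label encountered wins.
    public static int argMin(Map<Integer, Double> scores) {
        if (scores.isEmpty()) {
            throw new IllegalArgumentException("No scores to compare");
        }
        int bestLabel = 0;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Map.Entry<Integer, Double> entry : scores.entrySet()) {
            if (entry.getValue() < bestScore) {
                bestLabel = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        // The raw-prediction line from the test file in the issue.
        String line = "0:1.05645 1:0.83437 2:-0.210798 3:-2.81048 4:-4.47558 "
                + "5:-4.45883 6:-3.65177 7:-3.71191 8:-2.96008 9:-2.82846 "
                + "10:-2.31816 11:0.925984 12:3.28547 13:5.20375 14:6.34244 "
                + "15:6.13525 16:1.65726 17:1.22801 18:1.35034 19:3.27091 "
                + "20:2.94066 21:-0.0276409 22:0.391437 23:1.267 24:-0.689573 "
                + "25:0.0171876 26:3.12935 27:3.95045 28:3.86978 29:1.18468 "
                + "30:0.0921049 31:0.436564 32:0.98946 33:1.00963 34:-0.265355 "
                + "35:-3.02128 36:-2.52846 37:-2.8066 38:-3.50639 39:-4.6184";
        Map<Integer, Double> scores = parse(line);
        System.out.println("labels parsed: " + scores.size()); // prints "labels parsed: 40"
        System.out.println("lowest-cost label: " + argMin(scores)); // prints "lowest-cost label: 39"
    }
}
```

This only covers the parsing step; it does not avoid the cost of producing the raw output outside the JVM.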

jackgerrits commented 2 years ago

Raw predictions are currently only available via the command line. I can see this being useful, and there are similar situations, such as getting the scores as well as the probabilities when using contextual bandits. Patches are welcome, but for something of this nature a design proposal would be needed before building it.

onlynishant commented 2 years ago

@jackgerrits I saw there is already an old PR for it: https://github.com/VowpalWabbit/vowpal_wabbit/pull/1244

Do you think it's still relevant and can be used?

jackgerrits commented 2 years ago

The fact that partial_prediction is stashed into the cost-sensitive label is already a pretty big hack, so I would prefer we don't go forward with that design. That said, raw predictions are not well represented in the prediction framework, because they are effectively all one-off situations, so I am not sure of a design at the moment.