Closed sergeyf closed 3 years ago
I also tried it with the sklearn interface and the bug is more obvious:
from vowpalwabbit.sklearn_vw import VWMultiClassifier
vw = VWMultiClassifier(
    convert_to_vw=False, passes=1, csoaa=y_train.shape[1], b=26, ngram="d2", probabilities=True, l2=0.001
)
vw.fit(Xy_train)
vw.predict_proba(Xy_val[0:1])
You get:
Out:
array([[22., 22., 22., 22., 22., 22., 22., 22., 22., 22., 22., 22., 22.,
        22., 22., 22., 22., 22., 22., 22., 22., 22., 22., 22.]])
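A quick way to flag output like this automatically is a small sanity check on each returned row. The sketch below is only an illustration: the helper name looks_like_probabilities and its parameters are hypothetical and are not part of vowpalwabbit or scikit-learn. It only tests that every value lies in [0, 1] and, optionally, that the row sums to 1.

```python
def looks_like_probabilities(row, require_sum_to_one=False, tol=1e-6):
    """Return True if every value in `row` lies in [0, 1].

    Hypothetical helper for illustration; not a vowpalwabbit API.
    With require_sum_to_one=True the values must also sum to 1 within `tol`.
    """
    values = [float(v) for v in row]
    if not all(0.0 <= v <= 1.0 for v in values):
        return False
    if require_sum_to_one:
        return abs(sum(values) - 1.0) <= tol
    return True


# A row of 24 values that are all 22.0, as in the output above, fails the check.
print(looks_like_probabilities([22.0] * 24))

# A made-up row of plausible probabilities passes it.
print(looks_like_probabilities([0.2, 0.3, 0.5], require_sum_to_one=True))
```

Running it on each row of a predict_proba result, for example inside an assert, would catch this case before the values are used downstream.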
Hi Sergey,
Thank you for reporting this bug and all of the documentation you provided. csoaa currently does not support the probabilities flag. We'll work on adding this functionality this week and get back to you.
Thanks!
Wonderful, thank you!
Hi @sergeyf, we've looked into your issue further and come to the conclusion that the --probabilities flag doesn't quite make sense in the context of --csoaa. To address your issue, I've put in a fix that adds the --probabilities flag to --multilabel_oaa instead. This should be sufficient for your example above, since each class has a weight of either 0 or 1. Here is an example of how you could use --multilabel_oaa with the --probabilities flag:
from vowpalwabbit import pyvw
Xy_train = ['0,1,2 |text a transient based real time scheduling algorithm in fms']
Xy_train.append('1,2 |text a transient based')
vw = pyvw.vw(quiet=True, multilabel_oaa=3, probabilities=True, loss_function='logistic')
passes = 1
for n in range(passes):
    for idx, example in enumerate(Xy_train):
        vw.learn(example)
Xy_val = "|text a"
vw.predict(Xy_val)
Output:
[0.2910301685333252, 0.3539791405200958, 0.35499072074890137]
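If the training labels start out as a 0/1 indicator matrix (one column per class), the example lines used above can be generated rather than written by hand. This is a minimal sketch under stated assumptions: the helper to_multilabel_line is hypothetical and not part of vowpalwabbit, labels are assumed to be 0-indexed as in the example above, and the namespace name text is copied from that example.

```python
def to_multilabel_line(label_row, text, namespace="text"):
    """Build one multilabel example line such as '1,2 |text a transient based'.

    Hypothetical helper for illustration; not a vowpalwabbit API.
    `label_row` is a sequence of 0/1 flags, one per class (0-indexed).
    """
    # Indices of the classes that are switched on for this example.
    active = [str(i) for i, flag in enumerate(label_row) if flag]
    # '|' and ':' have special meaning in VW's text format, so replace them,
    # then collapse runs of whitespace.
    clean = " ".join(text.replace("|", " ").replace(":", " ").split())
    # Note: a row with no active labels yields an empty label field; how VW
    # treats such a line is not confirmed here.
    return f"{','.join(active)} |{namespace} {clean}"


rows = [([1, 1, 1], "a transient based real time scheduling algorithm in fms"),
        ([0, 1, 1], "a transient based")]
Xy_train = [to_multilabel_line(labels, text) for labels, text in rows]
for line in Xy_train:
    print(line)
```

The two generated lines match the hand-written training examples above, so they could be passed to vw.learn in the same loop.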
That is perfect, thank you for the fix!
Hi Sergey, this fix has just been merged. Here's a description of the new functionality:
Please let me know if you have any questions.
Hello,
Thanks for the great work on VW over the many years!
I'm using the Python package with the csoaa option to deal with a multilabel problem. It seems to train fine, but I can't get it to return probabilities. This is how I trained without any errors:
Where Xy_train has things like:
At test time, I have data like:
And doing vw.predict(Xy_val[0]), probabilities aren't returned, but just a single integer. How can one get probabilities out of this? I've tried vw.predict(Xy_val[0], i) for various i and no luck.
Thank you.