VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

daemon mode (version 8.5.0) probabilities returned for cost reporting #1536

Closed matanox closed 3 years ago

matanox commented 6 years ago

Hi,

I have a Contextual Bandits model that was trained offline, and I am now experimenting with it in online (daemon) mode.

I am probably misinterpreting the probabilities the daemon returns when a cost is sent to it. I had assumed they are the updated probabilities, i.e. those that would be returned if the same features were then sent without the cost: the prediction the updated model would make for the supplied context.

But then I'm unable to explain the following scenario to myself:

echo " |  feature-set-x " | netcat localhost 26542

0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.953125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125

echo "1:0.02 |  feature-set-x " | netcat localhost 26542

0.953125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125

echo " |  feature-set-x " | netcat localhost 26542

0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.953125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125

All three messages are for the same context. feature-set-x above is a placeholder for the full, constant context sent in this flow; I use it only to hide the private/proprietary feature names and values. Could anything explain this zig-zag?
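To make the zig-zag easier to see, here is a minimal Python sketch that parses the three replies and reports which action holds the large probability in each. The helper names (`query_daemon`, `parse_pmf`, `top_action`) are hypothetical and not part of Vowpal Wabbit; `query_daemon` only assumes the daemon answers each newline-terminated example with a single line of text, as the netcat calls above suggest. The reply strings are rebuilt from the output pasted above.

```python
import socket


def query_daemon(example, host="localhost", port=26542):
    """Send one example line to the daemon and return its one-line reply.

    Assumption: the daemon replies with one text line per example,
    mirroring `echo ... | netcat localhost 26542`.
    """
    with socket.create_connection((host, port)) as conn:
        conn.sendall((example.rstrip("\n") + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reply:
            return reply.readline().strip()


def parse_pmf(reply):
    """Turn a whitespace-separated reply into a list of probabilities."""
    return [float(token) for token in reply.split()]


def top_action(pmf):
    """Return the 1-based index of the most probable action."""
    return max(range(len(pmf)), key=pmf.__getitem__) + 1


HIGH, LOW = "0.953125", "0.003125"

# The three replies from the flow above, in order.
replies = [
    " ".join([LOW] * 6 + [HIGH] + [LOW] * 9),  # " | feature-set-x"
    " ".join([HIGH] + [LOW] * 15),             # "1:0.02 | feature-set-x"
    " ".join([LOW] * 6 + [HIGH] + [LOW] * 9),  # " | feature-set-x"
]

tops = [top_action(parse_pmf(reply)) for reply in replies]
print(tops)
```

The large probability sits on action 7 for both unlabeled messages and on action 1, the action named in the label, for the cost-carrying message.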

I have not yet tried to reproduce this with 8.6. Much obliged, as I might be missing something big.

matanox commented 6 years ago

When I explicitly include a probability along with the cost, starting from the same unmodified base model file, I get the flow below. Here the probabilities returned for the cost-carrying message do not seem to reflect the updated policy.

echo " |  feature-set-x " | netcat localhost 26542

0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.953125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125

echo "1:0.02:1 |  feature-set-x " | netcat localhost 26542

0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.953125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125

echo " |  feature-set-x " | netcat localhost 26542

0.953125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125

This is all with epsilon-greedy. Disclaimer: IIRC, exploration is fully reflected in the returned probability distribution itself, rather than by returning a different distribution epsilon of the time. If that is all true, what are the exact semantics of the probabilities returned for a cost-inclusive message?
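For reference, the two values in the output are what a textbook epsilon-greedy distribution would give over 16 actions with epsilon = 0.05. Both numbers are inferred from the replies above, since the actual flags are not given in this thread. The sketch below is a generic illustration, not Vowpal Wabbit's code, and `epsilon_greedy_pmf` is a hypothetical helper name.

```python
def epsilon_greedy_pmf(greedy_action, num_actions, epsilon):
    """Textbook epsilon-greedy distribution over actions.

    Every action gets epsilon / num_actions; the greedy action
    (1-based index) additionally gets the remaining 1 - epsilon.
    """
    base = epsilon / num_actions
    pmf = [base] * num_actions
    pmf[greedy_action - 1] += 1.0 - epsilon
    return pmf


# Assumed values: 16 actions and epsilon = 0.05, inferred from the replies.
pmf = epsilon_greedy_pmf(greedy_action=7, num_actions=16, epsilon=0.05)
print([round(p, 6) for p in pmf])
```

Under these assumptions, 0.05 / 16 = 0.003125 and 0.95 + 0.003125 = 0.953125, so each reply identifies exactly one greedy action and the zig-zag amounts to the greedy action switching between 7 and 1.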

JohnLangford commented 5 years ago

I'm slightly confused by this report. What exactly are the flags being used?

jackgerrits commented 5 years ago

Hi @matanster,

Would you be able to provide the full command line for the instance running in daemon mode? Also, would you mind reproing in 8.7?

jackgerrits commented 3 years ago

Closing issue as we do not have enough info to repro. Please reopen if this is still an issue you would like help with!