Closed matanox closed 3 years ago
When I explicitly include a probability along with the cost, against the same unmodified base model file, I get a flow like the one below, where the probabilities returned for the cost-carrying message do not seem to reflect the updated policy.
echo " | feature-set-x " | netcat localhost 26542
0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.953125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125
echo "1:0.02:1 | feature-set-x " | netcat localhost 26542
0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.953125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125
echo " | feature-set-x " | netcat localhost 26542
0.953125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125 0.003125
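For readers following the flow above, here is a minimal sketch of how the labeled line could be read, assuming the usual Vowpal Wabbit contextual bandit label layout of action:cost:probability. The helper name parse_cb_label is hypothetical and not part of any library; it handles only a single label and ignores tags and namespaces.

```python
def parse_cb_label(line):
    """Split one input line into (label, features).

    Assumes the label, if present, is a single action:cost:probability
    triple placed before the first '|'. Returns None as the label for an
    unlabeled (prediction-only) line.
    """
    label_part, _, features = line.partition("|")
    label_part = label_part.strip()
    features = features.strip()
    if not label_part:
        return None, features
    action, cost, probability = label_part.split(":")
    label = {
        "action": int(action),              # which action was taken
        "cost": float(cost),                # observed cost for that action
        "probability": float(probability),  # probability it was chosen with
    }
    return label, features


labeled = parse_cb_label("1:0.02:1 | feature-set-x ")
unlabeled = parse_cb_label(" | feature-set-x ")
print(labeled)
print(unlabeled)
```

Under that reading, the second message reports action 1 with cost 0.02 and a logged probability of 1, while the first and third messages carry no label and only request a prediction.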
This is all with epsilon-greedy. Disclaimer: IIRC, exploration is fully reflected in the probability distribution being returned, rather than in a different distribution being returned epsilon of the time. If that's all true, what are the exact semantics of the probabilities returned for a cost-inclusive message?
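As a sanity check on the numbers in the outputs, the sketch below builds a textbook epsilon-greedy distribution: every action gets epsilon / K, and the greedy action gets an extra 1 - epsilon. The values epsilon = 0.05 and K = 16 are inferred from the outputs, not stated anywhere in this thread, and epsilon_greedy_pmf is a hypothetical helper, not a library function.

```python
def epsilon_greedy_pmf(num_actions, greedy_action, epsilon):
    """Return the exploration distribution; greedy_action is a 0-based index."""
    base = epsilon / num_actions           # uniform exploration mass per action
    pmf = [base] * num_actions
    pmf[greedy_action] += 1.0 - epsilon    # remaining mass goes to the greedy action
    return pmf


# Assumed values: 16 actions and epsilon = 0.05 (inferred from the outputs).
before = epsilon_greedy_pmf(16, 6, 0.05)   # greedy action in the 7th position
after = epsilon_greedy_pmf(16, 0, 0.05)    # greedy action in the 1st position
print([round(p, 6) for p in before])
print([round(p, 6) for p in after])
```

With those assumed values, the non-greedy entries come out to 0.003125 and the greedy entry to 0.953125, so the change between the first and third responses looks like the greedy action moving from the seventh position to the first.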
I'm slightly confused by this report. What exactly are the flags being used?
Hi @matanster,
Would you be able to provide the full command line for the instance running in daemon mode? Also, would you mind reproducing this on 8.7?
Closing issue as we do not have enough info to repro. Please reopen if this is still an issue you would like help with!
Hi,
I have an offline-trained Contextual Bandits model, which I am experimenting with in online (daemon) mode.
I am probably misinterpreting the meaning of the probabilities returned from the daemon when a cost is communicated to it: I had assumed they are the updated probabilities that would be returned if the same features were sent in without the cost included, i.e. the prediction the model would make for the supplied context.
But then I'm unable to explain the following scenario to myself:
The three messages here are all for the same context. I use feature-set-x above as a placeholder for the full, constant context sent in this flow, just to obfuscate the private/proprietary feature names and values. Could anything explain this zig-zag? I've not reproduced with 8.6 yet. Much obliged, as I might be missing something big.