-
It feels that the examples of MDPs used in lectures and assignments encourage the idea that while an action may lead to many states, the triplet old state - action - new state always produces the same…
-
Can someone help with the inputs on this example as it would relate to online advertising?
action - the unique id of the event that occurred, like perhaps the id of the ad that was shown, possibly t…
-
I was trying to understand more deeply how VW contextual bandits work, and I found something intriguing in a toy example: with the direct method, the predictions are not the same if I specify the allowed ac…
-
Hi,
First of all, thank you for this nice piece of software (vw)!
I'm looking for a way to get predictions for all the arms for a given example. I have found that there is `-r` / `--raw_predictions` opt…
-
Dear Arthur,
I am following your tutorials for reinforcement learning. They are very helpful. However, when I try to run "Contextual-Policy.ipython", I encounter some problems. Could you tell me how t…
-
Consider adding active learning strategies, contextual bandits, etc. to CPA as options.
References:
- http://www.ncbi.nlm.nih.gov/pubmed/24643256
- http://hunch.net/?p=1800
- http://hunch.net/~explor…
-
Because we want to publish on pypi, we need to release (tag) a version ASAP.
@stegben @taweihuang Please provide the changelog of the current master. Note that this changelog is for the users of this…
-
Start with background on Thompson sampling.
In particular, describe:
- An Empirical Evaluation of Thompson Sampling,
Olivier Chapelle, Lihong Li
- Thompson Sampling for Contextual Bandits with Line…