-
Are contextual bandits in the scope of this library? Here is a paper for reference:
https://arxiv.org/pdf/1810.09558.pdf
-
Curious why this serves RL, and how it relates to known RL frameworks, e.g. SB3.
-
Hi,
I am currently working with "agents/tf_agents/bandits/". I am wondering whether, and if so where, the classic Contextual Bandit off-policy evaluation procedures are present in TensorFlow. I mean exactly the…
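
For readers unfamiliar with the term, one classic off-policy evaluation procedure is inverse propensity scoring (IPS). Below is a minimal sketch in plain Python, not the TF-Agents API. The function name `ips_estimate`, the tuple layout of the logged data, and the `target_policy` callable are all assumptions made for illustration.

```python
def ips_estimate(logged, target_policy):
    """Estimate the value of target_policy from logged bandit data via IPS.

    logged: iterable of (context, action, reward, logging_prob) tuples, where
        logging_prob is the probability the logging policy gave to the
        action it actually took (assumed layout, for illustration only).
    target_policy: callable (context, action) -> probability that the policy
        being evaluated would choose that action in that context.
    """
    total = 0.0
    count = 0
    for context, action, reward, logging_prob in logged:
        if logging_prob <= 0.0:
            raise ValueError("logging probabilities must be positive")
        # Reweight each observed reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
        count += 1
    if count == 0:
        raise ValueError("no logged interactions")
    return total / count


# Hypothetical usage: two logged rounds from a uniform logging policy,
# evaluated against a policy that always picks action 0.
logged_rounds = [("ctx", 0, 1.0, 0.5), ("ctx", 1, 0.0, 0.5)]
always_zero = lambda context, action: 1.0 if action == 0 else 0.0
estimate = ips_estimate(logged_rounds, always_zero)
```

The estimate is unbiased only if the logging policy gives nonzero probability to every action the target policy might take, and its variance grows when logging probabilities are small.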
-
Hi,
There are several usage questions about Contextual Bandits that I'd be happy to incorporate into the Wiki and Stack Overflow.
2. is progressive validation applied when training with IPS? I'm no…
-
I have a contextual bandit problem with (S - state, A - action, R - reward), where S is a high-dimensional vector, A is a continuous value, and R is a continuous value. How do I learn the optimal mapping function fr…
-
I am currently using the Vowpal Wabbit package to implement a Contextual Bandit use case.
My use case is to provide categories (L1/L2/L3/L4/L5), treated as actions here, with personalized rankin…
-
Breaking out the remaining work from https://github.com/VowpalWabbit/vowpal_wabbit/issues/1782
--cb_explore_adf should not need to know about the cb_type since that isn't used except in the cb_adf …
-
I have been playing around with the DCBTrainer and found some potential inconsistencies.
1) **StatlogData** example found [here](https://genrl.readthedocs.io/en/latest/usage/tutorials/bandit/contex…
-
Currently, Aloha supports contextual bandit models only when the set of actions is constant. This ticket is to add support for contextual bandit models with action dependent…
-
The loss calculation for CB reductions is not consistent and not well documented. The current situation is:
- `cb_adf` records loss as calculated by an IPS estimator, except for if CB type DR or DM…