Closed hardianlawi closed 3 years ago
Hi @hardianlawi, native continuous action support is currently being worked on. Could you elaborate on what guidance you'd like to see in the documentation for native continuous action support, as well as for using discretization?
From what I understand, there are different ways of dealing with continuous actions. I'm not sure which one is being worked on currently.
What I think would be helpful is:
1) We have a paper here: https://arxiv.org/abs/1902.01520 looking at an approach which I believe in. In essence, you create a continuous distribution to enable counterfactual evaluation of different strategies. @rajan-chari has been working on implementing a practical version of this which should be merged in fairly soon.
3) It is not necessarily one-step policy gradient, because you want to achieve relatively high precision in your choice of action efficiently. This benefits from some logarithmic time prediction approaches.
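To make the "continuous distribution for counterfactual evaluation" idea concrete, here is a minimal sketch in plain Python. It is not the VW implementation and not necessarily the exact estimator from the paper: it assumes a one-dimensional action range `[lo, hi]`, a uniform smoothing kernel of half-width `bandwidth`, and logged data that records the density the logging policy assigned to each chosen action. All function names are hypothetical.

```python
def smoothed_density(center, action, bandwidth, lo, hi):
    """Density at `action` of a uniform distribution on
    [center - bandwidth, center + bandwidth], clipped to [lo, hi]."""
    left = max(lo, center - bandwidth)
    right = min(hi, center + bandwidth)
    if left <= action <= right:
        return 1.0 / (right - left)
    return 0.0


def ips_value(logged, target_policy, bandwidth, lo, hi):
    """Importance-weighted estimate of the smoothed target policy's value.

    `logged` holds (context, action, reward, logged_density) tuples, where
    `logged_density` is the density the logging policy gave to `action`.
    `target_policy` maps a context to the action it would center on.
    """
    total = 0.0
    for context, action, reward, logged_density in logged:
        # Density the smoothed target policy would have given this action.
        target_density = smoothed_density(
            target_policy(context), action, bandwidth, lo, hi
        )
        total += reward * target_density / logged_density
    return total / len(logged)
```

Because every smoothed policy assigns a nonzero density to an interval rather than to a single point, the importance weight is well defined, so one logged dataset can be reused to compare several candidate policies. A smaller `bandwidth` tracks the unsmoothed policy more closely but makes the weights larger and the estimate noisier.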
@JohnLangford Do you have any idea of what "fairly soon" means? I was about to start implementing it on my own, but if someone is already doing it, maybe I could wait a little longer.
Hi @hardianlawi @duburcqa, you might want to look at the CATS reduction for continuous action spaces here, which was added recently.
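For readers new to the tree-plus-smoothing idea, the sketch below shows the general shape of it in plain Python: a binary tree picks one of `2**depth` discretized action bins in `depth` steps (logarithmic in the number of bins), and the chosen bin center is then smoothed into a continuous action with a known density. This is an illustration under stated assumptions, not VW's CATS code: the `routers` list of per-node decision functions, the heap-style node indexing, and the uniform smoothing are all hypothetical choices.

```python
import random


def tree_choose_bin(context, routers, depth):
    """Walk a complete binary tree and return a leaf index in [0, 2**depth).

    `routers[i](context)` returns True to go right at internal node i
    (heap-style indexing: children of i are 2*i + 1 and 2*i + 2).
    """
    node = 0
    for _ in range(depth):
        node = 2 * node + (2 if routers[node](context) else 1)
    first_leaf = 2 ** depth - 1
    return node - first_leaf


def bin_center(bin_index, n_bins, lo, hi):
    """Center of the given bin when [lo, hi] is split into n_bins equal bins."""
    width = (hi - lo) / n_bins
    return lo + (bin_index + 0.5) * width


def smooth_action(center, bandwidth, lo, hi, rng):
    """Sample uniformly within `bandwidth` of `center` (clipped to [lo, hi]).

    Returns the sampled action and its density, which should be logged
    alongside the action for later off-policy evaluation.
    """
    left = max(lo, center - bandwidth)
    right = min(hi, center + bandwidth)
    return rng.uniform(left, right), 1.0 / (right - left)
```

With `depth = 2` there are three internal routers and four bins; choosing an action costs two router evaluations instead of scoring all four bins, which is where the logarithmic-time benefit comes from as the number of bins grows.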
You can find a new reduction, CBZO, in master, a contextual-bandit style algorithm meant for multi-dimensional, continuous action space, here.
Closing this issue, but feel free to open again if you have more questions.
Thank you for keeping up on this issue, I found another way to solve my problem since then but I'm happy to see it is moving forward.
Hi @olgavrou, do you know whether there are plans to extend the implementation of CBZO in VW to the multidimensional action case?
@ajay0 do you have any plans regarding extension of cbzo :top: ?
@olgavrou @sumpfork sorry for the delayed response, I somehow missed the @ mention.
We do have plans to extend to the multidimensional case. We are also planning to include a tree policy (in addition to the constant and linear policy currently available) where the action to take is decided by a decision-tree-like model instead of a linear or constant model. Unfortunately I don't have a timeline that I can give, yet.