Closed · musram closed this issue 2 years ago
Hi, there is no need to use multiple bandits; you can use `--cb_explore_adf` (ADF stands for action-dependent features). This way you can specify which actions are available for each decision. The downside is that you have to pass the actions as feature strings in every predict and learn call, so it may be slow with such a high number of actions, depending on your latency requirements.
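For reference, a `--cb_explore_adf` interaction is a multiline example that lists only the actions available for that decision; the feature names below are made up for illustration. The chosen action's line carries the label in `0:cost:probability` form, the shared line holds the context, and unchosen actions are unlabeled:

```text
shared |User user_id=42 time_of_day=morning
|Action arm=17
0:-0.3:0.4 |Action arm=23
|Action arm=101
```

To restrict exploration to arms 1–200 for a while and then arms 201–400, you would simply change which `|Action` lines you emit in each predict/learn call.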
Thanks marco-rossi29. I will check.
Short description
I have a contextual-bandit problem with around 10,000 arms, but during exploration I don't want to explore all of the arms at once. For example, I would explore arms 1 to 200 for some time, then arms 201 to 400. Is there any way to specify the current set of arms on the command line?
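To make the requirement concrete, here is a minimal pure-Python sketch (not VW code; the `SubsetBandit` class, epsilon-greedy policy, and reward shape are all assumptions for illustration) of a bandit whose exploration is restricted to a caller-supplied subset of arms that can change over time:

```python
import random

class SubsetBandit:
    """Epsilon-greedy bandit that only ever explores within a
    caller-supplied subset of arms. Hypothetical sketch, not a VW API."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def choose(self, allowed):
        # Explore uniformly within the allowed subset, or exploit the
        # best-known arm of that subset.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(allowed)
        return max(allowed, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean update for the pulled arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

bandit = SubsetBandit(n_arms=10000)
phase1 = list(range(0, 200))      # explore arms 0..199 first
arm = bandit.choose(phase1)
bandit.update(arm, reward=1.0)
phase2 = list(range(200, 400))    # later, switch to arms 200..399
arm2 = bandit.choose(phase2)
```

The `allowed` list plays the same role as the per-decision action list in `--cb_explore_adf`: the policy never proposes an arm outside the current subset.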
How this suggestion will help you/others
This use case comes up regularly: rather than jumping from arm 1 straight to some arbitrary arm x, you want to explore within one subset of arms and then move on to the next subset.
Possible solution/implementation details
One possible approach is a hierarchy of bandits. At the first level I would have meta-arms arm1–arm200, arm201–arm400, and so on. At the second level each meta-arm, for example arm1–arm200, expands into the individual arms arm1, arm2, ..., arm200. Since I need a contextual bandit, I don't know whether this would work efficiently.
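The two-level idea above can be sketched as follows. This is a hypothetical illustration, not VW code: the `EpsilonGreedy` sub-bandit, the block size of 200, and crediting the reward to both levels are all assumptions.

```python
import random

class EpsilonGreedy:
    """Simple non-contextual epsilon-greedy bandit over n arms."""

    def __init__(self, n, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n
        self.values = [0.0] * n

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def update(self, i, reward):
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]

class HierarchicalBandit:
    """Two levels: a top bandit picks a block of 200 arms, then a
    per-block bandit picks an arm inside that block."""

    def __init__(self, n_arms=10000, block=200, seed=0):
        self.block = block
        n_blocks = n_arms // block
        self.top = EpsilonGreedy(n_blocks, seed=seed)
        self.leaves = [EpsilonGreedy(block, seed=seed + b + 1)
                       for b in range(n_blocks)]

    def choose(self):
        b = self.top.choose()
        a = self.leaves[b].choose()
        return b * self.block + a          # global arm index

    def update(self, arm, reward):
        b, a = divmod(arm, self.block)
        self.top.update(b, reward)         # credit flows to both levels
        self.leaves[b].update(a, reward)

hb = HierarchicalBandit()
arm = hb.choose()
hb.update(arm, reward=1.0)
```

One design question this sketch leaves open, and which the question above hints at, is whether the top level should see the context too; with plain `--cb_explore_adf` the hierarchy is unnecessary, since the per-decision action list already restricts exploration.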
Example/links if any