VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

Large number of arms #3421

Closed musram closed 2 years ago

musram commented 3 years ago

Short description

I have a contextual bandit problem with around 10,000 arms. During exploration, I don't want to explore all the arms at once. For example, I would explore arms 1 to 200 for some time, then arms 201 to 400. Is there any way to specify the set of arms on the command line?

How this suggestion will help you/others

This situation comes up regularly: you don't want exploration to jump from arm 1 to an arbitrary arm x. Instead, you want to explore within a subset of arms for a while and then switch to a different subset.

Possible solution/implementation details

One option is a hierarchy of bandits. At the first level, each arm would be a group of arms: arm1-arm200, arm201-arm400, and so on. At the second level, each group expands into its individual arms; for example, the group arm1-arm200 would contain arm1, arm2, ..., arm200. Since I need a contextual bandit, I don't know whether this would work efficiently.

Example/links if any

marco-rossi29 commented 3 years ago

Hi, there is no need to use multiple bandits. You can use --cb_explore_adf (ADF stands for action-dependent features), which lets you specify which actions are available for each decision. The downside is that you have to pass every available action as a feature string in each predict and learn call, so it may be slow with such a high number of actions, depending on your latency requirements.
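To illustrate, here is a minimal sketch of how the per-decision action set could be written in VW's multi-line text format for --cb_explore_adf. It only builds the example string and does not call VW itself. The helper name `to_adf_example`, the `Context` and `Action` namespaces, and the `user` and `arm` feature names are all made up for illustration. The label prefix `0:cost:probability` assumes the usual ADF convention, where the position of the labeled line identifies the chosen action.

```python
from typing import Dict, Iterable, Optional, Tuple


def to_adf_example(
    context: Dict[str, str],
    arm_ids: Iterable[int],
    label: Optional[Tuple[int, float, float]] = None,
) -> str:
    """Format one multi-line example for --cb_explore_adf (hypothetical helper).

    context: shared (context) features as name -> value.
    arm_ids: the arms offered for this decision, e.g. range(1, 201).
    label:   optional (chosen_arm_id, cost, probability), used when learning.
    """
    shared = " ".join(f"{name}={value}" for name, value in context.items())
    lines = [f"shared |Context {shared}"]
    for arm in arm_ids:
        prefix = ""
        if label is not None and label[0] == arm:
            _, cost, prob = label
            # Assumed ADF label convention: "0:cost:probability" on the chosen line.
            prefix = f"0:{cost}:{prob} "
        lines.append(f"{prefix}|Action arm={arm}")
    return "\n".join(lines)


# First phase: only arms 1..200 are offered for the decision.
phase_one = to_adf_example({"user": "u1"}, range(1, 201))

# Later phase: the offered subset changes to arms 201..400.
phase_two = to_adf_example({"user": "u1"}, range(201, 401))

# A learning example: arm 2 was chosen with probability 0.5 and cost 1.0.
labeled = to_adf_example({"user": "u1"}, [1, 2], label=(2, 1.0, 0.5))
```

Each string has one `shared` line followed by one line per offered arm, so restricting exploration to a subset is just a matter of which arm lines you include. This also shows where the cost comes from: the example grows linearly with the number of arms offered.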

musram commented 3 years ago

Thanks marco-rossi29. I will check.