LaunchpadAI / space-bandits

GNU General Public License v3.0

Return the probabilities from Thompson sampling, not the max #13

Closed: amoderate84 closed this issue 5 years ago.

amoderate84 commented 5 years ago

Many bandit use cases involve optimizing a UX element such as a carousel or a list of search results. I want to use something like ranked bandits, i.e. a separate bandit for each slot in a carousel. However, to ensure that duplicate items are not displayed, if an item is already being shown in slot 1, I need to remove it from the candidate list for slot 2 and choose the next best one. That requires the sampled values for every item, not just the argmax.
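A minimal sketch of the de-duplication rule described above, assuming each slot's bandit has already produced one score per item (for example, one Thompson sample per item). The function name fill_slots and the score values are hypothetical. This implements the "skip items already shown, take the next best" rule from this comment, not necessarily the exact procedure in the referenced paper.

```python
from typing import List, Sequence, Set


def fill_slots(slot_scores: Sequence[Sequence[float]]) -> List[int]:
    """Assign one distinct item to each slot, filling slots in order.

    slot_scores[k][i] is the score that slot k's bandit gives item i.
    An item already shown in an earlier slot is skipped, so each slot
    gets its best remaining item.
    """
    shown: Set[int] = set()
    layout: List[int] = []
    for scores in slot_scores:
        # Rank this slot's items from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        # Take the best item that no earlier slot has displayed.
        choice = next((i for i in ranked if i not in shown), None)
        if choice is None:
            raise ValueError("more slots than distinct items")
        shown.add(choice)
        layout.append(choice)
    return layout


# Hypothetical scores from three per-slot bandits over four items.
slot_scores = [
    [0.9, 0.2, 0.4, 0.1],  # slot 1: item 0 is best
    [0.8, 0.7, 0.3, 0.2],  # slot 2: item 0 is taken, falls back to item 1
    [0.5, 0.6, 0.1, 0.3],  # slot 3: items 1 and 0 are taken, item 3 beats item 2
]
print(fill_slots(slot_scores))  # [0, 1, 3]
```

Filling slots in order means slot 1 always gets its bandit's top choice, and later slots only give way when there is a conflict.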

This is the paper I am using as a reference.

AlliedToasters commented 5 years ago

Hello @amoderate84,

There are a couple of things you can try with the current release. (I won't have time to build new features this week, but PRs are welcome.)

If you call the private method model._sample(context), you will get the full vector of samples for all actions as a NumPy array, rather than only the chosen action.
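A minimal sketch of turning that vector into a ranking, assuming model._sample(context) returns a 1-D array with one sampled value per action, as described above. Because it is a private method, its return shape may change between releases. The helper top_k_actions and the sample values are hypothetical; a plain list stands in for the NumPy array (the same code also accepts a 1-D NumPy array).

```python
from typing import List, Sequence


def top_k_actions(samples: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest sampled values, best first.

    `samples` stands in for the output of model._sample(context):
    one Thompson-sampled value per action.
    """
    if not 0 < k <= len(samples):
        raise ValueError("k must be between 1 and the number of actions")
    order = sorted(range(len(samples)), key=lambda i: samples[i], reverse=True)
    return order[:k]


# Hypothetical sampled values for five actions (in practice: model._sample(context)).
samples = [0.12, 0.55, 0.31, 0.78, 0.05]
print(top_k_actions(samples, 3))  # [3, 1, 2]
```

Since all k indices come from a single draw, they are distinct by construction. Assigning the first index to slot 1, the second to slot 2, and so on is an alternative to running one bandit per slot.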

You can also try the model.predict_proba(context) method, which is appropriate only for binary outcomes. It should return an array over the action space, with each value in [0, 1].
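For intuition on why a binary-outcome model yields one value in [0, 1] per action, here is a Beta-Bernoulli Thompson sampling sketch using only the standard library. This is a swapped-in textbook technique, not how space-bandits computes predict_proba. The function beta_thompson_draw and the success/failure counts are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def beta_thompson_draw(counts: Sequence[Tuple[int, int]], rng: random.Random) -> List[float]:
    """Draw one Thompson sample per action for binary (0/1) rewards.

    counts[i] = (successes, failures) observed for action i. With a
    uniform Beta(1, 1) prior the posterior is Beta(1 + s, 1 + f), so
    every draw lies in [0, 1].
    """
    return [rng.betavariate(1 + s, 1 + f) for s, f in counts]


rng = random.Random(0)  # seeded so the example is reproducible
counts = [(10, 40), (25, 25), (3, 1)]  # hypothetical click / no-click tallies
draws = beta_thompson_draw(counts, rng)
print(len(draws), all(0.0 <= p <= 1.0 for p in draws))  # 3 True
```

The resulting list has the same shape as the per-action vector discussed above, so it can be ranked or de-duplicated the same way.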

Hope this helps. In future releases I will try to add new features or improve the documentation.

AlliedToasters commented 5 years ago

Closing. Please reopen if these solutions do not meet your requirements, @amoderate84.