fidelity / mabwiser

[IJAIT 2021] MABWiser: Contextual Multi-Armed Bandits Library
https://fidelity.github.io/mabwiser/
Apache License 2.0
213 stars, 42 forks

Cascading feedback type #71

Closed mustfkeskin closed 1 year ago

mustfkeskin commented 1 year ago

Hello

In the cascading feedback type (the term coined by Craswell et al., 2008), we assume the user looks at the displayed items sequentially, starting at the top slot. As soon as the user finds an item worth clicking, they click it and never return to the current ranked list. They do not even look at the items below the clicked one. It is also possible that the user clicks on nothing, which happens when none of the displayed items is worth clicking. In this case, the user does look at all the items.

The feedback signal is composed of two elements: the index of the chosen element and the value of the click. It is then the agent's task to translate this information into scores. In our implementation in the bandit library, we adopted the convention that seen but unclicked items receive some low score (typically 0 or -1), the clicked item receives the click value, and the items beyond the clicked one are ignored by the agent.
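The convention described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `cascade_to_rewards` and its parameters are hypothetical and do not come from any library.

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def cascade_to_rewards(
    displayed: Sequence[Hashable],
    clicked_index: Optional[int],
    click_value: float = 1.0,
    unclicked_score: float = 0.0,
) -> Tuple[List[Hashable], List[float]]:
    """Turn one cascading-feedback observation into (decisions, rewards).

    Hypothetical helper: `displayed` is the ranked list shown to the user,
    `clicked_index` is the position of the click, or None if nothing was clicked.
    """
    if clicked_index is not None and not 0 <= clicked_index < len(displayed):
        raise ValueError("clicked_index must point at a displayed item")

    # No click: the user examined every slot.
    # Click: slots below the clicked one were never seen and are dropped.
    n_seen = len(displayed) if clicked_index is None else clicked_index + 1

    decisions = list(displayed[:n_seen])
    rewards = [unclicked_score] * n_seen
    if clicked_index is not None:
        rewards[clicked_index] = click_value
    return decisions, rewards


# Click on the second of four items: "c" and "d" are ignored.
print(cascade_to_rewards(["a", "b", "c", "d"], clicked_index=1))
# No click: all four items are seen and receive the low score.
print(cascade_to_rewards(["a", "b", "c", "d"], clicked_index=None))
```

Passing `unclicked_score=-1.0` gives the other low score mentioned above.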

Does this repo support the cascading feedback mechanism?

bkleyn commented 1 year ago

The API is very flexible in that it allows any sequence of decisions (e.g., item IDs) and rewards (e.g., 0 if no click, 1 if click) in the fit and partial_fit methods.

It is not clear to me if/how the "index of the chosen item" can be incorporated. And somewhat related, what "scores" the bandit should return is not obvious. Would the bandit not predict the next item to recommend?

mustfkeskin commented 1 year ago

The fastest option would be to update the dataset myself. As an alternative, there is the tf_agent library, but it was too complicated to use.

If you want the expected value of the selected arm, you can use the predict_expectations method.

skadio commented 1 year ago

@mustfkeskin Agreed on updating the data externally and then just passing decisions, rewards, and contexts to mabwiser.
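A minimal sketch of that workflow, assuming the cascade logs have already been flattened externally into lists of decisions and rewards. The helper name `train_on_batches`, the choice of an epsilon-greedy policy, and the seed are illustrative assumptions, not a recommendation from this thread.

```python
from typing import Hashable, List, Sequence, Tuple

Batch = Tuple[List[Hashable], List[float]]  # (decisions, rewards), already preprocessed


def train_on_batches(arms: List[Hashable], batches: Sequence[Batch]):
    """Fit a context-free bandit on the first batch, then update it incrementally."""
    # Imported lazily so this sketch only needs mabwiser when it is actually called.
    from mabwiser.mab import MAB, LearningPolicy

    mab = MAB(
        arms=arms,
        learning_policy=LearningPolicy.EpsilonGreedy(epsilon=0.1),  # assumed policy
        seed=7,
    )
    for i, (decisions, rewards) in enumerate(batches):
        if i == 0:
            mab.fit(decisions=decisions, rewards=rewards)  # initial training
        else:
            mab.partial_fit(decisions=decisions, rewards=rewards)  # online update
    return mab


# Example call (requires mabwiser to be installed):
#   mab = train_on_batches(["a", "b", "c"], [(["a", "b"], [0, 1]), (["a", "c"], [0, 0])])
#   mab.predict()               # next item to recommend
#   mab.predict_expectations()  # dict mapping each arm to its expectation
```

Items that were never seen in a session are simply left out of `decisions`, so the bandit receives no update for them.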

"Ease of use" is the same rationale that led us to build mabwiser in the first place. Glad to hear you feel the same way.

If you like our library, we would much appreciate a github star 🌟 to spread the word!