fidelity / mabwiser

[IJAIT 2021] MABWiser: Contextual Multi-Armed Bandits Library
https://fidelity.github.io/mabwiser/
Apache License 2.0
213 stars, 42 forks

`context` isn't passed to `_parallel_fit` in Thompson Sampling #67

Closed: clarkzjw closed this issue 1 year ago

clarkzjw commented 1 year ago

Hi,

I noticed that in the `fit` method of `_ThompsonSampling`, `contexts` is never passed to `self._parallel_fit(decisions, rewards)`: https://github.com/fidelity/mabwiser/blob/master/mabwiser/thompson.py#L38

    def fit(self, decisions: np.ndarray, rewards: np.ndarray, contexts: np.ndarray = None) -> NoReturn:

        # If rewards are non binary, convert them
        rewards = self._get_binary_rewards(decisions, rewards)

        # Reset the success and failure counters to 1 (beta distribution is undefined for 0)
        reset(self.arm_to_success_count, 1)
        reset(self.arm_to_fail_count, 1)

        # Reset warm started arms
        self._reset_arm_to_status()

        # Calculate fit
        self._parallel_fit(decisions, rewards)

        # Update trained arms
        self._set_arms_as_trained(decisions=decisions, is_partial=False)

        # Leave the calculation of expectations to predict methods

I'm curious why this is the case. In `simulator.py`, `ThompsonSampling` appears in both `contextual_mabs` and `context_free_mabs`: https://github.com/fidelity/mabwiser/blob/master/examples/simulator.py#L30-L42

If `_parallel_fit` in `_ThompsonSampling` never receives the contexts, how does it solve the contextual bandit problem?

Thanks.

bkleyn commented 1 year ago

Hey @clarkzjw, thank you for your interest in the library!

The ThompsonSampling learning policy is a non-contextual policy, which is why no contexts are passed to the fit function you noted.
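
To make that concrete, here is a minimal sketch of context-free Beta-Bernoulli Thompson Sampling, mirroring the counting in the `fit` excerpt quoted in the question: every arm starts at Beta(1, 1), and fitting needs only decisions and binary rewards. The class `BetaThompsonSampling` and its methods are hypothetical names chosen for illustration; this is not MABWiser's implementation or API.

```python
import random
from typing import Dict, Sequence


class BetaThompsonSampling:
    """Hypothetical, simplified Beta-Bernoulli Thompson Sampling (not MABWiser's code)."""

    def __init__(self, arms: Sequence[str], seed: int = 0) -> None:
        self.arms = list(arms)
        self.rng = random.Random(seed)
        self.success: Dict[str, int] = {arm: 1 for arm in self.arms}
        self.fail: Dict[str, int] = {arm: 1 for arm in self.arms}

    def fit(self, decisions: Sequence[str], rewards: Sequence[int]) -> None:
        # Reset every arm to Beta(1, 1): the Beta distribution needs positive parameters.
        self.success = {arm: 1 for arm in self.arms}
        self.fail = {arm: 1 for arm in self.arms}
        # Only (decision, binary reward) pairs are consumed; no context is ever needed.
        for arm, reward in zip(decisions, rewards):
            if reward == 1:
                self.success[arm] += 1
            else:
                self.fail[arm] += 1

    def predict(self) -> str:
        # Sample once from each arm's Beta posterior and pick the largest draw.
        draws = {arm: self.rng.betavariate(self.success[arm], self.fail[arm])
                 for arm in self.arms}
        return max(draws, key=draws.get)
```

Because `predict` takes no arguments, every call samples from the same per-arm posteriors regardless of any context, which is why TS on its own cannot be contextual.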

Notice that in the simulator example, on lines 32-33, the ThompsonSampling learning policy is combined with the Radius neighborhood policy, which makes it a contextual multi-armed bandit. In the other case you mentioned (line 40), ThompsonSampling is used on its own as a non-contextual learning policy, without a neighborhood policy.
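
A rough sketch of how a radius neighborhood can turn context-free TS into a contextual policy: for each query context, keep only the training rows whose context lies within a Euclidean `radius`, then run the same Beta counting on that subset. The function names are hypothetical, and treating an empty neighborhood as the Beta(1, 1) prior (so effectively a uniform random pick) is an assumption of this sketch; the library may handle that case differently.

```python
import math
import random
from typing import Dict, Sequence, Tuple


def neighborhood_counts(
    arms: Sequence[str],
    decisions: Sequence[str],
    rewards: Sequence[int],
    contexts: Sequence[Sequence[float]],
    query: Sequence[float],
    radius: float,
) -> Tuple[Dict[str, int], Dict[str, int]]:
    # Beta(1, 1) prior per arm, updated only with rows whose context lies
    # within Euclidean distance `radius` of the query context.
    success = {arm: 1 for arm in arms}
    fail = {arm: 1 for arm in arms}
    for arm, reward, ctx in zip(decisions, rewards, contexts):
        if math.dist(ctx, query) <= radius:
            if reward == 1:
                success[arm] += 1
            else:
                fail[arm] += 1
    return success, fail


def radius_thompson_predict(
    arms: Sequence[str],
    decisions: Sequence[str],
    rewards: Sequence[int],
    contexts: Sequence[Sequence[float]],
    query: Sequence[float],
    radius: float,
    rng: random.Random,
) -> str:
    success, fail = neighborhood_counts(arms, decisions, rewards, contexts, query, radius)
    # Plain context-free Thompson Sampling, restricted to the neighborhood.
    draws = {arm: rng.betavariate(success[arm], fail[arm]) for arm in arms}
    return max(draws, key=draws.get)
```

The only difference from context-free TS is which rows get counted, which matches the point that the neighborhood policy, not TS itself, brings in the context.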

skadio commented 1 year ago

@clarkzjw it's cool how you dug deep into the source code. Impressive!

As @bkleyn noted, TS on its own is not contextual. If we add a neighborhood policy, like Radius, then it becomes contextual but non-parametric. If you want a parametric contextual version, you can use LinTS instead of TS, which learns a linear model from the contexts.
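
For the parametric route, the textbook form of linear Thompson Sampling keeps ridge-regression statistics per arm, A = λI + Σ x xᵀ and b = Σ r x, samples θ from N(A⁻¹b, α²A⁻¹), and scores a context x as xᵀθ. The stdlib-only sketch assumes that formulation; the names `LinTSArm`, `choose_arm`, `alpha` and `l2_lambda` are illustrative, and it should not be read as MABWiser's LinTS code.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    # Lower-triangular L with a == L L^T; `a` must be symmetric positive definite.
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / l[j][j]
    return l


def solve_lower(l: Matrix, b: Sequence[float]) -> List[float]:
    # Forward substitution for L y = b.
    y: List[float] = []
    for i in range(len(b)):
        y.append((b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    return y


def solve_upper_t(l: Matrix, y: Sequence[float]) -> List[float]:
    # Back substitution for L^T x = y, reading L^T[i][k] as L[k][i].
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


class LinTSArm:
    """Hypothetical per-arm linear Thompson Sampling state (illustrative only)."""

    def __init__(self, dim: int, alpha: float = 1.0, l2_lambda: float = 1.0) -> None:
        self.alpha = alpha
        # Ridge statistics: A = l2_lambda * I + sum(x x^T), b = sum(r * x).
        self.a = [[l2_lambda if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def update(self, x: Sequence[float], reward: float) -> None:
        for i, xi in enumerate(x):
            self.b[i] += reward * xi
            for j, xj in enumerate(x):
                self.a[i][j] += xi * xj

    def _posterior(self) -> Tuple[Matrix, List[float]]:
        # Posterior mean theta_hat = A^{-1} b, solved through A = L L^T.
        l = cholesky(self.a)
        return l, solve_upper_t(l, solve_lower(l, self.b))

    def mean(self) -> List[float]:
        return self._posterior()[1]

    def sample_score(self, x: Sequence[float], rng: random.Random) -> float:
        l, theta_hat = self._posterior()
        # If z ~ N(0, I) and L^T y = z, then y ~ N(0, A^{-1}).
        z = [rng.gauss(0.0, 1.0) for _ in theta_hat]
        noise = solve_upper_t(l, z)
        theta = [m + self.alpha * e for m, e in zip(theta_hat, noise)]
        return sum(xi * ti for xi, ti in zip(x, theta))


def choose_arm(arms: Dict[str, LinTSArm], x: Sequence[float], rng: random.Random) -> str:
    # Each arm draws its own coefficients; the largest sampled score wins.
    scores = {name: arm.sample_score(x, rng) for name, arm in arms.items()}
    return max(scores, key=scores.get)
```

Solving through the Cholesky factor instead of forming A⁻¹ explicitly is a choice made here for numerical stability; for real workloads a linear-algebra library would be the natural tool.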

For fun, you can actually combine a neighborhood policy with LinTS, which gets you a hybrid contextual bandit that learns from contexts both parametrically and non-parametrically.
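
A toy version of that hybrid, assuming a single scalar context feature so the linear algebra stays one-dimensional: the radius filter supplies the non-parametric part (which rows count), and a one-feature LinTS model fitted on those rows supplies the parametric part. `hybrid_radius_lints_predict` and its parameters are hypothetical names for this sketch, not library API.

```python
import math
import random
from typing import Sequence


def hybrid_radius_lints_predict(
    arms: Sequence[str],
    decisions: Sequence[str],
    rewards: Sequence[float],
    contexts: Sequence[float],
    query: float,
    radius: float,
    rng: random.Random,
    alpha: float = 1.0,
    l2_lambda: float = 1.0,
) -> str:
    # Hypothetical sketch with ONE scalar context feature and model reward ~ theta * x.
    a = {arm: l2_lambda for arm in arms}
    b = {arm: 0.0 for arm in arms}
    # Non-parametric step: only rows within `radius` of the query update the statistics.
    for arm, reward, x in zip(decisions, rewards, contexts):
        if abs(x - query) <= radius:
            a[arm] += x * x
            b[arm] += reward * x
    # Parametric step: theta ~ N(b / a, alpha^2 / a), then score the query context.
    scores = {arm: rng.gauss(b[arm] / a[arm], alpha / math.sqrt(a[arm])) * query
              for arm in arms}
    return max(scores, key=scores.get)
```

For example, if arm `a` pays off only near x = 1 and arm `b` only near x = 10, a single global through-the-origin linear model per arm would favor `b` at both points, while the radius filter lets each region fit its own coefficient.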

Hope this helps!

clarkzjw commented 1 year ago

@skadio @bkleyn

Thank you for your quick replies. I think that clears up my doubts for now.