Closed: clarkzjw closed this issue 1 year ago.
Hey @clarkzjw, thank you for your interest in the library!
The ThompsonSampling learning policy is a non-contextual policy, which is why no contexts are passed to the fit function you noted.
Notice that in the simulator example (Lines 32-33) the ThompsonSampling learning policy is combined with the Radius neighborhood policy, which makes it a contextual multi-armed bandit. In the other case you mentioned (Line 40), ThompsonSampling is used as a non-contextual learning policy on its own (without a neighborhood policy).
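To make the difference concrete, here is a minimal sketch along the lines of the simulator example (the arms, decisions, rewards, contexts, and the radius value below are made up purely for illustration):

```python
from mabwiser.mab import MAB, LearningPolicy, NeighborhoodPolicy

# Toy training data (illustrative only)
decisions = [1, 1, 2, 2, 3, 3]
rewards = [0, 1, 1, 0, 1, 1]
contexts = [[0.2, 0.5], [0.1, 0.9], [0.8, 0.3],
            [0.7, 0.4], [0.5, 0.5], [0.6, 0.1]]

# Contextual: TS is the learning policy, Radius handles the contexts.
contextual_mab = MAB(arms=[1, 2, 3],
                     learning_policy=LearningPolicy.ThompsonSampling(),
                     neighborhood_policy=NeighborhoodPolicy.Radius(radius=1.5))
contextual_mab.fit(decisions, rewards, contexts)
print(contextual_mab.predict([[0.3, 0.4]]))

# Context-free: TS on its own; fit and predict take no contexts.
context_free_mab = MAB(arms=[1, 2, 3],
                       learning_policy=LearningPolicy.ThompsonSampling())
context_free_mab.fit(decisions, rewards)
print(context_free_mab.predict())
```

In the contextual case the Radius policy selects the training rows whose contexts fall within the radius of the given context, and only then applies TS to that neighborhood, which is why TS itself never needs to see the contexts.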
@clarkzjw it is cool how you went deep into the source code --impressive!
As @bkleyn noted, TS on its own is not contextual. If we add a neighborhood policy, like Radius, then it becomes contextual but non-parametric. If you want a parametric contextual version, you can use LinTS instead of TS, which learns a linear model from the contexts.
For fun, you can actually combine a Neighborhood policy with LinTS, which gets you a hybrid contextual bandit that learns from the context both parametrically and non-parametrically. See the sketch below.
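A rough sketch of the parametric and hybrid variants (again with made-up data; the `alpha` and `radius` values are arbitrary):

```python
from mabwiser.mab import MAB, LearningPolicy, NeighborhoodPolicy

decisions = [1, 1, 2, 2, 3, 3]
rewards = [0, 1, 1, 0, 1, 1]
contexts = [[0.2, 0.5], [0.1, 0.9], [0.8, 0.3],
            [0.7, 0.4], [0.5, 0.5], [0.6, 0.1]]

# Parametric contextual: LinTS learns a linear model of the context per arm.
lints = MAB(arms=[1, 2, 3],
            learning_policy=LearningPolicy.LinTS(alpha=1.0))
lints.fit(decisions, rewards, contexts)
print(lints.predict([[0.3, 0.4]]))

# Hybrid: Radius builds neighborhoods from the context (non-parametric),
# then LinTS is applied within each neighborhood (parametric).
hybrid = MAB(arms=[1, 2, 3],
             learning_policy=LearningPolicy.LinTS(alpha=1.0),
             neighborhood_policy=NeighborhoodPolicy.Radius(radius=1.5))
hybrid.fit(decisions, rewards, contexts)
print(hybrid.predict([[0.3, 0.4]]))
```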
Hope this helps!
@skadio @bkleyn
Thank you for your quick replies. I think that cleared my doubts for now.
Hi,

I noticed that in the `fit` function of `_ThompsonSampling`, `contexts` is never passed to `self._parallel_fit(decisions, rewards)`.

https://github.com/fidelity/mabwiser/blob/master/mabwiser/thompson.py#L38

I'm curious why this is the case. In `simulator.py`, `ThompsonSampling` appears in both `contextual_mabs` and `context_free_mabs`.

https://github.com/fidelity/mabwiser/blob/master/examples/simulator.py#L30-L42

If `_parallel_fit` in `_ThompsonSampling` never receives the context, how does it solve the contextual bandits problem?

Thanks.