Closed douglasmason closed 3 years ago
It is a hyperparameter that should be tuned. The formula from the reference is calculated with information that you don't have for an online policy like this, and which includes probabilistic terms. The formula that you are thinking about is not the same thing LinTS is doing (see LogisticTS).
Can you please elaborate? There are two formulas I mentioned and it’s not clear which you’re referring to. The first comes from residuals (empirical) and is standard statistics requiring the entire training dataset, and the second comes from the Agrawal paper. Are you saying the Agrawal algorithm cannot be implemented in an online fashion?
LinTS is based on the reference from Agrawal, and has a hyperparameter v_sq. It is meant for an online scenario with incremental updates. LogisticTS's formula comes from standard statistics (for a logistic model), and does not support incremental updates.
For Linear Thompson Sampling predictions, there's a value v_sq that multiplies the covariance matrix of the model parameters when sampling from the multivariate normal distribution it defines. From my research, it looks like this value is supposed to be the sum of squared residuals of the output value divided by the number of training rows seen so far, a quantity that comes up in many standard-error-of-coefficient calculations. So how does someone determine this value without storing all the data and re-computing the predictions?
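To make the role of v_sq concrete, here is a minimal sketch of the textbook linear Thompson sampling update and draw, written with NumPy. It is not taken from the library's source: the function names `lints_update` and `lints_sample`, the identity-matrix prior, and the simulated data are all assumptions for illustration. It only shows that the running statistics can be updated one row at a time, and that v_sq scales the covariance used for sampling.

```python
import numpy as np

def lints_update(A, b, x, y):
    """Incrementally fold one observation (x, y) into the running statistics."""
    A = A + np.outer(x, x)   # running X^T X plus the prior term
    b = b + y * x            # running X^T y
    return A, b

def lints_sample(A, b, v_sq, rng):
    """Draw one coefficient vector from N(A^-1 b, v_sq * A^-1)."""
    A_inv = np.linalg.inv(A)
    A_inv = (A_inv + A_inv.T) / 2.0   # symmetrize against round-off
    mean = A_inv @ b
    return rng.multivariate_normal(mean, v_sq * A_inv)

# Hypothetical simulated data, only to exercise the two functions.
rng = np.random.default_rng(0)
d = 3
true_theta = np.array([1.0, -2.0, 0.5])
A, b = np.eye(d), np.zeros(d)         # assumed prior: identity matrix, zero vector
for _ in range(500):
    x = rng.normal(size=d)
    y = x @ true_theta + rng.normal(scale=0.1)
    A, b = lints_update(A, b, x, y)   # no past rows are stored

theta_hat = np.linalg.solve(A, b)     # posterior mean (ridge estimate)
sample = lints_sample(A, b, v_sq=0.25, rng=rng)
```

A larger v_sq widens the spread of the sampled coefficients around `theta_hat` (more exploration), while a v_sq close to zero makes each draw collapse onto the point estimate.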
Meanwhile, the Agrawal paper sets v to R * sqrt(9 * d * ln(T / delta)), where d is the number of dimensions or features in the linear regression, R is... something... it looks like an absolute bound on the residuals, T is the time horizon for regret calculations so it can be replaced by the current number of training rows, and delta is a free parameter between 0 and 1 which determines the regret behavior (higher delta means less variance in predictions and a higher likelihood of exceeding the guaranteed regret bounds).
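For reference, the formula as quoted above can be evaluated directly. This is only a sketch: the helper name `agrawal_v` is hypothetical, R is treated as a user-supplied noise scale (an assumption, since its meaning is unclear in this thread), and the result is squared on the assumption that v_sq is meant to be v squared.

```python
import math

def agrawal_v(R, d, T, delta):
    """Evaluate v = R * sqrt(9 * d * ln(T / delta)) as quoted in this thread."""
    if not 0 < delta < 1:
        raise ValueError("delta must be strictly between 0 and 1")
    if T < 1 or d < 1:
        raise ValueError("T and d must be at least 1")
    return R * math.sqrt(9 * d * math.log(T / delta))

# Hypothetical settings: 10 features, 1000 rows seen so far, R assumed to be 1.
v = agrawal_v(R=1.0, d=10, T=1000, delta=0.1)
v_sq = v ** 2   # assumption: v_sq is v squared

# A larger delta gives a smaller v, i.e. less variance in the sampled coefficients.
v_loose = agrawal_v(R=1.0, d=10, T=1000, delta=0.5)
```

Because this v comes from a worst-case regret bound, the value it gives can be much larger than an empirical residual variance, which fits the advice earlier in the thread to treat v_sq as a hyperparameter to tune.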
So... uh... this is really confusing, and how do you set v_sq then? There's not much documentation in the code.