martincousi opened this issue 6 years ago
Given the differences between Surprise and scikit-learn, I think the analogous way of passing the weights into the `fit` method would be for us to include them in the `trainset` object.
But as far as I understand, the weights are mostly used for computing evaluation metrics (which in turn can be used as an optimization criterion for training a model)?
So, sample weights could be passed by the user to the `trainset` object or computed in the `trainset` object according to one of many options. This option might need to be an option of `cross_validate`?
Then, the algorithm would use these sample weights (if it can use them) in the `fit` method. Should a boolean option `use_sample_weight` be defined in the algorithm's `__init__` method?
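To make the proposal concrete, here is a minimal sketch of what that could look like. Everything here is hypothetical — `Trainset` below is a toy stand-in, and `sample_weights` / `use_sample_weight` do not exist in Surprise's actual API:

```python
# Hypothetical sketch of the proposal above: weights live on the trainset
# (defaulting to 1.0 per observation) and fit() only consumes them when
# the algorithm was constructed with use_sample_weight=True.

class Trainset:
    """Toy stand-in for surprise.Trainset: ratings plus optional weights."""

    def __init__(self, ratings, sample_weights=None):
        self.ratings = ratings  # list of (user, item, rating) tuples
        # default: weight 1 for every observation
        self.sample_weights = sample_weights or [1.0] * len(ratings)


class WeightedMeanAlgo:
    """Toy algorithm predicting the (weighted) global mean rating."""

    def __init__(self, use_sample_weight=False):
        self.use_sample_weight = use_sample_weight

    def fit(self, trainset):
        if self.use_sample_weight:
            weights = trainset.sample_weights
        else:
            weights = [1.0] * len(trainset.ratings)
        total = sum(w * r for w, (_, _, r) in zip(weights, trainset.ratings))
        self.mean_ = total / sum(weights)
        return self

    def estimate(self, u, i):
        return self.mean_


ratings = [("u1", "i1", 1.0), ("u2", "i2", 5.0)]
trainset = Trainset(ratings, sample_weights=[3.0, 1.0])
plain = WeightedMeanAlgo().fit(trainset)                           # mean_ == 3.0
weighted = WeightedMeanAlgo(use_sample_weight=True).fit(trainset)  # mean_ == 2.0
```

The point of the boolean flag is that an algorithm which cannot use weights simply never reads `trainset.sample_weights`, so existing algorithms keep their current behavior.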
I'm not completely clear on what the weights are for, TBH.
Is it just for computing error/accuracy metrics? In that case the changes could be minimal and restricted to the `accuracy` module.
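If the change really were confined to the `accuracy` module, a weighted metric would just divide a weighted sum of squared errors by the total weight. A minimal sketch, using plain `(true_r, est)` pairs as a simplified stand-in for Surprise's `Prediction` tuples:

```python
from math import sqrt


def weighted_rmse(predictions, weights=None):
    """Weighted RMSE sketch.

    `predictions` is a list of (true_r, est) pairs -- a simplified
    stand-in for Surprise's Prediction namedtuples, just to show where
    per-observation weights would enter an accuracy metric.
    """
    if weights is None:
        weights = [1.0] * len(predictions)  # unweighted: plain RMSE
    sq = sum(w * (true_r - est) ** 2
             for w, (true_r, est) in zip(weights, predictions))
    return sqrt(sq / sum(weights))
```

With all weights equal this reduces to the usual RMSE; a zero weight drops an observation from the metric entirely.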
The algorithms that are currently implemented do not support sample weights (to my knowledge). So unless there are some new algorithms you want to implement that do require such a parameter, I don't really see the value of adding it.
> This option might need to be an option of `cross_validate`?
Yes, and that might be tricky. As far as I understand it, scikit-learn does not even have a way to deal with cross-validation and `sample_weight`.
How hard would it be to modify the current algorithms to include a `sample_weight` option to the `fit` method as in sklearn (e.g., `LinearRegression.fit`)? Is it just a matter of changing the update rule, and can all algorithms handle such a parameter?

By default, the sample weight would be 1 for all `trainset` observations, and then different ways to compute these weights would be available (e.g., propensity score). For an example of why these weights are useful, see [1]. These weights can be computed only with the implicit matrix or with the available user/item features.

[1] T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, and T. Joachims, "Recommendations as Treatments: Debiasing Learning and Evaluation," in Proceedings of the 33rd International Conference on Machine Learning, 2016.
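For SGD-based algorithms, "changing the update rule" would mostly mean scaling the error term of each step by the observation's weight. A sketch of a weighted SVD-style update (all names here are illustrative, not Surprise's internals; `w = 1` everywhere recovers the usual unweighted update):

```python
import numpy as np

# Weighted SVD-style SGD: the gradient of the data term is scaled by a
# per-observation weight w_ui (e.g. an inverse-propensity score, as in
# the Schnabel et al. debiasing paper); regularization is untouched.

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 3, 3, 2
P = rng.normal(scale=0.1, size=(n_users, n_factors))  # user factors
Q = rng.normal(scale=0.1, size=(n_items, n_factors))  # item factors
lr, reg = 0.02, 0.02

# (u, i, r_ui, w_ui) tuples; weights chosen arbitrarily for illustration
ratings = [(0, 0, 5.0, 1.0), (1, 1, 3.0, 2.0), (2, 2, 1.0, 0.5)]


def weighted_sse():
    """Weighted sum of squared errors over the training ratings."""
    return sum(w * (r - P[u] @ Q[i]) ** 2 for u, i, r, w in ratings)


sse_before = weighted_sse()
for _ in range(200):
    for u, i, r, w in ratings:
        pu, qi = P[u].copy(), Q[i].copy()  # update from the old values
        err = r - pu @ qi
        P[u] += lr * (w * err * qi - reg * pu)
        Q[i] += lr * (w * err * pu - reg * qi)
sse_after = weighted_sse()
```

Whether every algorithm can accept the parameter is less clear: the SGD updates generalize naturally, but algorithms without an explicit per-rating loss (e.g. neighborhood methods) would need a separate design decision.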