I printed the prediction scores. It seems all the scores are 0, which causes the implausible metrics.
To speed up full-sort prediction, we put the ground-truth items at the beginning of the list to be sorted, and the sort is stable. So if all candidate items have the same score, the ground-truth items stay at the top and the model appears to achieve high performance.
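A minimal sketch of the effect (illustrative only, not our actual evaluation code; the item counts here are made up):

import numpy as np

# 5 candidate items; suppose the first 2 are the ground-truth positives,
# re-ordered to the front of the candidate list to speed up evaluation.
scores = np.zeros(5)                         # degenerate model: every item scores 0
order = np.argsort(-scores, kind="stable")   # stable sort keeps the original order on ties
print(order[:2])                             # -> [0 1]: both positives land in the top-2, so recall@2 = 1.0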
I must admit that the current sorting evaluation method is not very good.
@ShanleiMu Ah! I see. That is tricky.
The output of all 0s is, I believe, caused by the hyperparameter controlling the L1 coefficient in the regression being too large. I was hoping to use HyperOpt to determine what reasonable values for that coefficient are (it's probably problem-specific), but with the evaluation working like this, I don't see how I could make that determination!
Is there a way around this?
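For context, here is a toy scikit-learn sketch (not the SLIMElastic code itself; the data and alpha values are made up) showing how a too-large L1 penalty drives every ElasticNet coefficient to zero, which is exactly the all-zero-scores failure mode:

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = X @ np.array([0.5, 0.2, 0.0, 0.1, 0.3]) + 0.01 * rng.standard_normal(100)

strong = ElasticNet(alpha=10.0, l1_ratio=0.5).fit(X, y)   # heavy regularization
weak = ElasticNet(alpha=0.001, l1_ratio=0.5).fit(X, y)    # light regularization
print(strong.coef_)  # all zeros: every prediction collapses to the intercept
print(weak.coef_)    # close to the true weights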
Thanks for Shan Lei's quick reply. I'd like to add some supplementary explanations. In fact, none of the top-k metrics we implement can handle items that have the same score (GAUC is an exception, because it uses the average rank to resolve ties). Frankly speaking, for deep learning recommendation algorithms, items sharing the same score is, in my understanding, a very low-probability event. Most open-source recommendation codebases, such as recommenders and NeuRec, do not deal with this situation. But for non-deep-learning algorithms, it may be common for some items to have the same score, especially in the early stages of training.
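For illustration, the average-rank idea can be sketched with scipy.stats.rankdata (just an example, not our GAUC implementation):

from scipy.stats import rankdata

scores = [0.5, 0.9, 0.5, 0.5]                 # three items tied at 0.5
print(rankdata(scores, method="average"))     # -> [2. 4. 2. 2.]: tied items share the average rank
# With average ranks a tied positive item is neither favored nor penalized,
# which is why GAUC is less sensitive to ties than the position-based top-k metrics.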
The sort trick we use causes positive items to always appear at the top of the items that share the same score, because we have a re-organizing stage to speed up the evaluation. For more information about this trick, see the evaluation docs.
Maybe we should preserve the randomness of positive items' positions when scores are tied. We will discuss this problem and try to come up with a feasible solution to alleviate it!
Hi! @deklanw. I think randomly generating small numbers and adding them to the scores may solve this problem, because it avoids items sharing the same score. If the model has not learned any information, the result will then be very poor; if the model has learned something meaningful, the small random perturbation will not noticeably affect the result. That should help you determine a reasonable range for the L1 parameter.
@tsotfsk Thanks, that all makes sense.
The random-noise idea worked well. But it can't be just a temporary solution, because I believe the appropriate L1 hyperparameter depends on the problem. I've chosen hyperparameter defaults which work well on ml-100k, but that's as far as I can tell.
def add_noise(t, mag=1e-5):
    # Tiny uniform perturbation so tied scores become distinct.
    return t + mag * torch.rand(t.shape)

...

def predict(self, interaction):
    user = interaction[self.USER_ID].cpu().numpy()
    item = interaction[self.ITEM_ID].cpu().numpy()
    # Score each (user, item) pair from the sparse interaction and item-similarity matrices.
    r = torch.from_numpy((self.interaction_matrix[user, :].multiply(
        self.item_similarity[:, item].T)).sum(axis=1).getA1())
    return add_noise(r)

def full_sort_predict(self, interaction):
    user = interaction[self.USER_ID].cpu().numpy()
    # Score all items for the given users in one sparse product.
    r = self.interaction_matrix[user, :] @ self.item_similarity
    r = torch.from_numpy(r.todense().getA1())
    return add_noise(r)
I'm fine with leaving this in permanently.
Seem fine?
Hi! @deklanw. :blush: Your code looks fine. I tested the HyperOpt module after adding noise and it also works well: the implausible metrics disappear, and the effect of the noise on the results is very small and can be ignored.
Thanks for the help
Trying out my implementation of SLIM with ElasticNet (https://github.com/RUCAIBox/RecBole/pull/621), I'm noticing some implausible numbers. The dataset is ml-100k with all defaults, using the default hyperparameters of my method defined in its yaml file (not yet well-chosen because these results are so off): https://github.com/RUCAIBox/RecBole/blob/41a06e59ab26482dbfac641caac99876c167168c/recbole/properties/model/SLIMElastic.yaml
Using this standard copy-pasted code:
Results:
INFO test result: {'recall@10': 0.8461, 'mrr@10': 0.5374, 'ndcg@10': 0.7102, 'hit@10': 1.0, 'precision@10': 0.6309}
Also, my HyperOpt log is highly suspicious
Exact same results with different parameters?
I figure that if there were a mistake in my implementation, it would cause bad performance, not amazing performance.
Anyone know what could be causing this?