Open RokoMijic opened 4 years ago
I have been using this code snippet myself; it just calls ndcg_score from sklearn.metrics:
def get_ndcg(surprise_predictions, k_highest_scores=None):
    """
    Calculates the NDCG (normalized discounted cumulative gain) from Surprise
    predictions, using sklearn.metrics.ndcg_score and scipy.sparse.

    Parameters:
        surprise_predictions (list of surprise.prediction_algorithms.predictions.Prediction): list of predictions
        k_highest_scores (positive integer): only consider the highest k scores in the ranking. If None, use all.

    Returns:
        float in [0., 1.]: the averaged NDCG score over all recommendations
    """
    from sklearn.metrics import ndcg_score
    from scipy import sparse

    uids = [p.uid for p in surprise_predictions]
    iids = [p.iid for p in surprise_predictions]
    r_uis = [p.r_ui for p in surprise_predictions]
    ests = [p.est for p in surprise_predictions]

    sparse_preds = sparse.coo_matrix(ests, (uids, iids))
    sparse_vals = sparse.coo_matrix(r_uis, (uids, iids))

    dense_preds = sparse_preds.toarray()
    dense_vals = sparse_vals.toarray()

    return ndcg_score(y_true=dense_vals, y_score=dense_preds, k=k_highest_scores)
sparse_matrix.toarray() consumes a lot of memory for large datasets. o(╥﹏╥)o
Yes, this will blow up for large datasets. There's probably a better way to do this. The best thing to do may be to fix the sklearn ndcg_score so that it also works with sparse matrices.
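In the meantime, one way to sidestep the dense matrices is to average NDCG per user instead of scoring one giant users x items matrix. A minimal sketch, assuming simple per-user averaging is acceptable; the helper name get_ndcg_per_user is made up here, and note it only scores the items each user actually has predictions for, whereas the dense version implicitly fills missing entries with zeros:

from collections import defaultdict

import numpy as np
from sklearn.metrics import ndcg_score


def get_ndcg_per_user(surprise_predictions, k_highest_scores=None):
    """Average per-user NDCG without materialising a users x items matrix."""
    by_user = defaultdict(list)
    for p in surprise_predictions:
        by_user[p.uid].append((p.r_ui, p.est))

    scores = []
    for ratings in by_user.values():
        if len(ratings) < 2:
            continue  # ndcg_score needs at least two ranked items per sample
        y_true = np.asarray([[r for r, _ in ratings]])
        y_score = np.asarray([[e for _, e in ratings]])
        scores.append(ndcg_score(y_true, y_score, k=k_highest_scores))
    return float(np.mean(scores)) if scores else float('nan')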
Whenever I try to run this method on Surprise predictions, I get this error. Can you help?
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-36-d1fb93872a82> in <module>
----> 1 ndcgScore = get_ndcg(predictions)
<ipython-input-32-e273c54062ce> in get_ndcg(surprise_predictions, k_highest_scores)
20 ests = [p.est for p in surprise_predictions ]
21
---> 22 sparse_preds = sparse.coo_matrix(ests, (uids ,iids ))
23 sparse_vals = sparse.coo_matrix(r_uis, (uids ,iids ))
24
~\anaconda3\envs\RecSys\lib\site-packages\scipy\sparse\coo.py in __init__(self, arg1, shape, dtype, copy)
183 self._shape = check_shape(M.shape)
184 if shape is not None:
--> 185 if check_shape(shape) != self._shape:
186 raise ValueError('inconsistent shapes: %s != %s' %
187 (shape, self._shape))
~\anaconda3\envs\RecSys\lib\site-packages\scipy\sparse\sputils.py in check_shape(args, current_shape)
288 new_shape = tuple(operator.index(arg) for arg in shape_iter)
289 else:
--> 290 new_shape = tuple(operator.index(arg) for arg in args)
291
292 if current_shape is None:
~\anaconda3\envs\RecSys\lib\site-packages\scipy\sparse\sputils.py in <genexpr>(.0)
288 new_shape = tuple(operator.index(arg) for arg in shape_iter)
289 else:
--> 290 new_shape = tuple(operator.index(arg) for arg in args)
291
292 if current_shape is None:
TypeError: 'list' object cannot be interpreted as an integer
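The cause is that coo_matrix is being called as coo_matrix(ests, (uids, iids)): the second positional argument of scipy.sparse.coo_matrix is the shape, so the (uids, iids) tuple of lists ends up in check_shape, which tries to interpret each list as an integer. A minimal reproduction, with illustrative values only:

from scipy import sparse

# second positional argument is the shape -> TypeError, as in the traceback above
sparse.coo_matrix([1.0, 2.0], ([0, 1], [0, 1]))

# the (data, (row, col)) triplet must be passed as a single tuple
sparse.coo_matrix(([1.0, 2.0], ([0, 1], [0, 1])))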
Update: after updating the method like this:
def get_ndcg(surprise_predictions, k_highest_scores=None):
    """
    Calculates the NDCG (normalized discounted cumulative gain) from Surprise
    predictions, using sklearn.metrics.ndcg_score and scipy.sparse.

    Parameters:
        surprise_predictions (list of surprise.prediction_algorithms.predictions.Prediction): list of predictions
        k_highest_scores (positive integer): only consider the highest k scores in the ranking. If None, use all.

    Returns:
        float in [0., 1.]: the averaged NDCG score over all recommendations
    """
    from sklearn.metrics import ndcg_score
    from scipy import sparse

    uids = [int(p.uid) for p in surprise_predictions]
    iids = [int(p.iid) for p in surprise_predictions]
    r_uis = [p.r_ui for p in surprise_predictions]
    ests = [p.est for p in surprise_predictions]

    assert len(uids) == len(iids) == len(r_uis) == len(ests)

    sparse_preds = sparse.coo_matrix((ests, (uids, iids)))
    sparse_vals = sparse.coo_matrix((r_uis, (uids, iids)))

    dense_preds = sparse_preds.toarray()
    dense_vals = sparse_vals.toarray()

    return ndcg_score(y_true=dense_vals, y_score=dense_preds, k=k_highest_scores)
It worked correctly. Thank you for this method, it helped me a lot 👍
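For anyone else landing here, a usage sketch: the SVD model and the ml-100k dataset are illustrative choices only (any Surprise algorithm whose test() returns a list of Prediction objects will do, and ml-100k happens to use numeric string ids, so the int() casts above work):

from surprise import SVD, Dataset
from surprise.model_selection import train_test_split

# downloads ml-100k on first use; its uids/iids are numeric strings
data = Dataset.load_builtin('ml-100k')
trainset, testset = train_test_split(data, test_size=0.25)

algo = SVD()
algo.fit(trainset)
predictions = algo.test(testset)

print(get_ndcg(predictions, k_highest_scores=10))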
Description
Discounted Cumulative Gain is a vitally important metric for applications where we care more about items at the top of the ranking than at the bottom, which in practice is whenever you are recommending items to users (as opposed to finding items that users will hate, or items they will be ambivalent about). The vast majority of practical applications only care about finding the best suggestions.
https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG
All the metrics that Surprise offers suffer from the problem of not discounting accuracy for low-ranked items.
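For concreteness, a hand-rolled toy illustration of the discounting (not a Surprise or sklearn API): two orderings of the same three relevance scores get very different DCG values, while any metric that weights all positions equally cannot tell them apart.

import numpy as np

def dcg(relevances):
    # DCG = sum over 1-based positions i of rel_i / log2(i + 1)
    positions = np.arange(1, len(relevances) + 1)
    return float(np.sum(np.asarray(relevances) / np.log2(positions + 1)))

print(dcg([3, 2, 0]))  # best item ranked first -> ~4.26
print(dcg([0, 2, 3]))  # best item ranked last  -> ~2.76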
My understanding is that you are not accepting pull requests for new features, but this is such a big weakness that I thought I'd mention it. If anyone has any ideas about how to deal with this then please say something here.
Steps/Code to Reproduce
N/A
Expected Results
N/A
Actual Results
N/A
Versions
all