benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License

Popularity evaluation function #568

Open ajster1 opened 2 years ago

ajster1 commented 2 years ago

It would be nice to have a popularity evaluation function to use as a baseline. I created one by modifying the ranking_metrics_at_k function to plug in the K most popular items. I'm not sure if this is accurate or even valid. Can somebody comment? It would really be amazing to have something like this in the library.

import numpy as np
from scipy.sparse import csr_matrix
from tqdm.auto import tqdm


def popularity_ranking_metrics_at_k(urls_popularity, test_user_items, K=10, show_progress=True, num_threads=1):
    """ Calculates ranking metrics for a popularity baseline that recommends
    the K most popular items to every user
    Parameters
    ----------
    urls_popularity : list
        (item id, popularity) tuples sorted from most to least popular;
        only the item ids are used here
    test_user_items : csr_matrix
        Sparse matrix of user by item that contains withheld elements to
        test on
    K : int
        Number of items to test on
    show_progress : bool, optional
        Whether to show a progress bar
    num_threads : int, optional
        Unused in this modified version; kept only for signature
        compatibility with ranking_metrics_at_k

    Returns
    -------
    dict
        calculated metrics (precision, map, ndcg, auc)
    """

    if not isinstance(test_user_items, csr_matrix):
        test_user_items = test_user_items.tocsr()

    users = test_user_items.shape[0]
    items = test_user_items.shape[1]

    # precision
    relevant = 0
    pr_div = 0
    total = 0
    # map
    mean_ap = 0
    ap = 0
    # ndcg
    cg = (1.0 / np.log2(np.arange(2, K + 2)))
    cg_sum = np.cumsum(cg)
    ndcg = 0
    # auc
    mean_auc = 0

    test_indptr = test_user_items.indptr
    test_indices = test_user_items.indices

    likes = set()
    batch_size = 1000
    start_idx = 0

    # get an array of userids that have at least one item in the test set
    to_generate = np.arange(users, dtype="int32")
    to_generate = to_generate[np.ediff1d(test_user_items.indptr) > 0]

    progress = tqdm(total=len(to_generate), disable=not show_progress)

    while start_idx < len(to_generate):
        batch = to_generate[start_idx: start_idx + batch_size]

        # Popularity modification start: every user in the batch gets the
        # same recommendations - the ids of the K most popular items
        pop = list(zip(*urls_popularity[:K]))[0]
        ids = np.array([pop] * len(batch))
        # Popularity modification end

        start_idx += batch_size

        for batch_idx in range(len(batch)):
            u = batch[batch_idx]
            likes.clear()
            for i in range(test_indptr[u], test_indptr[u+1]):
                likes.add(test_indices[i])

            pr_div += np.fmin(K, len(likes))
            ap = 0
            hit = 0
            miss = 0
            auc = 0
            idcg = cg_sum[min(K, len(likes)) - 1]
            num_pos_items = len(likes)
            num_neg_items = items - num_pos_items

            for i in range(K):
                if ids[batch_idx, i] in likes:
                    relevant += 1
                    hit += 1
                    ap += hit / (i + 1)
                    ndcg += cg[i] / idcg
                else:
                    miss += 1
                    auc += hit
            auc += ((hit + num_pos_items) / 2.0) * (num_neg_items - miss)
            mean_ap += ap / np.fmin(K, len(likes))
            mean_auc += auc / (num_pos_items * num_neg_items)
            total += 1

        progress.update(len(batch))

    progress.close()
    return {
        "precision": relevant / pr_div,
        "map": mean_ap / total,
        "ndcg": ndcg / total,
        "auc": mean_auc / total
    }

# get popularity metrics
popularity_ranking_metrics_at_k(url_popularity, users_urls_test, K=5, num_threads=8)
chanansh commented 2 years ago

Yeah this is needed. I created a fake model which returns the top k items, so you don't need to change the evaluation function.
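
A minimal sketch of what such a fake model could look like (this is not the code from this thread; the name PopularityRecommender is made up, and it assumes the batched model.recommend(userids, user_items, N=...) call that recent versions of ranking_metrics_at_k make - older versions call recommend once per user, so adjust to your version):

import numpy as np

class PopularityRecommender:
    """Fake 'model' that always recommends the globally most popular items."""

    def fit(self, user_items):
        # user_items: user x item csr_matrix; popularity here is the number
        # of users that interacted with each item
        item_counts = user_items.getnnz(axis=0)
        self.popular_items = np.argsort(-item_counts)

    def recommend(self, userid, user_items, N=10, filter_already_liked_items=True,
                  filter_items=None, recalculate_user=False, items=None):
        # note: this sketch ignores filtering of already-liked items
        top = self.popular_items[:N]
        scores = np.arange(N, 0, -1, dtype="float32")  # higher = more popular
        if np.isscalar(userid):
            return top, scores
        # batched call: repeat the same top-N row for every user in the batch
        return np.tile(top, (len(userid), 1)), np.tile(scores, (len(userid), 1))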

Surprisingly @benfred, the popularity baseline beats every collaborative filtering model I have tried here.

ita9naiwa commented 2 years ago

I think it's a good idea.

I think it could be added as a new kind of recommender, e.g. a PopularityBasedRecommender, so that it is used like the other recommender instances here and evaluated with the existing ranking_metrics_at_k (which prevents reinventing the wheel).
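
To make that concrete, here is a hedged sketch of how a popularity baseline like the PopularityRecommender sketched above could be scored with the library's existing ranking_metrics_at_k next to an ALS model (the interaction data is random and only there to make the example self-contained; exact signatures may vary by implicit version):

from scipy.sparse import random as sparse_random
from implicit.als import AlternatingLeastSquares
from implicit.evaluation import ranking_metrics_at_k, train_test_split

# random interactions just for illustration
interactions = sparse_random(1000, 500, density=0.02, format="csr", random_state=42)
train_user_items, test_user_items = train_test_split(interactions, train_percentage=0.8)

als = AlternatingLeastSquares(factors=32)
als.fit(train_user_items)

baseline = PopularityRecommender()
baseline.fit(train_user_items)

print("als:       ", ranking_metrics_at_k(als, train_user_items, test_user_items, K=10))
print("popularity:", ranking_metrics_at_k(baseline, train_user_items, test_user_items, K=10))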

ita9naiwa commented 2 years ago

For now, you can do something similar with your own popularity recommender and wrapper, as in https://github.com/benfred/implicit/issues/158#issuecomment-434517410