benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License
3.57k stars, 612 forks

Neighbour methods return fewer recommendations than requested #557

Open chanansh opened 2 years ago

chanansh commented 2 years ago

See how _batch_call in utils.py can get back fewer items than requested and then pads the rest. Why does this happen, and how can I force the item-item methods to return all N recommendations?

        # pad out to N items if we're returned fewer
        missing_items = N - len(batch_ids)
        if missing_items > 0:
            batch_ids = np.append(batch_ids, np.full(missing_items, -1))
            batch_scores = np.append(
                batch_scores, np.full(missing_items, -np.finfo(np.float32).max)
            )
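
A standalone sketch of the padding above, assuming batch_ids and batch_scores are 1-D NumPy arrays as in the snippet. pad_to_n and strip_padding are hypothetical helper names, not part of implicit. The sketch shows that padded slots carry the id -1 and the most negative finite float32 score, so a caller can drop them with an ids >= 0 mask.

```python
import numpy as np


def pad_to_n(batch_ids, batch_scores, N):
    """Pad ids/scores out to N entries, mirroring the snippet quoted above.

    Hypothetical helper: reproduces the padding logic outside of implicit.
    """
    missing_items = N - len(batch_ids)
    if missing_items > 0:
        # Sentinel id -1 marks "no item"; its score is the most negative finite float32.
        batch_ids = np.append(batch_ids, np.full(missing_items, -1))
        batch_scores = np.append(
            batch_scores, np.full(missing_items, -np.finfo(np.float32).max)
        )
    return batch_ids, batch_scores


def strip_padding(ids, scores):
    """Drop padded entries so only real recommendations remain."""
    mask = ids >= 0
    return ids[mask], scores[mask]


ids, scores = pad_to_n(np.array([7, 3]), np.array([0.9, 0.5], dtype=np.float32), N=5)
print(ids)  # [ 7  3 -1 -1 -1]
real_ids, real_scores = strip_padding(ids, scores)
print(real_ids)  # [7 3]
```

Filtering on the id rather than the score avoids relying on the exact sentinel score value.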

benfred commented 2 years ago

For the item-item neighbour models, we compute the top K most similar items for each item in the dataset. Because of sparsity, there may be fewer than K neighbours available. For instance, if an item has only 2 users, and each of those users has liked only 2 other items, then that item has at most 4 neighbours. In that case we pad the results out to the requested number of items, and there isn't much else we can do.
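
To make the sparsity argument concrete, here is a small sketch that counts how many candidate neighbours an item can have. It assumes neighbours are the other items sharing at least one user with it (item-item co-occurrence) and excludes the item itself; that is a simplification, not necessarily how implicit builds its similarity matrix. The toy matrix and the count_neighbours helper are hypothetical, chosen to match the example: item 0 has 2 users, each of whom liked 2 other items.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are items (1 = interaction).
user_items = np.zeros((3, 7), dtype=np.int32)
user_items[0, [0, 1, 2]] = 1  # user 0 liked item 0 plus 2 other items
user_items[1, [0, 3, 4]] = 1  # user 1 liked item 0 plus 2 other items
user_items[2, [5, 6]] = 1     # user 2 never interacted with item 0


def count_neighbours(user_items, item):
    """Number of other items that share at least one user with `item`."""
    # Item-item co-occurrence: entry (i, j) counts users who liked both i and j.
    cooccurrence = user_items.T @ user_items
    row = cooccurrence[item].copy()
    row[item] = 0  # an item is not counted as its own neighbour
    return int(np.count_nonzero(row))


K = 10
available = count_neighbours(user_items, 0)
print(available)      # 4
print(K - available)  # 6 slots would have to be padded
```

No similarity measure can rank items that never co-occur with item 0, which is why the remaining slots end up padded.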

You can pre-process your dataset to filter out items/users without many interactions, which should reduce how often this happens.
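
A minimal sketch of that pre-processing, assuming interactions are stored as parallel arrays of user and item ids before the sparse matrix is built. filter_min_interactions and its default thresholds are hypothetical; pick thresholds that suit your data. Because dropping items can push users below the threshold (and vice versa), the filter repeats until nothing changes.

```python
import numpy as np


def filter_min_interactions(user_ids, item_ids, min_user=2, min_item=2):
    """Repeatedly drop interactions from users/items with too few interactions.

    Hypothetical helper, not part of implicit.
    """
    user_ids = np.asarray(user_ids)
    item_ids = np.asarray(item_ids)
    while True:
        # Count interactions per user and per item among the rows still kept.
        _, u_inv, u_counts = np.unique(user_ids, return_inverse=True, return_counts=True)
        _, i_inv, i_counts = np.unique(item_ids, return_inverse=True, return_counts=True)
        keep = (u_counts[u_inv] >= min_user) & (i_counts[i_inv] >= min_item)
        if keep.all():
            return user_ids, item_ids
        user_ids, item_ids = user_ids[keep], item_ids[keep]


# (user, item) pairs; user 3 only survives the first pass, then drops out.
users = [0, 0, 0, 1, 1, 2, 3, 3]
items = [10, 11, 12, 10, 11, 10, 11, 13]
u, i = filter_min_interactions(users, items)
print(u.tolist(), i.tolist())  # [0, 0, 1, 1] [10, 11, 10, 11]
```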

chanansh commented 2 years ago

It would be better to pad with random leftover items. Otherwise, for these items, even a random guess could do better than -1, which will always count as zero hits.
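
A sketch of the random padding suggested above, applied to a recommendation list after it comes back. pad_with_random and its parameters (n_items, exclude, rng) are hypothetical, not part of implicit's API. It fills the missing slots with items drawn uniformly at random from those not already recommended or excluded (for example, items the user has already liked). It scores them at the lowest finite float32 so they still rank after real neighbours.

```python
import numpy as np


def pad_with_random(ids, scores, N, n_items, exclude=(), rng=None):
    """Pad a short recommendation list to N entries with random unseen items.

    Hypothetical alternative to padding with -1. If fewer candidate items
    exist than missing slots, the result stays shorter than N.
    """
    rng = np.random.default_rng() if rng is None else rng
    ids = np.asarray(ids)
    scores = np.asarray(scores, dtype=np.float32)
    missing = N - len(ids)
    if missing <= 0:
        return ids[:N], scores[:N]
    # Candidate pool: every item id not already recommended or excluded.
    taken = np.union1d(ids, np.asarray(exclude, dtype=ids.dtype))
    pool = np.setdiff1d(np.arange(n_items), taken)
    fill = rng.choice(pool, size=min(missing, len(pool)), replace=False)
    # Lowest finite float32 score keeps random fills below every real neighbour.
    fill_scores = np.full(len(fill), -np.finfo(np.float32).max, dtype=np.float32)
    return np.concatenate([ids, fill]), np.concatenate([scores, fill_scores])


rng = np.random.default_rng(0)
ids, scores = pad_with_random([7, 3], [0.9, 0.5], N=5, n_items=10, exclude=[0], rng=rng)
print(ids[:2])  # [7 3]; the last three ids are random unseen items
```

Sampling from a popularity-ranked pool instead of uniformly would give a popularity fallback; either way the padded items remain a fallback rather than true neighbours.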

(Sent by email in reply to Ben Frederickson's comment of Tue, 12 Apr 2022.)