benfred / implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets
https://benfred.github.io/implicit/
MIT License

Feature Request: options when filter_already_liked_items is maxed out #476

Closed samtriolo closed 2 years ago

samtriolo commented 3 years ago

I have a recommender deployed for website content. Because the website is public, it sees quite a few non-human visitors, ie "bots". These bots have various goals, some legitimate, some nefarious. Regardless, because they visit large amounts of content on the website, they run the risk of eventually visiting all content, which triggers this exception in MatrixFactorizationBase.recommend_all().

In my case, this exception will only be triggered by bots (no human will visit eg 15,000+ unique pieces of content on our site), and fixes involve developer time and / or increased runtime (eg running a query to calculate, or keeping a running tally of, the count of unique pieces of content visited by all users, over all time).

I could avoid the above if implicit were to provide alternative handling options. For example, implicit could provide the option to ignore these users entirely (eg insufficient_unliked_items='ignore'), to ignore them with a warning (eg insufficient_unliked_items='warn') and otherwise continue execution, or to raise a unique exception class I can use to set up my own custom handling, etc.
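For reference, the "running tally" workaround described above could look roughly like the sketch below. It is plain Python and does not call implicit at all; the function name and the (user_id, item_id) input format are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def users_with_too_few_unseen_items(interactions, n_items, n_recs):
    """Return the users who have fewer than n_recs unseen items left.

    interactions: iterable of (user_id, item_id) pairs (hypothetical format)
    n_items: total number of distinct items in the catalog
    n_recs: number of recommendations requested per user
    """
    seen = defaultdict(set)
    for user_id, item_id in interactions:
        # A set ignores repeat visits, so only unique items are counted.
        seen[user_id].add(item_id)
    return {
        user_id
        for user_id, items in seen.items()
        if n_items - len(items) < n_recs
    }


# Toy data: the "bot" has visited every item, the "human" only one.
interactions = [("bot", 0), ("bot", 1), ("bot", 2), ("bot", 2), ("human", 1)]
exhausted = users_with_too_few_unseen_items(interactions, n_items=3, n_recs=2)
```

Users in `exhausted` would then be dropped before asking for recommendations, which is exactly the extra developer time and runtime I'd like to avoid.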

Thanks for the high quality library - it's appreciated.

benfred commented 3 years ago

This is one of the undocumented differences between recommend (which only handles a single user) and recommend_all: in this case, recommend will just return fewer results than requested, while recommend_all will throw an exception.

I'd like to eventually unify the APIs so that there isn't a difference in behavior here. I think returning fewer items instead of throwing an exception is probably a better default choice. I'm also not convinced we even need to add a parameter for this.

samtriolo commented 3 years ago

@benfred makes sense. I could see how a parameter would be unnecessary, especially as long as users with eg zero available recommendations are still included in the returned np.ndarray, except with an empty recs list. IMO, not returning the user at all could lead to some confusion / lost time when a dev attempts to debug a problem ("why isn't user X receiving any recs?") that isn't actually a problem.

benfred commented 2 years ago

@samtriolo - I'm changing the API for recommend_all here https://github.com/benfred/implicit/issues/481 .

With this branch, when we've filtered out too many results and don't have any to show, the scores returned for those items will be -infinity. This means no exception will be thrown, but you might need to handle those entries yourself if this is undesired.
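As a rough illustration of handling this on the caller's side, the sketch below drops any entry whose score is -infinity. It assumes the results arrive as parallel per-user sequences of item ids and scores; the helper names are hypothetical and nothing here is part of implicit's API. It only looks at the scores, so it makes no assumption about which item id appears in a filtered slot.

```python
import math


def drop_filtered(ids, scores):
    """Keep only the (item_id, score) pairs whose score is not -infinity."""
    return [
        (item_id, score)
        for item_id, score in zip(ids, scores)
        if not (math.isinf(score) and score < 0)
    ]


def clean_batch(user_ids, batch_ids, batch_scores):
    """Map each user to their usable recommendations (possibly an empty list)."""
    return {
        user_id: drop_filtered(ids, scores)
        for user_id, ids, scores in zip(user_ids, batch_ids, batch_scores)
    }


# Toy data: the "bot" has nothing left to recommend, so all scores are -inf.
neg_inf = float("-inf")
recs = clean_batch(
    ["human", "bot"],
    [[5, 2, 9], [0, 0, 0]],
    [[0.9, 0.4, 0.1], [neg_inf, neg_inf, neg_inf]],
)
```

Every user stays in the result, with an empty list when nothing is left to recommend, so a missing user is never mistaken for a bug.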

samtriolo commented 2 years ago

Thx @benfred, this will improve the performance of my app (and likely others) while also simplifying the code, since I'll no longer need to pre-filter all users based on previously liked items.

Your explanation of the feature above didn't immediately make sense to me (if filter_already_liked_items=True, then I would think there wouldn't be any items to return a score of negative infinity for), but perhaps the explanation will make more sense once I check out all the other changes you made to the API (or perhaps you meant to say "... will be -infinity for those *users*" instead of "those items").

Regardless, assuming there is some feature to address it, looking forward to exploring it later. Thx again.