NicolasHug / Surprise

A Python scikit for building and analyzing recommender systems
http://surpriselib.com
BSD 3-Clause "New" or "Revised" License

feature request clustering vs prediction #396

Closed priamai closed 3 years ago

priamai commented 3 years ago

Hi there, I have started using this library and it is super useful! What would be the best way to provide a method that returns the clusters of users after the training phase? Using the movie dataset analogy, I want to see how the users are distributed across clusters after the factorization: for example, there will be x users who like action movies and y who like comedy. Of course, the cluster labelling will have to be done in a supervised fashion. I remember from a university course that there was a way to use the eigenvalues or eigenvectors to assign a cluster id to each user, but I can't find the equation/explanation anymore. Any pointers are much appreciated!

NicolasHug commented 3 years ago

Once the MF algorithm is trained, it will have a pu attribute (https://github.com/NicolasHug/Surprise/blob/master/surprise/prediction_algorithms/matrix_factorization.pyx#L118) with shape (n_users, n_factors). Each row corresponds to one user projected into the dense "factors" space.

You can then run any clustering algorithm (e.g. from scikit-learn) on that array.
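
A minimal sketch of that idea, assuming scikit-learn's KMeans as the clustering algorithm (any other would do). The helper name cluster_users and the synthetic fake_pu array are hypothetical; fake_pu only stands in for the pu attribute of a trained model so that the snippet runs without a dataset.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_users(pu, n_clusters=2, seed=0):
    """Assign a cluster label to each row (user) of a user-factor matrix.

    `pu` is expected to have shape (n_users, n_factors), like the `pu`
    attribute of a trained matrix factorization model.
    """
    pu = np.asarray(pu, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(pu)  # one cluster id per user row
    return labels, km.cluster_centers_


# Hypothetical stand-in for `algo.pu`: two well-separated groups of
# 5 users each, in a 3-factor space.
rng = np.random.default_rng(0)
fake_pu = np.vstack([
    rng.normal(loc=-2.0, scale=0.1, size=(5, 3)),
    rng.normal(loc=2.0, scale=0.1, size=(5, 3)),
])

labels, centers = cluster_users(fake_pu, n_clusters=2)
print(labels)         # cluster id for each user row
print(centers.shape)  # (n_clusters, n_factors)
```

With a real model you would pass algo.pu instead of fake_pu after calling algo.fit(trainset). Rows of pu follow Surprise's inner user ids, so trainset.to_raw_uid(i) should map row i back to the original user id. The cluster ids themselves carry no meaning (such as "action" or "comedy"); interpreting them is a separate step.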

priamai commented 3 years ago

Great, that is what I was looking for!