Build a recommender system

SebastinSanty commented 7 years ago

A recommender system for suggesting songs to users. For this we would also need a login system for the users. We also need to decide which attributes we will work with (genre, etc.). @kaivalyar, can you give some insight on this and start with a basic model? We'll catch up :). Also suggest what you would need to build such a system; we'll try to provide an API.

kaivalyar commented 7 years ago

Typical algorithms for (oversimplistic) recommender systems might use KNNs. Unfortunately, for all my ML talk and interest, I have little knowledge of this field. The concept of a KNN is simple though, and I can elaborate on the theory anytime. The problem with such a recommender system, though, is that there is no way to tangibly measure its success. How do we tell when a recommendation is good? How do we train the model? Can any solution to that be provided via the API?

kaivalyar commented 7 years ago

KNN basics:

figure out the parameters a recommender system would depend on (metadata about current and past viewership trends)
quantify said parameters
plot each candidate song (one that could be recommended in the future), together with the songs from the past viewership metadata, as points in a vector space
calculate some (K) Nearest Neighbours
display these as recommendations
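
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the song ids, the three features (scaled tempo, energy, acousticness) and their values are all made up, and a recently played song stands in for the "past viewership" query point.

```python
import math

# Hypothetical, hand-made feature vectors per song:
# (tempo scaled to 0..1, energy, acousticness). Not real data.
SONGS = {
    "song_a": (0.80, 0.90, 0.10),
    "song_b": (0.78, 0.85, 0.15),
    "song_c": (0.30, 0.20, 0.90),
    "song_d": (0.35, 0.25, 0.80),
    "song_e": (0.75, 0.80, 0.20),
}


def euclidean(p, q):
    """Straight-line distance between two points in the feature space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_nearest(query, songs, k=2):
    """Return the k song ids closest to `query`, excluding `query` itself."""
    target = songs[query]
    distances = [
        (euclidean(target, vec), name)
        for name, vec in songs.items()
        if name != query
    ]
    distances.sort()  # smallest distance first
    return [name for _, name in distances[:k]]


# Recommend the two songs most similar to one the user just played.
print(k_nearest("song_a", SONGS, k=2))
```

Features on very different scales (for example raw BPM next to a 0..1 score) would dominate the distance, so quantifying the parameters in practice also means normalising them.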

mukkachaitanya commented 7 years ago

So to implement this algorithm, we need enough metadata for the songs the user has listened to, right?

kaivalyar commented 7 years ago

Yes, of course, we need some info per song: artist name, genre, tempo, style, etc.

coditva commented 7 years ago

Can we use Last.fm's API or the Spotify API for this? We can get related artists, songs, etc. from those...

0xRampey commented 7 years ago

@UtkarshMe That's a good idea. We can get good quality album artwork and music categories from the Spotify API, in case the music from DC doesn't have those. I'd suggest you add it to the feature-list in Projects.

krishnacharya commented 7 years ago

@kaivalyar I feel a music recommendation system should use collaborative filtering or some such unsupervised learning algorithm (we would then use KNN on this data). @mukkachaitanya Collaborative filtering would even allow DC users' playlists to recommend songs to others. Check this: http://www.holehouse.org/mlclass/16_Recommender_Systems.html

wazeerzulfikar commented 7 years ago

@kaivalyar Recommender systems can be built without using song metadata. There are two approaches to this:

Connect similar user profiles using their likes. This would recommend music liked by another like-minded user. It can be implemented with the KNN algorithm working on profile similarity.
Build associations between music tracks based on every user's choices. E.g., two songs can be labeled as similar when a majority of users have liked both tracks. This is a workaround that derives a feature for music tracks instead of using song metadata. It can also be implemented using KNN, and is termed collaborative filtering.
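
A minimal sketch of the second approach (track-to-track associations built from likes) might look like the following. The user ids, track ids and like data are invented for illustration, and cosine similarity over "who liked this track" is one assumed choice of similarity measure, not something the thread settles on.

```python
import math
from collections import defaultdict

# Hypothetical like data: user id -> set of liked track ids.
LIKES = {
    "u1": {"t1", "t2", "t3"},
    "u2": {"t1", "t2"},
    "u3": {"t2", "t3", "t4"},
    "u4": {"t4", "t5"},
}


def item_similarity(likes):
    """Cosine similarity between tracks, based on the users who liked both."""
    fans = defaultdict(set)  # track id -> users who liked it
    for user, tracks in likes.items():
        for track in tracks:
            fans[track].add(user)

    sim = defaultdict(dict)
    for a in fans:
        for b in fans:
            if a == b:
                continue
            common = len(fans[a] & fans[b])
            if common:
                sim[a][b] = common / math.sqrt(len(fans[a]) * len(fans[b]))
    return sim


def recommend(user, likes, k=2):
    """Score unseen tracks by their similarity to the tracks `user` liked."""
    sim = item_similarity(likes)
    scores = defaultdict(float)
    for liked in likes[user]:
        for other, score in sim[liked].items():
            if other not in likes[user]:
                scores[other] += score
    # Highest score first; ties broken by track id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked[:k]]


print(recommend("u2", LIKES))
```

Only the like data is needed here; no genre, tempo or artist information enters the computation.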

If APIs are used, then to ensure the recommender system works offline, we would need to load all the metadata into a local database (HDF5 can be used for large volumes of data) and build the recommender system on that.

kaivalyar commented 7 years ago

@wazeerzulfikar if we have to choose between metadata and user tracking - I would prefer the former.

wazeerzulfikar commented 7 years ago

How about collaborative filtering, as it uses neither individual user tracking nor metadata? That is, a particular user's details are not needed to recommend songs to that user.

kaivalyar commented 7 years ago

That is one option. Even tracking users isn't off the table yet, just to clarify.

Also, do you think we'll reach volumes so high as to require HDF5? I doubt that. ~500 concurrent users is a good estimate to work with, accessing songs that all fit into 200 GB. Metadata wouldn't exceed a few MB, so normal file operations should be good enough, I suppose.

wazeerzulfikar commented 7 years ago

That's true, I don't think we will be needing HDF5. I was just putting an upper cap. Direct storage of metadata might suffice.

kaivalyar commented 7 years ago

@wazeerzulfikar and I have been discussing this extensively, and we think that metadata might be too complicated and less useful than user plays (collaborative filtering). We should use those instead. However, we first need some way of integrating a (Python-based?) recommender system with the NodeJS backend of Encore.
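
One possible way to do that integration, sketched under assumptions: run the Python recommender as a small local HTTP service that returns JSON, and have the NodeJS backend call it. The `/recommend` path, the `user` query parameter, the port and the lookup table below are all hypothetical; nothing here reflects Encore's actual API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical precomputed results; a real service would call the model here.
RECOMMENDATIONS = {"u1": ["t4", "t5"], "u2": ["t3"]}


def recommendations_json(user_id):
    """Build the JSON body for one user (empty list for unknown users)."""
    return json.dumps({"user": user_id, "tracks": RECOMMENDATIONS.get(user_id, [])})


class RecommendHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/recommend":  # assumed endpoint name
            self.send_error(404)
            return
        user_id = parse_qs(url.query).get("user", [""])[0]
        body = recommendations_json(user_id).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Create (but do not start) the service, bound to localhost only."""
    return HTTPServer(("127.0.0.1", port), RecommendHandler)
```

Calling `make_server().serve_forever()` would start the service, after which the backend could request `http://127.0.0.1:8000/recommend?user=u2` and parse the JSON reply. Spawning the Python script as a child process from NodeJS is another option; the HTTP route just keeps the two sides loosely coupled.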

kaivalyar commented 7 years ago

@wazeerzulfikar Have a look here. We may not need to build this up from scratch.

OSDLabs / Encore

Build a recommender system #4