Open SebastinSanty opened 7 years ago
Typical algorithms for (oversimplistic) recommender systems might use KNN. Unfortunately, for all my ML talk and interest, I have little knowledge of this field. The concept of KNN is simple, though, and I can elaborate on the theory anytime. The problem with such a recommender system is that there is no tangible way to measure its success. How do we tell when a recommendation is good? How do we train the model? Can any solutions to that be provided via an API?
KNN basics:
So to implement this algorithm, we need enough metadata for the songs the user has listened to, right?
Yes, of course, we need some info per song: artist name, genre, tempo, style, etc.
Can we use Last.fm's API or the Spotify API for this? We can get related artists, songs, etc. from those.
@UtkarshMe That's a good idea. We can get good quality album artwork and music categories from the Spotify API, in case the music from DC doesn't have those. I'd suggest you add it to the feature-list in Projects.
@kaivalyar I feel a music recommendation system should use collaborative filtering or a similar unsupervised learning algorithm (we would then use KNN on this data). @mukkachaitanya Collaborative filtering would even allow DC users' playlists to recommend songs to others. Check this: http://www.holehouse.org/mlclass/16_Recommender_Systems.html
@kaivalyar Recommender systems can be built without using song metadata. There are two approaches to this:
1. Connect similar user profiles using their likes. This would recommend music using another like-minded user. It can be implemented with the KNN algorithm working on profile similarity.
2. Build associations between music tracks based on every user's choices. E.g., two songs can be labeled as similar when a majority of users have liked both. This is a workaround that derives a feature for music tracks from user behavior instead of song metadata. It can also be implemented using KNN and is termed collaborative filtering.
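The two approaches above can be sketched roughly as follows. This is only an illustration: the `LIKES` data, the function names, and the choice of Jaccard overlap as the similarity measure are assumptions, and other similarity measures would work just as well.

```python
from collections import defaultdict

# Hypothetical like data: user ID -> set of liked track IDs.
LIKES = {
    "u1": {"t1", "t2", "t3"},
    "u2": {"t1", "t2", "t4"},
    "u3": {"t5"},
}


def jaccard(a, b):
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_similar_users(user, likes, k=1):
    """Approach 1: take the k most similar users and suggest their other likes."""
    mine = likes[user]
    neighbours = sorted(
        (other for other in likes if other != user),
        key=lambda other: (-jaccard(mine, likes[other]), other),
    )[:k]
    scores = defaultdict(float)
    for other in neighbours:
        sim = jaccard(mine, likes[other])
        if sim == 0.0:
            continue  # nothing in common, so this neighbour tells us nothing
        for track in likes[other] - mine:
            scores[track] += sim
    return sorted(scores, key=lambda t: (-scores[t], t))


def similar_tracks(track, likes, k=2):
    """Approach 2: rank tracks by how much their sets of fans overlap."""
    fans = defaultdict(set)
    for user, tracks in likes.items():
        for t in tracks:
            fans[t].add(user)
    target_fans = fans[track]
    others = [t for t in fans if t != track]
    others.sort(key=lambda t: (-jaccard(target_fans, fans[t]), t))
    return others[:k]
```

Neither function looks at song metadata; both work purely from which users liked which tracks.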
If APIs are used, then to ensure the recommender system works offline, we would need to load all the metadata into a local database (HDF5 could be used for large volumes of data) and build the recommender system on that.
@wazeerzulfikar if we have to choose between metadata and user tracking - I would prefer the former.
How about collaborative filtering, as it uses neither individual user tracking nor metadata? That is, a particular user's details are not needed to make recommendations to that user.
That is one option. Even tracking users isn't off the table yet, just to clarify.
Also, do you think we'll reach volumes so high as to require HDF5? I doubt that. ~500 concurrent users is a good estimate to work with, accessing songs that all fit into 200 GB. Metadata wouldn't exceed a few MB, so normal file operations should be good enough, I suppose.
That's true, I don't think we will be needing HDF5. I was just putting an upper cap. Direct storage of metadata might suffice.
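For a few MB of metadata, plain file storage could look something like the sketch below. The JSON format, the function names, and the file layout are assumptions chosen for illustration; nothing here is decided for Encore.

```python
import json
from pathlib import Path


def save_metadata(metadata, path):
    """Write the whole metadata dict (song ID -> attributes) to one JSON file."""
    Path(path).write_text(json.dumps(metadata, indent=2), encoding="utf-8")


def load_metadata(path):
    """Read the metadata back; an absent file is treated as an empty store."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text(encoding="utf-8"))
```

Loading the file once at startup keeps the recommender usable offline, at the cost of holding all metadata in memory.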
@wazeerzulfikar and I have been discussing this extensively, and we think that metadata might be too complicated and less useful than user plays (collaborative filtering). We should use those instead. However, we first need some way of integrating a (Python-based?) recommender system with the Node.js backend of Encore.
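One possible way to bridge the two, sketched under assumptions: run the recommender as a small local HTTP service that the Node.js backend queries. The `/recommend` route, the `user` query parameter, port 5000, and the placeholder `recommend()` function are all hypothetical; only the Python standard library is used.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def recommend(user_id):
    """Placeholder: a real version would query the collaborative filtering model."""
    return []


def build_response(path):
    """Map a request path such as /recommend?user=u1 to (status, payload)."""
    parsed = urlparse(path)
    if parsed.path != "/recommend":
        return 404, {"error": "not found"}
    user = parse_qs(parsed.query).get("user", [None])[0]
    if user is None:
        return 400, {"error": "missing 'user' parameter"}
    return 200, {"user": user, "tracks": recommend(user)}


class RecommendHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = build_response(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=5000):
    """Start the service on localhost; call this explicitly to run it."""
    HTTPServer(("127.0.0.1", port), RecommendHandler).serve_forever()
```

The backend would then issue a plain GET request to the service and parse the JSON reply. Spawning the Python script as a child process per request is another option, though it pays the model-loading cost on every call.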
A recommender system for suggesting songs to users. For this we would also need a login system for the users. We also need to decide which attributes we will work with (genre, etc.). @kaivalyar Can you give some insight on this and start with a basic model? We'll catch up :). Also suggest what you would need to build such a system; we'll try to provide an API.