NicolasHug / Surprise

A Python scikit for building and analyzing recommender systems
http://surpriselib.com
BSD 3-Clause "New" or "Revised" License

Suggestion needed on recommendations made on own dataset? #37

Closed dilipbobby closed 7 years ago

dilipbobby commented 7 years ago

Hi,

I have seen the format of the ratings data used in the example code. It has the form user id | item | rating. My data looks similar, but the ratings are not given by the users: it contains userid | textid | score, where the score is an index computed from the text (a readability score produced by an algorithm such as the Automated Readability Index). I want to apply the same SVD example code to my data. It gives me text recommendations based on the scores, but how can I know that these recommendations are correct? My scores range from 1 to 100. Am I doing the right thing by using the example code directly and just replacing the input data?

NicolasHug commented 7 years ago

Hi,

I am not sure I completely understand your problem.

If ratings are not given by the users, then why does your data look like userid textid score? As far as I understand ARI is user-independent, so it does not seem to make sense to involve users. Actually I'm not sure SVD is even relevant for your task, but again I'm not entirely sure of what you want to do.
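To make the user-independence point concrete, here is a minimal sketch. It assumes the standard ARI formula (4.71 × characters per word + 0.5 × words per sentence − 21.43); the poster's actual scoring code is not shown in the thread, and the sample text, the user ids, and the function name are hypothetical.

```python
import re

def automated_readability_index(text: str) -> float:
    """Standard ARI: 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not words:
        return 0.0
    # Count sentence terminators; text without one counts as a single sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    chars = sum(len(word) for word in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / sentences - 21.43

# Hypothetical data: two different users read the same text.
texts = {1: "Good day. The weather is nice today."}
reads = [(1, 1), (2, 1)]  # (userid, textid)

# The score is a function of the text alone, so the userid never enters it.
rows = [(user, text_id, automated_readability_index(texts[text_id]))
        for user, text_id in reads]
for row in rows:
    print(row)
```

Both rows carry the same score, so the userid column adds no information about what either user prefers.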

Nicolas

dilipbobby commented 7 years ago

Hi, let me explain what I'm doing. I have text data with text ids, text content, and the users associated with each text, e.g.:

user id | textid | text content
1 | 1 | hi
1 | 2 | hello man
2 | 3 | good day

Now I want to recommend posts that users may like, using Surprise. To make recommendations I need some numeric value (like a rating from 1 to 5), so I thought of computing a readability score for each text with an algorithm like ARI. That gave me data that looks like movie rating data (the only difference is that ratings range from 1 to 5 and my scores range from 1 to 100):

user id | textid | score
1 | 1 | 2

I applied SVD to this data, and it gave me recommendations for all the users. But how can I make sure that those recommendations are proper recommendations?

I hope this gives you an idea of what I mean.

NicolasHug commented 7 years ago

So a line in your initial data such as

user id | textid | text content
1 | 1 | hi

simply means that user 1 has read item 1, whose content is "hi"?

If that's the case, I think that what you are looking for is either implicit rating-based recommendation, or content-based recommendation (using tf-idf or something like that).

In the implicit rating framework, each line like 1 1 hi would be treated as a rating of 1, and any other pair (user, item) that is not in your database would be treated as a rating of 0.
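As a rough illustration of that framing, the sketch below turns an interaction log into 0/1 implicit ratings. The rows mirror the sample data posted earlier in the thread. The variable names are hypothetical, and a real dataset would normally use a sparse matrix rather than a dense dictionary.

```python
from itertools import product

# Interaction log in the thread's layout: (userid, textid, text content).
interactions = [
    (1, 1, "hi"),
    (1, 2, "hello man"),
    (2, 3, "good day"),
]

observed = {(user, item) for user, item, _ in interactions}
users = sorted({user for user, _ in observed})
items = sorted({item for _, item in observed})

# Every observed (user, item) pair becomes a 1; every other pair becomes a 0.
implicit = {
    (user, item): 1 if (user, item) in observed else 0
    for user, item in product(users, items)
}

for user in users:
    print(user, [implicit[(user, item)] for item in items])
```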

Either way, Surprise is probably not the right tool for document recommendation. I will try to add algorithms that can deal with implicit feedback, but it will take a bit of time. You may be interested in other libraries like LightFM. As for content-based recommendation algorithms, I don't plan on implementing any.
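For the content-based route mentioned above, a minimal tf-idf sketch in plain Python might look like the following. It is only an illustration under stated assumptions: the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend`), the smoothed idf weighting, the "best match to any read text" ranking rule, and the fourth sample text are all choices made here, not something taken from Surprise or from the thread.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Return {doc_id: {term: weight}} using term count times a smoothed idf."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for doc_id, toks in tokens.items():
        counts = Counter(toks)
        vectors[doc_id] = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors

def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user_read, docs, top_n=2):
    """Rank unread texts by their best similarity to any text the user has read."""
    vectors = tfidf_vectors(docs)
    scores = {
        doc_id: max(cosine(vectors[doc_id], vectors[read]) for read in user_read)
        for doc_id in docs
        if doc_id not in user_read
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Texts 1-3 come from the thread's sample; text 4 is made up for the demo.
docs = {1: "hi", 2: "hello man", 3: "good day", 4: "hello good man"}
recs = recommend({2}, docs)  # the user has read text 2
print(recs)
```

Here the texts themselves drive the ranking, and the only user input is which texts each user has read, so no numeric rating is needed.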

Nicolas

dilipbobby commented 7 years ago

Yes, thank you! I will look at LightFM as well.

But I'm not using the user id | textid | text content data for recommendations. I converted it to the format userid | textid | score by applying the ARI algorithm, e.g.:

userid | textid | score
1 | 1 | 50
1 | 2 | 23

It looks just like rating data (with a score column in place of the rating). I'm able to build a model from this example data, and it gives me recommendations too. My doubt is whether the model I built is a valid recommendation model or not.

(I'm asking because, when you looked at the initial data (user id | textid | text content), you suggested content-based algorithms, but I have since converted that data to rating-style data, i.e. userid | textid | score.)

NicolasHug commented 7 years ago

To be perfectly clear: no, I don't think that using ARI as a proxy for a rating is a promising idea, but hey, the only way to know for sure is to try. The main issue I see is that there is no way to know whether your predictions/recommendations are meaningful (which was your first question, and one I can't answer).

This is why I suggested content-based recommendation or implicit-rating-based recommenders. I would also suggest doing a bit of research on document recommendation, since this is what you ultimately want to do.
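One way to see the concern about meaningfulness is a toy sketch with made-up scores (not the poster's data). If the score depends only on the text, a trivial per-text lookup predicts held-out scores perfectly, so a low prediction error says nothing about whether the recommendations match user taste.

```python
import math

# Hypothetical per-text scores; like ARI, each depends only on the text.
text_score = {1: 50, 2: 23, 3: 71, 4: 8}

# Every one of 20 hypothetical users "rates" every text with that text's score.
rows = [(user, text, score)
        for user in range(1, 21)
        for text, score in text_score.items()]

# Deterministic hold-out: every fifth row is test data, the rest is training data.
test = rows[::5]
train = [row for i, row in enumerate(rows) if i % 5 != 0]

# Trivial "model": remember the score seen for each text and ignore the user.
lookup = {text: score for _, text, score in train}

errors = [(lookup[text] - score) ** 2 for _, text, score in test]
rmse = math.sqrt(sum(errors) / len(errors))
print(rmse)
```

The error is zero without the model knowing anything about individual users, which is why an accuracy metric on such scores cannot validate the recommendations.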

Nicolas

dilipbobby commented 7 years ago

I will take all the points :) Thank you