bounswe / bounswe2017group10

Atlas Project
http://174.129.53.155/

Features of cultural heritage items for recommendation system #295

Open SaitTalhaNisanci opened 6 years ago

SaitTalhaNisanci commented 6 years ago

Regardless of which model we choose for the recommendation system, we need to determine the features of cultural heritage items so that we can measure how similar they are. Here we can collect ideas about features. When you propose a feature, please give your reasoning as well, so that we can judge whether it is useful. This thread is only for features, which are the first step of our recommendation system, so please don't use it for anything else.

Item Content-Based Similarity

| Feature | Reasoning |
|---|---|
| Location | By using longitude/latitude, we can find nearby items |
| Time | By using time spans (1 year, 50 years, 200 years, etc.), we can find items close in time |
| Title and description | By using NLP algorithms, we can extract important words to be used as hidden tags |
| Tags | Items with similar tags are probably similar |
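Two of the features above, location and tags, map directly onto standard measures. The following is a minimal sketch, assuming coordinates are stored as decimal-degree (latitude, longitude) pairs and tags as sets of strings. It converts the great-circle (haversine) distance into a score and uses Jaccard similarity for tags. The function names and the `scale_km` constant are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the argument of asin above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def location_similarity(a: tuple, b: tuple, scale_km: float = 50.0) -> float:
    """1.0 for the same spot, decaying towards 0 with distance.
    scale_km is a hypothetical tuning constant."""
    return math.exp(-haversine_km(a[0], a[1], b[0], b[1]) / scale_km)


def tag_similarity(tags_a: set, tags_b: set) -> float:
    """Jaccard similarity: shared tags divided by all distinct tags."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0
```

For example, `tag_similarity({"mosque", "ottoman"}, {"ottoman", "palace"})` gives 1/3, because the two items share one of three distinct tags.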

Item Popularity

| Feature | Reasoning |
|---|---|
| Total view seconds | Popular items are more likely to be viewed for longer |
| Total number of comments | Popular items are more likely to get more comments |
| Total number of favourites | Popular items are more likely to be favourited |
| Creator of the item | The same item can get more or fewer views depending on which user created it |
| Location | An item in a popular area could get more views |
| Creation time | Items should lose popularity as they get older |
| Completeness | We may lower the ranking of items with missing content (no description, no location, etc.) so that more complete items become more popular |
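One way to fold the three engagement counters in the table into a single number is a weighted sum of log-scaled values. Log scaling keeps one extremely long viewing session or one heavily discussed item from swamping everything else. The following is a minimal sketch. The `Engagement` type and the weights are hypothetical, and it leaves out the creator, location, creation-time and completeness features.

```python
import math
from dataclasses import dataclass


@dataclass
class Engagement:
    total_view_seconds: float
    comment_count: int
    favourite_count: int


# Hypothetical weights: a favourite is treated as a stronger signal than a comment,
# and a comment as stronger than viewing time. These would need tuning on real data.
WEIGHTS = {"views": 1.0, "comments": 2.0, "favourites": 3.0}


def engagement_score(e: Engagement, weights: dict = WEIGHTS) -> float:
    """Weighted sum of log1p-scaled counters; 0.0 for an item nobody has touched."""
    return (weights["views"] * math.log1p(e.total_view_seconds)
            + weights["comments"] * math.log1p(e.comment_count)
            + weights["favourites"] * math.log1p(e.favourite_count))
```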

User Similarity

Item similarity will be the base upon which we build the user similarity model. Hence, a solid item-based similarity model is crucial for finding similar users.

| Feature | Reasoning |
|---|---|
| Which user visits which item | Similar users are more likely to visit similar items, and to stay longer |
| Which user comments on which item | Similar users are more likely to comment on similar items |
| Which user favourites which item | Similar users are more likely to favourite similar items |
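The interaction features above can be turned into one vector per user (item → interaction weight) and compared with cosine similarity. The following is a minimal sketch. The input tuples, the function names, and the weights given to comments and favourites are hypothetical. This version only rewards visits to exactly the same items. Building on item similarity, as proposed above, would also credit visits to different but similar items.

```python
import math
from collections import defaultdict

# Hypothetical weights: one comment counts like 30 seconds of viewing and one
# favourite like 60 seconds. They would need tuning on real usage data.
COMMENT_WEIGHT = 30.0
FAVOURITE_WEIGHT = 60.0


def user_vectors(views, comments, favourites):
    """Build {user: {item: weight}} from interaction logs.

    views: iterable of (user, item, seconds) tuples.
    comments, favourites: iterables of (user, item) tuples.
    """
    vectors = defaultdict(lambda: defaultdict(float))
    for user, item, seconds in views:
        vectors[user][item] += seconds
    for user, item in comments:
        vectors[user][item] += COMMENT_WEIGHT
    for user, item in favourites:
        vectors[user][item] += FAVOURITE_WEIGHT
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse {item: weight} vectors; 0.0 if either is empty."""
    dot = sum(w * v[item] for item, w in u.items() if item in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Cosine similarity ignores overall activity level. A user who spends twice as long on the same items as another user still scores 1.0 against them.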
eozd commented 6 years ago

Proposed Edits

Location: We need to specify location more precisely; we will use longitude and latitude.
Time: Time will be stored as intervals, with a unit of one year.
Total Views: Total views will be the cumulative sum of view time. Since we will record which user visits which item and for how long, this cumulative sum carries more information than a plain view count.
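The two representation decisions in this comment can be sketched as small data structures. Time is a closed interval of whole years. View time is kept as running totals of seconds, both per (user, item) pair and per item. The following is a minimal sketch. The class and method names are hypothetical, and `gap` is only one possible distance between two time spans.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class YearInterval:
    """A closed time span measured in whole years, e.g. the years of construction."""
    start: int
    end: int

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not be after end")

    def gap(self, other: "YearInterval") -> int:
        """Years separating two spans; 0 if they overlap or touch."""
        return max(0, max(self.start, other.start) - min(self.end, other.end))


class ViewTime:
    """Running totals of view seconds, per (user, item) pair and per item."""

    def __init__(self):
        self.per_pair = defaultdict(float)  # (user, item) -> seconds
        self.per_item = defaultdict(float)  # item -> cumulative seconds

    def record(self, user, item, seconds: float) -> None:
        self.per_pair[(user, item)] += seconds
        self.per_item[item] += seconds
```

Keeping the per-pair totals alongside the per-item totals means the same log can feed both the popularity features and the user-similarity features.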

SaitTalhaNisanci commented 6 years ago

@eozd That was what I meant by location; if it is not clear, feel free to edit the table. You can also modify the others (by "others" I meant Time and Total Views); your versions sound better.

eozd commented 6 years ago

@SaitTalhaNisanci Total number of views is related to item popularity, rather than item similarity. I have changed the tables accordingly.

SaitTalhaNisanci commented 6 years ago

@eozd I thought maybe popular items could also be considered similar.

eozd commented 6 years ago

@SaitTalhaNisanci We can use that kind of similarity in another model. If we mix content-based similarity with popularity-based similarity, our content-based similarity results may become quite unintuitive.

For example, if we want to adjust our recommendations according to popularity (recommending popular or non-popular items), we can filter the results of the content-based model with the popularity-based model. What do you think?
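The filtering idea can be sketched as a post-processing step. The content-based model produces a ranked list, and the popularity model only decides which entries survive, so content similarity scores are never mixed with popularity. The following is a minimal sketch, assuming popularity has been normalised to [0, 1]. The function name, the `mode` values, and the 0.5 threshold are hypothetical.

```python
def filter_by_popularity(similar_items, popularity, mode="popular", threshold=0.5):
    """Filter content-based results by popularity, keeping their original order.

    similar_items: list of (item_id, similarity) pairs, best match first.
    popularity: dict mapping item_id to a popularity score in [0, 1].
    mode: "popular" keeps items at or above the threshold, "niche" keeps those below it.
    """
    if mode not in ("popular", "niche"):
        raise ValueError(f"unknown mode: {mode!r}")

    def keep(item_id):
        # Items with no popularity data are treated as unpopular.
        pop = popularity.get(item_id, 0.0)
        return pop >= threshold if mode == "popular" else pop < threshold

    return [(item_id, sim) for item_id, sim in similar_items if keep(item_id)]
```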

SaitTalhaNisanci commented 6 years ago

@eozd Yeah, that is true. I was just thinking: let's say we have 4 items, A, B, C and D, and assume that all of their features are different. Let A and B have the same popularity, and let C and D have different popularity. In this case, it seems reasonable to say that A and B are more similar in some sense than C and D are.
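The A/B/C/D argument can be made concrete with a popularity-only similarity that is kept as its own score, separate from content similarity. The following is a minimal sketch. The function and its `scale` parameter are hypothetical. The score is 1.0 when two items are equally popular and falls off as their popularity scores diverge.

```python
def popularity_similarity(pop_a: float, pop_b: float, scale: float = 1.0) -> float:
    """1.0 for equal popularity, approaching 0 as the gap grows relative to scale."""
    return 1.0 / (1.0 + abs(pop_a - pop_b) / scale)


# A and B are equally popular; C and D are not.
print(popularity_similarity(5.0, 5.0))   # A vs B: 1.0
print(popularity_similarity(10.0, 2.0))  # C vs D: about 0.111
```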

eozd commented 6 years ago

We may use creation time for item popularity, i.e. items may lose popularity as they get older. One good example is the Reddit ranking algorithm. We can change the constants in that function and define new variables to represent item popularity. On Reddit, items lose popularity within one or at most two days, whereas we may want to keep popular items popular for a week or so.
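For reference, Reddit's published source code computes its "hot" rank by adding two terms. The first is the base-10 logarithm of the vote score. The second is the post's creation time, measured in seconds since a fixed epoch and divided by 45000. The following is a minimal sketch of that formula with the time constant exposed as a parameter. Two parts are assumptions: that `score` for our items would be something like a favourite count or a combined engagement score, and the 7× stretch used to keep items popular for about a week.

```python
import math

# Reddit's fixed epoch (2005-12-08 07:46:43 UTC) as a Unix timestamp. The offset
# shifts every score by the same amount, so it does not affect the ordering.
EPOCH_OFFSET = 1134028003

REDDIT_DECAY_SECONDS = 45000.0
# Hypothetical: stretch the time scale 7x so items stay popular for longer.
WEEKLY_DECAY_SECONDS = 7 * REDDIT_DECAY_SECONDS


def hot(score: float, created_ts: float, decay_seconds: float = REDDIT_DECAY_SECONDS) -> float:
    """Log-scaled score plus a recency term; larger means ranked higher."""
    order = math.log10(max(abs(score), 1))
    sign = 1 if score > 0 else -1 if score < 0 else 0
    age_term = (created_ts - EPOCH_OFFSET) / decay_seconds
    return round(sign * order + age_term, 7)
```

With `decay_seconds=45000`, a new item ties with one posted 12.5 hours earlier that has ten times its score. Multiplying the constant by 7 stretches that window to about 3.6 days, so popularity fades roughly seven times more slowly.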

For rating sorting, see this blog post.

eozd commented 6 years ago

We may apply NLP algorithms to the title, in addition to the description, to extract important words from an item, which will later be used as hidden tags.
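The thread does not settle on a specific NLP algorithm. TF-IDF is one simple, well-known option: it picks words that are frequent in one item's title and description but rare across the collection. The following is a minimal sketch. The stopword list, the tokenisation rules, and `top_k` are hypothetical choices.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller (and
# multilingual) list.
STOPWORDS = {"the", "and", "for", "with", "was", "were", "this", "that", "from"}


def tokenize(text: str) -> list:
    """Lower-cased runs of Unicode letters, minus stopwords and very short words."""
    return [w for w in re.findall(r"[^\W\d_]+", text.lower())
            if len(w) > 2 and w not in STOPWORDS]


def hidden_tags(items: list, top_k: int = 3) -> list:
    """items: list of (title, description) pairs.
    Returns, per item, up to top_k words ranked by TF-IDF over the whole collection."""
    docs = [tokenize(f"{title} {description}") for title, description in items]
    n = len(docs)
    # Document frequency: in how many items each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    tags = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF, so a word found in every item still gets a small weight.
        scores = {w: (c / len(doc)) * (math.log((1 + n) / (1 + df[w])) + 1)
                  for w, c in tf.items()}
        # Ties are broken alphabetically so the output is deterministic.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        tags.append(ranked[:top_k])
    return tags
```

A word such as "istanbul", which appears in many items, ranks below words specific to one item. That is the property we want from hidden tags.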

eozd commented 6 years ago

We may incorporate a negative bias towards items without good content. For example, if an item doesn't have a description or any images, we may decrease its score by a set amount. This would bias our popularity sorting algorithm towards more complete items.
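The negative bias can be sketched as a fixed deduction for each missing piece of content, applied after the popularity score is computed. The following is a minimal sketch. The `Item` fields and the penalty amounts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Item:
    title: str
    description: str = ""
    image_urls: List[str] = field(default_factory=list)
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude)


# Hypothetical fixed penalties per missing piece of content.
MISSING_PENALTY = {"description": 1.0, "images": 1.0, "location": 0.5}


def completeness_adjusted(score: float, item: Item) -> float:
    """Subtract a set amount from the score for each kind of missing content."""
    penalty = 0.0
    if not item.description.strip():
        penalty += MISSING_PENALTY["description"]
    if not item.image_urls:
        penalty += MISSING_PENALTY["images"]
    if item.location is None:
        penalty += MISSING_PENALTY["location"]
    return score - penalty
```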