HKUST-KnowComp / FMG

KDD17_FMG

issues about rating value #10

Closed katherinelyx closed 6 years ago

katherinelyx commented 6 years ago

Dr. Zhao, sorry to disturb you. In your paper, you write: "FMG ignore the rating values, so it remains unknown whether it can further decrease RMSE if we adopt a similar approach to incorporate rating values into HIN". However, in your code and data, the file "ratings.txt" contains rating values, so I am confused about how the rating values are used. Could you explain?

hzhaoaf commented 6 years ago

@katherinelyx In this work, we classify the rating values into two groups, positive and negative, according to simple heuristics. When we construct instances of the meta-graph, we only consider two types of "U-B-U": U-pos-B-pos-U and U-neg-B-neg-U. This is based on the assumption that similar users share similar preferences, including both likes and dislikes.

It's a very straightforward idea to process the ratings in a more fine-grained way, but it needs to be designed carefully. I am not sure whether it can further decrease the RMSE. You may try it if you are interested:-)
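As a rough illustration of the two "U-B-U" types described above, the sketch below pairs up users who gave the same business the same polarity of rating. It is only a sketch under assumptions: the function name `count_ubu_instances`, the `(user, business)` input format, and the idea of counting shared businesses per user pair are hypothetical and are not taken from the FMG code.

```python
from collections import defaultdict
from itertools import combinations


def count_ubu_instances(pairs):
    """Count U-B-U instances for one polarity.

    `pairs` is an iterable of (user, business) tuples that all carry the
    same label (all positive, or all negative). Returns a dict mapping an
    ordered user pair (u, v) with u < v to the number of businesses both
    users rated with that label. (Hypothetical helper, not from FMG.)
    """
    users_by_business = defaultdict(set)
    for user, business in pairs:
        users_by_business[business].add(user)

    counts = defaultdict(int)
    for users in users_by_business.values():
        # Every pair of users sharing this business forms one instance.
        for u, v in combinations(sorted(users), 2):
            counts[(u, v)] += 1
    return dict(counts)


# Example: positive ratings only, i.e. U-pos-B-pos-U instances.
positive = [("u1", "b1"), ("u2", "b1"), ("u3", "b1"), ("u1", "b2"), ("u2", "b2")]
ubu_pos = count_ubu_instances(positive)
```

Running the same function on the negative pairs would give the U-neg-B-neg-U instances; mixed paths such as U-pos-B-neg-U are deliberately never formed.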

Abigale001 commented 3 years ago

Hi, how do you classify the rating values into two groups? For example, for the Amazon dataset, are ratings 3, 4, and 5 positive, and ratings 1 and 2 negative?

hzhaoaf commented 3 years ago

For each user, the mean value of his/her ratings is computed. Then ratings greater than the mean value are regarded as positive, and ratings smaller than the mean value are regarded as negative.
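The per-user rule described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `split_by_user_mean` and the `(user, item, rating)` tuple format are assumptions, and because the reply does not say how a rating exactly equal to the user's mean is handled, this sketch leaves such ratings unlabeled.

```python
from collections import defaultdict


def split_by_user_mean(ratings):
    """Split ratings into positive and negative by each user's own mean.

    `ratings` is an iterable of (user, item, rating) tuples.
    Returns (positive, negative), two lists of (user, item) pairs.
    (Hypothetical helper, not from FMG.)
    """
    ratings = list(ratings)

    # First pass: accumulate sum and count of ratings per user.
    totals = defaultdict(lambda: [0.0, 0])
    for user, _item, rating in ratings:
        totals[user][0] += rating
        totals[user][1] += 1
    means = {user: s / n for user, (s, n) in totals.items()}

    # Second pass: compare each rating with its user's mean.
    positive, negative = [], []
    for user, item, rating in ratings:
        if rating > means[user]:
            positive.append((user, item))
        elif rating < means[user]:
            negative.append((user, item))
        # rating == mean: left unlabeled (assumption; not specified above)
    return positive, negative


data = [
    ("u1", "b1", 5), ("u1", "b2", 1), ("u1", "b3", 3),  # u1 mean = 3
    ("u2", "b1", 4), ("u2", "b2", 4),                   # u2 mean = 4
]
pos, neg = split_by_user_mean(data)
```

Note that under this rule the same star value can be positive for one user and negative for another, which differs from a fixed global threshold such as "3 and above is positive".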

Abigale001 commented 3 years ago

Thank you very much. I have another question about the words in the Amazon review data. Do you use a stopword file to remove stop words, or do you feed all the words in the reviews into LDA?

hzhaoaf commented 3 years ago

Stopwords are removed, and stemming is also applied. I use NLTK to process the text before feeding it into LDA.
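A minimal sketch of that kind of preprocessing with NLTK is shown below. The specifics are assumptions, since the reply does not name them: the Porter stemmer, the English stopword list, the regex tokenizer, and the function name `preprocess` are all illustrative choices. If the NLTK stopword corpus has not been downloaded, the sketch falls back to a tiny hand-written list so that it still runs.

```python
import re

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

try:
    STOPWORDS = set(stopwords.words("english"))
except LookupError:
    # Corpus not installed (normally fetched via nltk.download("stopwords")).
    # Tiny fallback list for illustration only.
    STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "it"}

STEMMER = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize on letters, drop stopwords, then stem.

    Hypothetical helper; the exact pipeline used for FMG is not given.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [STEMMER.stem(tok) for tok in tokens if tok not in STOPWORDS]


tokens = preprocess("The reviews are running")
```

The resulting token lists, one per review, would then be turned into a bag-of-words corpus for whichever LDA implementation is used.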