THUIR / ZhihuRec-Dataset


Questions about Recommendation with negative feedbacks #1

Closed hotchilipowder closed 3 years ago

hotchilipowder commented 3 years ago

Hello, thank you for offering this helpful dataset to the community!

I am working on negative feedback in recommender systems. The paper discusses negative feedback in Section 4.4.2.

When users interact with answers, they give both positive and negative feedback.
Positive feedback includes users clicking, bookmarking, and liking answers.

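The negative information is the number of times an answer was reported, the number of times an answer was downvoted, the number of "not helpful" marks an answer received, and the number of downvotes the user's own answers received. But these feedback signals are not offered in the dataset, unless I am missing something important.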

As described in the readme.txt, it has the following files:

zhihu100M.txt (2.58G) --- user interactions
zhihu20M.txt (529M) --- user interactions
zhihu1M.txt (26.4M) --- user interactions
user_infos.txt (689M) --- the features of the users that occurred in the dataset
answer_infos.txt (1.32G) --- the features of the answers that occurred in the dataset
question_infos.txt (30.7M) --- the features of the questions that occurred in the dataset
author_infos.txt (11.2M) --- the features of the authors that occurred in the dataset
topic_infos.txt (6.11M) --- the features of the topics that occurred in the dataset

The zhihu1M.txt file includes:

The paper refers to [22]:

They use Goodbooks-10k (Goodbooks) and MovieLens 1M (ML-1M). Both datasets
contain user ratings of items on a 1-5 scale.

For each user, similar to [5], they calculated the mean rating and
treated ratings below the mean as negative and ratings equal to or above the mean as positive.
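To make the thresholding rule concrete, here is a minimal sketch of how I understand it. The function name `label_by_user_mean` and the `(user, item, rating)` tuple layout are my own assumptions for illustration, not code from [22] or from this repository.

```python
from collections import defaultdict


def label_by_user_mean(ratings):
    """Label each rating 1 (positive) or 0 (negative) against the user's own mean.

    `ratings` is an iterable of (user, item, rating) tuples; this layout is an
    assumption made for the example.
    """
    ratings = list(ratings)
    totals = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate each user's rating sum and count.
    for user, _, rating in ratings:
        totals[user] += rating
        counts[user] += 1
    means = {user: totals[user] / counts[user] for user in totals}
    # Second pass: ratings equal to or above the user's mean are positive.
    return [
        (user, item, 1 if rating >= means[user] else 0)
        for user, item, rating in ratings
    ]
```

Note that a rating exactly equal to the user's mean counts as positive, which follows the description above.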

But the dataset does not give positive/negative labels for the interactions.

Besides, is there any code for running the experiments with RecBole, as listed in Section 4.1? It would help in reproducing the results in Table 9.

sadream commented 3 years ago

Thanks for your valuable question!

  1. The interaction log between a user and an answer includes show time and read time. If the read time is 0, it means the user skipped the answer, and we consider this as negative feedback. As mentioned in the readme file, "If the user didn't read the answer, read time is set to 0." As mentioned in the paper, "Negative feedbacks are like the users delete and skip the answers. ".

  2. Please give me your email address, and I will send you the RecBole config files.
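The read-time rule from point 1 could be sketched as follows. This is only an illustration: the function name `label_by_read_time` and the `(user_id, answer_id, show_time, read_time)` tuple layout are hypothetical, since the exact field order of the interaction files is not shown in this thread. Treating every non-zero read time as "not skipped" (label 1) is also an assumption of the sketch.

```python
def label_by_read_time(interactions):
    """Mark each interaction as skipped (0) or not skipped (1).

    `interactions` is an iterable of (user_id, answer_id, show_time, read_time)
    tuples; this layout is hypothetical and used only for the example.
    """
    labeled = []
    for user_id, answer_id, show_time, read_time in interactions:
        # A read time of 0 means the user skipped the answer: negative feedback.
        label = 0 if read_time == 0 else 1
        labeled.append((user_id, answer_id, show_time, label))
    return labeled
```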

hotchilipowder commented 3 years ago

Thanks for the clarification. It helps a lot! My email is h12345jack@gmail.com.

If the config files are not complex, pasting them in this issue would make them easier for others to find.

sadream commented 3 years ago

Thanks for your advice.

config files.zip

general.yaml is used for RecBole. zhihu1M.inter is the dataset (in a format suitable for RecBole).

If you need more information, such as model selection and parameter tuning, please see this blog post about RecBole: https://blog.csdn.net/Turinger_2000/article/details/110395198

And this is the RecBole website: https://recbole.io/
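For readers without the attachment, here is a rough sketch of what a RecBole-style `.inter` file looks like: a tab-separated text file whose header row names each field as `name:type`. The field names (`user_id`, `item_id`, `timestamp`, `label`) and the helper `to_inter_text` are assumptions for illustration; the attached zhihu1M.inter may use different columns.

```python
def to_inter_text(rows):
    """Serialize interactions into RecBole-style atomic-file text.

    `rows` is an iterable of (user_id, item_id, timestamp, label) tuples.
    The field names in the header are assumed for this example.
    """
    header = "user_id:token\titem_id:token\ttimestamp:float\tlabel:float"
    lines = [header]
    for user_id, item_id, timestamp, label in rows:
        # One interaction per line, fields separated by tabs.
        lines.append(f"{user_id}\t{item_id}\t{timestamp}\t{label}")
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file named after the dataset (for example `zhihu1M.inter`) gives a file in the same general layout.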