[Presentation] Personalized project recommendation on GitHub

zhicheng-ning commented 1 year ago

Title

Personalized project recommendation on GitHub

Link

https://doi.org/10.1007/s11432-017-9419-x

Year

2018

Author and affiliation

Xiaobing SUN$^{1,2,5*}$, Wenyuan XU$^{1}$, Xin XIA$^{3}$, Xiang CHEN$^{4}$ & Bin LI$^{1}$

1. School of Information Engineering, Yangzhou University, Yangzhou 225007, China;
2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China;
3. Faculty of Information Technology, Monash University, Melbourne 3800, Australia;
4. School of Computer Science and Technology, Nantong University, Nantong 226019, China;
5. Information Technology Research Base of Civil Aviation Administration of China, Civil Aviation University of China, Tianjin 300300, China

Conference or Journal

SCIS (SCIENCE CHINA Information Sciences)

Rank

A

Keywords

software recommendation
developer behavior
GitHub
user feedback
personalized recommendation

Selecting Reason

It provides a reference for my graduation thesis.

Supplementary

No response

bifenglin commented 1 year ago

Follow-up question: how does the paper validate its evaluation results? The paper describes the evaluation process as follows:

we first selected 60% of the data as a training set and 40% as a test set. Then, we simulated feedback for the recommendations obtained using the test data to obtain new recommendations in consideration of the simulated feedback. We compared the accuracy of the second recommendation results to that of the first recommendations. Since positive feedback (a like) will not appear in the recommendation results again, positive feedback would interfere with our evaluation. Therefore, we only considered negative feedback. In other words, when the first recommendation (e.g., we recommend project i to user u) does not in the test data, we judge that user u dislikes project i. Typically, a user will not give feedback on all recommendations; thus, we only simulated feedback for 80% of the recommendation results.

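As I read it, this passage describes two sets of recommendations: one produced from the 60% training data, and a new one produced after the 40% test data is added. Once the 40% is added, a user's positive feedback (likes) no longer appears in the new recommendations, because the positive feedback in that 40% is already known data. The authors therefore consider only negative feedback. In other words, if a project is one the user dislikes in both sets of recommendations, the prediction counts as correct.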
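To make the quoted protocol concrete, here is a minimal sketch of the split and the simulated negative feedback. It is only one reading of the quoted passage, not the authors' code: the function names, the `(user, project)` pair representation, and the random sampling of 80% of the recommendations are all assumptions made for illustration.

```python
import random


def split_interactions(interactions, train_ratio=0.6, seed=0):
    """Shuffle (user, project) pairs and split them into train and test sets."""
    rng = random.Random(seed)
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def simulate_negative_feedback(recommendations, test_pairs,
                               feedback_ratio=0.8, seed=0):
    """Return the (user, project) pairs treated as simulated dislikes.

    Assumption: a recommended project that does not appear in the test
    data for that user is judged as disliked, and feedback is simulated
    for only `feedback_ratio` of all recommendations.
    """
    rng = random.Random(seed)
    test_set = set(test_pairs)
    all_recs = [(user, project)
                for user, projects in recommendations.items()
                for project in projects]
    sampled = rng.sample(all_recs, round(len(all_recs) * feedback_ratio))
    return {pair for pair in sampled if pair not in test_set}
```

Under this reading, the dislikes returned by `simulate_negative_feedback` would be fed back to the recommender, and the second round of recommendations would then be scored against the same test set.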

However, consider the paper's results figure and its accompanying description:

The empirical results are shown in Table 3. The results show that the accuracy, recall, precision, and F1 values of the proposed approach are significantly higher than those of the UCF and ICF methods, both of which show poor results with the first three groups because the amount of data relative to user behavior and the number of projects is relatively small, which made the user-project matrix sparse. Thus, UCF cannot accurately find similar developers based on user behavior, and ICF cannot well use it to calculate similarity. With the proposed approach, we can calculate project similarity using descriptions and the source code, as well as user behavior. This greatly improves the accuracy of our recommendations. For example, in the top 10 recommendations of the Formidable group, the accuracy of the proposed approach reached 78.95%, and, in the worst case, accuracy was 68.29%. This means that greater than two-thirds of the users received at least one useful recommendation.

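The last sentence of this passage says that "at least one user received one useful recommendation", which suggests that the reported results are evaluated on positive feedback. That contradicts my understanding above. Does anyone have other ideas?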

zhicheng-ning commented 1 year ago

For example, in the top 10 recommendations of the Formidable group, the accuracy of the proposed approach reached 78.95%, and, in the worst case, accuracy was 68.29%. This means that greater than two-thirds of the users received at least one useful recommendation

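In the top 10 recommendations for the Formidable community, the accuracy of the proposed approach should be 78.57%.
In the worst case, the accuracy is 68.29%. This means that more than two-thirds of the users received at least one useful recommendation.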
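One way to read "accuracy" in the quoted sentence is as a per-user hit rate: the share of users whose top-k list contains at least one project they actually like in the test data. The sketch below illustrates that reading only. The paper's exact definition is not confirmed in this thread, and `hit_rate_at_k` and its arguments are hypothetical names.

```python
def hit_rate_at_k(recommendations, test_likes, k=10):
    """Fraction of users with at least one top-k recommendation they like.

    recommendations: dict mapping user -> ranked list of project ids
    test_likes:      dict mapping user -> set of project ids liked in the test data
    Users without any test likes are skipped (an assumption of this sketch).
    """
    users = [user for user in recommendations if test_likes.get(user)]
    if not users:
        return 0.0
    hits = sum(
        1 for user in users
        if set(recommendations[user][:k]) & test_likes[user]
    )
    return hits / len(users)
```

Under this definition, a value above 2/3 would mean exactly what the sentence states: more than two-thirds of the evaluated users received at least one useful recommendation.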
