X-lab2017 / open-research

📚 Open source methodology for open source phenomena.

[11-03] Group meeting presentation: issue-PR link prediction via knowledge-aware heterogeneous graph learning #305

Open andyhuang18 opened 2 weeks ago

andyhuang18 commented 2 weeks ago

Description

Presenter: 黄温瑞

This session presents a paper published in IEEE Transactions on Software Engineering (CCF-A).

Paper link:

Improving_Issue-PR_Link_Prediction_via_Knowledge-Aware_Heterogeneous_Graph_Learning.pdf

Abstract:

Links between issues and pull requests (PRs) assist GitHub developers in tackling technical challenges, gaining development inspiration, and improving repository maintenance. In realistic repositories, these links are still insufficiently established. Aiming at this situation, existing works focus on issues and PRs themselves and employ text similarity with additional information like issue size to predict issue-PR links, yet their effectiveness is unsatisfactory. The limitation is that issues and PRs are not isolated on GitHub. Rather, they are related to multiple GitHub sources, including repositories and submitters, which, through their diverse relationships, can supply potential and crucial knowledge about technical domains, developmental insights, and cross-repository technical details. To this end, we propose Auto IP Linker (AIPL), which introduces the heterogeneous graph to model multiple GitHub sources with their relationships. Further, it leverages the metapath-based technique to reveal and incorporate the potential information for a more comprehensive understanding of issues and PRs. Firstly, we identify 4 types of GitHub sources related to issues and PRs (repositories, users, issues, PRs) as well as their relationships, and model them into task-specific heterogeneous graphs. Next, we analyze information transmitted among issues or PRs to reveal which knowledge is crucial for them. Based on our analysis, we formulate a series of metapaths and employ the metapath-based technique to incorporate various information for learning the knowledge-aware embedding of issues and PRs. Finally, we can infer whether an issue and a PR can be linked based on their embedding. We evaluate the performance of AIPL on real-world data sets collected from GitHub. The results show that, compared to the baselines, AIPL can achieve average improvements of 15.94%, 15.19%, 20.52%, and 18.50% in terms of Accuracy, Precision, Recall, and F1-score.
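To make the metapath idea in the abstract concrete, here is a minimal sketch on a toy heterogeneous graph with the four node types the abstract names (repositories, users, issues, PRs). It is not AIPL's implementation: the node ids, the `opens` and `contains` relation names, and the helper functions are all hypothetical, and the sketch only enumerates metapath-reachable PRs rather than learning embeddings.

```python
from collections import defaultdict

# Hypothetical toy data: (source, relation, target); node ids are "type:name".
EDGES = [
    ("user:alice", "opens", "issue:1"),
    ("user:alice", "opens", "pr:10"),
    ("user:bob", "opens", "pr:11"),
    ("repo:demo", "contains", "issue:1"),
    ("repo:demo", "contains", "pr:10"),
    ("repo:demo", "contains", "pr:11"),
]


def node_type(node):
    """Return the type prefix of a node id, e.g. 'issue' for 'issue:1'."""
    return node.split(":", 1)[0]


def build_adjacency(edges):
    """Build an undirected adjacency map; relation labels are ignored here."""
    adj = defaultdict(set)
    for src, _rel, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def metapath_neighbors(adj, start, metapath):
    """Return the nodes reached from `start` by following the given sequence of node types."""
    if node_type(start) != metapath[0]:
        raise ValueError("start node type does not match the metapath")
    frontier = {start}
    for wanted in metapath[1:]:
        # Step to every neighbor whose type matches the next metapath position.
        frontier = {
            nbr
            for node in frontier
            for nbr in adj[node]
            if node_type(nbr) == wanted
        }
    return frontier


adj = build_adjacency(EDGES)
# PRs opened by the same user who opened the issue.
via_user = metapath_neighbors(adj, "issue:1", ["issue", "user", "pr"])
# PRs that live in the same repository as the issue.
via_repo = metapath_neighbors(adj, "issue:1", ["issue", "repo", "pr"])
print(sorted(via_user))  # ['pr:10']
print(sorted(via_repo))  # ['pr:10', 'pr:11']
```

Each metapath yields a different candidate set for the same issue, which is the intuition behind combining several metapaths: the user path and the repository path carry different kinds of context about which PRs an issue might link to.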

AIPL framework overview:

image

AIPL performance results:

image

Related papers:

birdflyi commented 2 weeks ago

Related work: https://github.com/birdflyi/GitHub_Collaboration_Relation_Extraction

Feature: