Links between issues and pull requests (PRs) assist GitHub developers in tackling technical challenges, gaining development inspiration, and improving repository maintenance. In realistic repositories, these links are still insufficiently established. Aiming at this situation, existing works focus on issues and PRs themselves and employ text similarity with additional information like issue size to predict issue-PR links, yet their effectiveness is unsatisfactory. The limitation is that issues and PRs are not isolated on GitHub. Rather, they are related to multiple GitHub sources, including repositories and submitters, which, through their diverse relationships, can supply potential and crucial knowledge about technical domains, developmental insights, and cross-repository technical details. To this end, we propose Auto IP Linker (AIPL), which introduces the heterogeneous graph to model multiple GitHub sources with their relationships. Further, it leverages the metapath-based technique to reveal and incorporate the potential information for a more comprehensive understanding of issues and PRs. Firstly, we identify 4 types of GitHub sources related to issues and PRs (repositories, users, issues, PRs) as well as their relationships, and model them into task-specific heterogeneous graphs. Next, we analyze information transmitted among issues or PRs to reveal which knowledge is crucial for them. Based on our analysis, we formulate a series of metapaths and employ the metapath-based technique to incorporate various information for learning the knowledgeaware embedding of issues and PRs. Finally, we can infer whether an issue and a PR can be linked based on their embedding. We evaluate the performance of AIPL on real-world data sets collected from GitHub. The results show that, compared to the baselines, AIPL can achieve average improvements of 15.94%, 15.19%, 20.52%, and 18.50% in terms of Accuracy, Precision, Recall, and F1-score.
Description
汇报人:黄温瑞
本次会议分享一篇发表在 《IEEE Transactions on Software Engineering》(CCF-A)的一篇论文。
论文链接:
Improving_Issue-PR_Link_Prediction_via_Knowledge-Aware_Heterogeneous_Graph_Learning.pdf
论文摘要:
Links between issues and pull requests (PRs) assist GitHub developers in tackling technical challenges, gaining development inspiration, and improving repository maintenance. In realistic repositories, these links are still insufficiently established. Aiming at this situation, existing works focus on issues and PRs themselves and employ text similarity with additional information like issue size to predict issue-PR links, yet their effectiveness is unsatisfactory. The limitation is that issues and PRs are not isolated on GitHub. Rather, they are related to multiple GitHub sources, including repositories and submitters, which, through their diverse relationships, can supply potential and crucial knowledge about technical domains, developmental insights, and cross-repository technical details. To this end, we propose Auto IP Linker (AIPL), which introduces the heterogeneous graph to model multiple GitHub sources with their relationships. Further, it leverages the metapath-based technique to reveal and incorporate the potential information for a more comprehensive understanding of issues and PRs. Firstly, we identify 4 types of GitHub sources related to issues and PRs (repositories, users, issues, PRs) as well as their relationships, and model them into task-specific heterogeneous graphs. Next, we analyze information transmitted among issues or PRs to reveal which knowledge is crucial for them. Based on our analysis, we formulate a series of metapaths and employ the metapath-based technique to incorporate various information for learning the knowledgeaware embedding of issues and PRs. Finally, we can infer whether an issue and a PR can be linked based on their embedding. We evaluate the performance of AIPL on real-world data sets collected from GitHub. The results show that, compared to the baselines, AIPL can achieve average improvements of 15.94%, 15.19%, 20.52%, and 18.50% in terms of Accuracy, Precision, Recall, and F1-score.
AIPL框架概览:
AIPL性能展示:
相关论文: