This repo contains several projects that are all used together to operate the issue-labeler system.
- src/PredictionEngine is the class library that contains the ML.NET engine and its prediction logic.
- src/ModelCreator is a console app that produces ML.NET models for repositories, to be used by the PredictionEngine.
- src/ModelTester is a console app that can use the PredictionEngine locally and show predictions on the console.
- src/ModelUploader is a console app that uploads those models to Azure Blob Storage.
- src/PredictionService is an ASP.NET app (Azure App Service) that hosts the ML.NET engine to make predictions.
- src/ModelWarmup is a console app that "wakes up" the ASP.NET app and forces it to load the ML.NET models, preparing the app to process predictions.
- src/IssueLabelerService is an ASP.NET app (Azure App Service) that responds to GitHub issue/PR webhooks. It:
  - calls the PredictionService associated with the repo, receives the area label predictions/scores, and updates the issue/PR accordingly;
  - performs other tasks, such as adding an untriaged label or creating comments on issues.
- src/GitHubHelpers is a class library containing helpers and wrappers around the GitHub APIs, with methods specific to the issue-labeler system's needs.

This repository contains the source code to train ML models for making label predictions, as well as the code for automatically applying labels to issues and pull requests on GitHub repositories.
This issue-labeler uses ML.NET to help predict labels on GitHub issues and pull requests.
Repositories in the dotnet organization receive many incoming issues and pull requests. To help with triage, issues are categorized with area labels: each issue is labeled with the area-* label for its area, and these label assignments then serve as training data for building an issue labeler.
The following repositories triage their incoming issues semi-automatically, by manually selecting one of the top 3 predictions received from dotnet/issue-labeler:
The following repositories allow dotnet/issue-labeler to automatically set area-* labels on incoming issues and pull requests using GitHub webhooks:
Of course, automatic labeling always has a margin of error. The good news is that the labeler learns from its mistakes, as long as incorrect label assignments are corrected manually.
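Both modes consume the same kind of output: a score for each candidate area label. As a minimal sketch, assuming the predictions arrive as a plain mapping from label name to score (the actual PredictionService response format is not described here), the semi-automatic mode keeps the top 3 labels and the automatic mode keeps only the best one. The function name top_predictions, the label names, and the scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def top_predictions(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    # Sort by descending score; ties are broken by label name so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical scores standing in for what a prediction service might return for one issue.
scores = {
    "area-System.Net": 0.62,
    "area-System.IO": 0.21,
    "area-Meta": 0.09,
    "area-System.Text.Json": 0.08,
}

# Semi-automatic mode: show the top 3 so a person can pick one.
suggestions = top_predictions(scores, 3)

# Automatic mode: take only the single best label.
best_label, best_score = top_predictions(scores, 1)[0]

print(suggestions)
print(best_label, best_score)
```

Breaking ties by label name is a choice made here only so the output is stable; the sketch says nothing about how the real service handles ties or low-confidence predictions.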
For some repos, new issues also get an untriaged label, which the owner of the assigned area is expected to remove as they go through their triage process. If, after review, the area owner deems the automatic label incorrect, they can remove it and add the correct one manually.
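The automatic flow handled by src/IssueLabelerService, followed by the area owner's review, can be sketched as below. This is a simplified illustration, not the service's code: the Issue type, the handle_opened and owner_triage functions, and fake_predict (a stand-in for the call to the repo's PredictionService) are all hypothetical, and labels are modeled as an in-memory set rather than GitHub API calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Issue:
    # Hypothetical, minimal stand-in for a GitHub issue or pull request.
    number: int
    title: str
    labels: Set[str] = field(default_factory=set)


def handle_opened(issue: Issue,
                  predict: Callable[[Issue], Dict[str, float]],
                  use_untriaged: bool) -> None:
    """Webhook path: get label scores and apply the best-scoring area label."""
    scores = predict(issue)  # stands in for the call to the repo's PredictionService
    if scores:
        issue.labels.add(max(scores, key=scores.get))
    if use_untriaged:
        # Some repos also mark new issues so the area owner reviews them.
        issue.labels.add("untriaged")


def owner_triage(issue: Issue, correct_label: str) -> None:
    """Area owner review: replace any wrong area label and clear untriaged."""
    wrong = {label for label in issue.labels
             if label.startswith("area-") and label != correct_label}
    issue.labels -= wrong
    issue.labels.add(correct_label)
    issue.labels.discard("untriaged")


def fake_predict(issue: Issue) -> Dict[str, float]:
    # Hypothetical fixed scores in place of a real model.
    return {"area-System.IO": 0.55, "area-System.Net": 0.45}


issue = Issue(number=1, title="HttpClient times out")
handle_opened(issue, fake_predict, use_untriaged=True)
print(sorted(issue.labels))  # ['area-System.IO', 'untriaged']
owner_triage(issue, "area-System.Net")
print(sorted(issue.labels))  # ['area-System.Net']
```

Because the corrected label stays on the issue, it becomes part of the data used the next time the model is retrained.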
Enabling the issue labeler for a repo entails these steps:
- Apply area-* labels to a few hundred issues, which will be used as training data.

Then periodically retrain the machine learning model so that it can learn from a larger dataset of issues that have correct area labels applied. Note: the system does not learn in real time!
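Training data, whether for the first model or a retrain, comes from issues that already carry a correct area label. The sketch below shows that selection step under stated assumptions: issues are given as hypothetical (number, title, labels) tuples, and an issue is used only if it has exactly one area-* label. The exactly-one rule and the names area_label and training_rows are illustrative assumptions, not documented behavior of src/ModelCreator.

```python
from typing import Iterable, List, Optional, Set, Tuple


def area_label(labels: Set[str]) -> Optional[str]:
    """Return the single area-* label on an item, or None if there are zero or several."""
    areas = [label for label in labels if label.startswith("area-")]
    return areas[0] if len(areas) == 1 else None


def training_rows(issues: Iterable[Tuple[int, str, Set[str]]]) -> List[Tuple[int, str, str]]:
    """Build (number, title, area label) rows from issues with exactly one area label."""
    rows = []
    for number, title, labels in issues:
        area = area_label(labels)
        if area is not None:
            rows.append((number, title, area))
    return rows


# Hypothetical snapshot of a repo's issues at training time.
snapshot = [
    (101, "Socket hangs on shutdown", {"area-System.Net"}),
    (102, "Docs typo", set()),  # no area label yet: not usable for training
    (103, "File.Copy slow on NFS", {"area-System.IO", "bug"}),
    (104, "Unclear ownership", {"area-System.IO", "area-System.Net"}),  # ambiguous: skipped
]

rows = training_rows(snapshot)
print(rows)
print(len(rows))  # 2
```

Because the rows are built from a snapshot, a label corrected after the snapshot only reaches the model at the next retrain, which is why the system does not learn in real time.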
To get started, check out the docs for detailed steps to set up the issue labeler for your GitHub repo.
.NET is licensed under the MIT license.