Open vikramsubramanian opened 7 months ago
To implement the WORKSFORME model, follow these steps:

1. Define the features to be used by the model based on the ticket description.
2. Extract the relevant features from the bug data:
   - Modify `bugbug/bug_features.py` to define new features if needed.
   - Use `SingleBugFeature` to extract the required information.
3. Prepare the dataset:
   - Use `bugbug/bugzilla.py` to fetch bugs that have been closed as WORKSFORME.
4. Train the model:
   - Use `bugbug/models/fixtime.py` as a reference to create a new model class, e.g., `WORKSFORMEModel`, inheriting from `BugModel`.
   - Use `sklearn.pipeline.Pipeline` with a `ColumnTransformer` and `DictVectorizer` to handle categorical features.
   - Choose a classifier (e.g., `xgboost.XGBClassifier`) and add it to the pipeline.
5. Implement the prediction functionality:
   - Add a method to the `WORKSFORMEModel` class to predict the probability of a bug being WORKSFORME.
6. Create a page or sheet for each team:
   - Modify `ui/changes/src/common.js` to create a new UI component or page that lists bugs with their probability of being WORKSFORME.
7. Integrate the model into the workflow.

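Here is a high-level pseudo-code outline for the `WORKSFORMEModel`:

```python
import pandas as pd
import xgboost
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

from bugbug import bug_features
from bugbug.model import BugModel


class WORKSFORMEModel(BugModel):
    def __init__(self):
        super().__init__()

        # Define the features to be used by the model.
        # NOTE: some of these extractors may not exist in
        # bugbug/bug_features.py yet and would have to be added there.
        self.features = [
            bug_features.TimeSinceLastComment(),
            bug_features.OpenNeedinfo(),
            bug_features.Severity(),
            bug_features.StatusFlags(),
            # Add other relevant features
        ]

        # Define the pipeline: vectorize the per-bug feature dicts held in
        # the "data" column, then classify.
        self.pipeline = Pipeline([
            ("transformer", ColumnTransformer([
                ("vectorizer", DictVectorizer(), "data"),
            ])),
            ("classifier", xgboost.XGBClassifier()),
        ])

    def extract_features(self, bugs):
        # Build one feature dict per bug. This assumes each extractor is
        # callable on a bug dict, as SingleBugFeature subclasses are expected to be.
        rows = [
            {type(feature).__name__: feature(bug) for feature in self.features}
            for bug in bugs
        ]
        return pd.DataFrame({"data": rows})

    def train(self, bugs):
        # Prepare the dataset (bugs must be a list, since it is iterated twice)
        X = self.extract_features(bugs)
        y = [1 if bug["resolution"] == "WORKSFORME" else 0 for bug in bugs]

        # Split the dataset
        X_train, X_test, y_train, y_test = train_test_split(X, y)

        # Train the model
        self.pipeline.fit(X_train, y_train)

        # Evaluate the model
        score = self.pipeline.score(X_test, y_test)
        print(f"Model accuracy: {score}")

    def predict(self, bugs):
        # Predict the probability of being WORKSFORME (class 1)
        X = self.extract_features(bugs)
        return self.pipeline.predict_proba(X)[:, 1]
```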
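
To make the feature step more concrete, here is a minimal standalone sketch of how two of the features named in the ticket could be computed from a raw bug dict. It does not use bugbug's feature classes. The field names (`comments`, `creation_time`, `flags`, `requestee`, `creator`) and the timestamp format are assumptions about the Bugzilla data shape, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_time(value):
    # Assumes timestamps shaped like "2024-01-05T10:00:00Z".
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def days_since_last_comment(bug, now):
    """Days between the newest comment and `now`; None if there are no comments."""
    comments = bug.get("comments", [])
    if not comments:
        return None
    last = max(parse_time(comment["creation_time"]) for comment in comments)
    return (now - last).total_seconds() / 86400


def has_open_needinfo_on_reporter(bug):
    """True if a pending ("?") needinfo flag is addressed to the bug's creator."""
    return any(
        flag.get("name") == "needinfo"
        and flag.get("status") == "?"
        and flag.get("requestee") == bug.get("creator")
        for flag in bug.get("flags", [])
    )


def extract_row(bug, now):
    # One flat dict per bug, keyed by feature name.
    return {
        "days_since_last_comment": days_since_last_comment(bug, now),
        "open_needinfo_on_reporter": has_open_needinfo_on_reporter(bug),
        "severity": bug.get("severity"),
    }
```

Passing `now` explicitly keeps the functions deterministic, which makes them easier to test. Bugs without comments yield `None`, which would have to be imputed or dropped before vectorization.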
Remember to handle data preprocessing, feature selection, model evaluation, and hyperparameter tuning to improve the model's accuracy.
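
On the evaluation point: if WORKSFORME bugs turn out to be a minority of the dataset (an assumption, not something measured here), plain accuracy can look good even for a model that never predicts WORKSFORME. A minimal sketch of checking precision and recall at a chosen probability threshold, using only the standard library; the function name and the sample numbers are illustrative:

```python
def precision_recall(y_true, probabilities, threshold=0.5):
    """Precision and recall of the positive class at a probability threshold."""
    predicted = [p >= threshold for p in probabilities]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(y_true, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.2, 0.4, 0.1, 0.3, 0.05, 0.6]

print(precision_recall(y_true, probabilities, threshold=0.5))
print(precision_recall(y_true, probabilities, threshold=0.85))
```

Raising the threshold trades recall for precision. For a list that teams act on in bulk, high precision near the top of the list is probably the more important of the two.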
We could train on bugs that were closed as WORKSFORME in the past, feeding the model features such as 'how much time has passed since the last comment?', 'is there an open needinfo on the reporter?', 'severity', 'status flags', etc. If the model is accurate enough, we could create a page or a sheet for each team with a list of bugs sorted by 'probability of being WORKSFORME'. This way, teams could close a large number of bugs much more quickly.
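
The per-team list could be produced from the model output roughly as follows. This is only a sketch: the `team` and `id` keys on each bug are hypothetical, and the probabilities are assumed to be in the same order as the bugs.

```python
from collections import defaultdict


def rank_bugs_by_team(bugs, probabilities, min_probability=0.0):
    """Group bugs by team and sort each group by descending probability."""
    per_team = defaultdict(list)
    for bug, probability in zip(bugs, probabilities):
        if probability >= min_probability:
            per_team[bug["team"]].append((bug["id"], probability))
    return {
        team: sorted(entries, key=lambda entry: entry[1], reverse=True)
        for team, entries in per_team.items()
    }


# Made-up bugs and probabilities, for illustration only.
bugs = [
    {"id": 1, "team": "DOM"},
    {"id": 2, "team": "Graphics"},
    {"id": 3, "team": "DOM"},
]
ranking = rank_bugs_by_team(bugs, [0.2, 0.7, 0.9])
for team, entries in ranking.items():
    print(team, entries)
```

The `min_probability` cutoff lets a page show only the likeliest candidates, so triagers are not handed a long tail of low-probability bugs.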