Mayil-AI / bugbug-21dec23

Mozilla Public License 2.0

Build a WORKSFORME model (#3760) #19

Open vikramsubramanian opened 7 months ago

vikramsubramanian commented 7 months ago

We could use bugs that have been closed as WORKSFORME in the past and feed the model features such as 'How much time has passed since the last comment?', 'Is there an open needinfo on the reporter?', 'severity', 'status flags', etc. If the model is accurate enough, we could create a page or a sheet for each team listing its bugs sorted by probability of being WORKSFORME. This way, teams could close large numbers of bugs much more quickly.
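The features listed above can be computed from raw bug data. A minimal sketch, assuming bug dicts shaped like Bugzilla REST API responses (fields such as `creator`, `flags`, `severity`, and `comments[].creation_time`) and Mozilla-style `cf_status_*` status-flag fields; `parse_bz_time` and `extract_worksforme_features` are hypothetical helpers, not bugbug APIs. The result is a flat dict that sklearn's `DictVectorizer` can consume.

```python
from datetime import datetime, timezone


def parse_bz_time(value):
    """Parse a Bugzilla-style UTC timestamp such as '2023-12-01T10:20:30Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def extract_worksforme_features(bug, now):
    """Return a flat {feature name: value} dict for one bug (field names assumed)."""
    features = {}

    # Days since the most recent comment; fall back to the bug's creation time.
    times = [parse_bz_time(c["creation_time"]) for c in bug.get("comments", [])]
    last_activity = max(times) if times else parse_bz_time(bug["creation_time"])
    features["days_since_last_comment"] = (now - last_activity).total_seconds() / 86400

    # 1 if a pending needinfo ("?") is addressed to the reporter, else 0.
    features["open_needinfo_on_reporter"] = int(
        any(
            flag.get("name") == "needinfo"
            and flag.get("status") == "?"
            and flag.get("requestee") == bug.get("creator")
            for flag in bug.get("flags", [])
        )
    )

    # String value: DictVectorizer one-hot encodes it (e.g. "severity=S3").
    features["severity"] = bug.get("severity", "--")

    # Status flags (cf_status_*): count each set value, skipping the unset "---".
    for field, value in bug.items():
        if field.startswith("cf_status_") and value != "---":
            key = f"status_flag={value}"
            features[key] = features.get(key, 0) + 1

    return features


example_bug = {
    "creator": "reporter@example.com",
    "creation_time": "2023-01-01T00:00:00Z",
    "severity": "S3",
    "comments": [
        {"creation_time": "2023-01-01T00:00:00Z"},
        {"creation_time": "2023-01-11T00:00:00Z"},
    ],
    "flags": [{"name": "needinfo", "status": "?", "requestee": "reporter@example.com"}],
    "cf_status_firefox120": "affected",
    "cf_status_firefox121": "---",
}
features = extract_worksforme_features(
    example_bug, now=datetime(2023, 1, 21, tzinfo=timezone.utc)
)
```

Missing keys (for example, a bug with no status flags set) are treated as zeros by `DictVectorizer`, so bugs do not need identical feature sets.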

mayil-ai[bot] commented 7 months ago

Implementation ideas

Here is a high-level pseudo-code outline for a WORKSFORMEModel built on bugbug's BugModel:

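from bugbug import bug_features
from bugbug.model import BugModel
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
import xgboost

class WORKSFORMEModel(BugModel):
    def __init__(self):
        super().__init__()

        # Features to be used by the model. These class names are illustrative:
        # check bug_features.py and implement any that do not exist yet.
        self.features = [
            bug_features.TimeSinceLastComment(),
            bug_features.OpenNeedinfo(),
            bug_features.Severity(),
            bug_features.StatusFlags(),
            # Add other relevant features
        ]

        # extract_features() returns one {feature name: value} dict per bug,
        # so DictVectorizer can feed the classifier directly.
        self.pipeline = Pipeline([
            ('vectorizer', DictVectorizer()),
            ('classifier', xgboost.XGBClassifier()),
        ])

    def extract_features(self, bugs):
        # Hypothetical helper: assumes each feature object is callable on a bug dict
        return [
            {type(feature).__name__: feature(bug) for feature in self.features}
            for bug in bugs
        ]

    def train(self, bugs):
        # bugbug's existing models typically implement get_labels() and reuse the
        # base-class training loop (compare fixtime.py); this outline trains directly.
        # Only resolved bugs carry a meaningful label (open bugs have resolution '').
        bugs = [bug for bug in bugs if bug['resolution']]
        X = self.extract_features(bugs)
        y = [1 if bug['resolution'] == 'WORKSFORME' else 0 for bug in bugs]

        # Split the dataset, keeping the (likely small) WORKSFORME share equal in both parts
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y
        )

        # Train the model
        self.pipeline.fit(X_train, y_train)

        # Evaluate the model (accuracy alone can hide poor recall on a rare class)
        score = self.pipeline.score(X_test, y_test)
        print(f'Model accuracy: {score}')

    def predict(self, bugs):
        # Probability of each bug being WORKSFORME (column 1 = positive class)
        X = self.extract_features(bugs)
        return self.pipeline.predict_proba(X)[:, 1]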

Remember to handle data preprocessing, feature selection, model evaluation, and hyperparameter tuning to improve the model's accuracy.
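Evaluation matters most for the per-team triage list proposed in the issue: if only a small share of closed bugs are WORKSFORME, accuracy can look high even when the top of each list is wrong. A minimal stdlib-only sketch, assuming predictions arrive as `(bug_id, team, probability)` tuples and that the team comes from some mapping such as product/component (hypothetical here), ranks bugs per team and measures precision among the top k against held-out resolutions:

```python
from collections import defaultdict


def rank_by_team(predictions):
    """Group (bug_id, team, probability) tuples by team, most likely WORKSFORME first."""
    by_team = defaultdict(list)
    for bug_id, team, probability in predictions:
        by_team[team].append((bug_id, probability))
    return {
        team: sorted(rows, key=lambda row: row[1], reverse=True)
        for team, rows in by_team.items()
    }


def precision_at_k(ranked_bug_ids, worksforme_ids, k):
    """Share of the top-k ranked bugs that were actually resolved as WORKSFORME."""
    top = ranked_bug_ids[:k]
    if not top:
        return 0.0
    return sum(bug_id in worksforme_ids for bug_id in top) / len(top)


# Toy held-out data: model probabilities plus the ids truly closed as WORKSFORME.
predictions = [
    (101, "DOM", 0.91),
    (102, "DOM", 0.15),
    (103, "Layout", 0.72),
    (104, "DOM", 0.64),
    (105, "Layout", 0.08),
]
worksforme_ids = {101, 105}

ranked = rank_by_team(predictions)
dom_ids = [bug_id for bug_id, _ in ranked["DOM"]]
dom_precision = precision_at_k(dom_ids, worksforme_ids, k=2)
```

Precision at k mirrors the intended workflow: a team reviewing the first k entries of its list wants most of them to be closable.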

Code snippets to check

bugbug/models/fixtime.py, lines 1-21: This snippet is from a model file that could be adapted to predict WORKSFORME resolutions. Its structure could serve as a starting point for the new model. https://github.com/Mayil-AI/bugbug-21dec23/blob/0acd00da46afbb37bd047c0bce06ce7cfad21568/bugbug/models/fixtime.py#L1-L21
bugbug/bug_features.py, lines 299-684: This snippet contains features used by models to predict bug statuses, some of which could be relevant for the new WORKSFORME model. https://github.com/Mayil-AI/bugbug-21dec23/blob/0acd00da46afbb37bd047c0bce06ce7cfad21568/bugbug/bug_features.py#L299-L684
bugbug/bugzilla.py, lines 439-531: This snippet shows how data is fetched from Bugzilla and processed, which could be useful for gathering the training data for the WORKSFORME model. https://github.com/Mayil-AI/bugbug-21dec23/blob/0acd00da46afbb37bd047c0bce06ce7cfad21568/bugbug/bugzilla.py#L439-L531
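bugbug normally obtains bugs through the download code in bugzilla.py. For ad-hoc exploration of candidate training data, a search for resolved bugs can also be built against the public Bugzilla REST API. A minimal sketch, assuming the `https://bugzilla.mozilla.org/rest/bug` search endpoint and its `product`, `status`, `last_change_time`, `include_fields`, and `limit` parameters; `closed_bugs_query` is a hypothetical helper. All resolutions are kept so that non-WORKSFORME bugs can serve as negative examples.

```python
from urllib.parse import urlencode

BUGZILLA_REST = "https://bugzilla.mozilla.org/rest/bug"


def closed_bugs_query(product, changed_since, limit=500):
    """Build a Bugzilla REST search URL for resolved bugs of one product."""
    params = {
        "product": product,
        # A repeated key matches any of the listed statuses.
        "status": ["RESOLVED", "VERIFIED"],
        # Bugs changed at or after this ISO-8601 timestamp.
        "last_change_time": changed_since,
        # Only request the fields the features need.
        "include_fields": "id,resolution,severity,creator,flags,creation_time",
        "limit": limit,
    }
    # doseq=True expands the status list into status=RESOLVED&status=VERIFIED.
    return f"{BUGZILLA_REST}?{urlencode(params, doseq=True)}"


url = closed_bugs_query("Firefox", "2023-01-01T00:00:00Z")
```

Comments are served by the separate `/rest/bug/<id>/comment` endpoint, so the time-since-last-comment feature needs a second request per bug (or bugbug's own downloaded data).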