Deploy similarity model in the http service (#861)

To deploy a similarity model in the HTTP service, so that it returns the bugs most similar to a given bug, you need to answer two questions:

1) Where do we store the corpus? How big is it?

The corpus can be stored in a database or in a cloud storage service (e.g., AWS S3, Google Cloud Storage).
Its size depends on the number of bugs and the amount of data kept per bug, and it still needs to be measured before choosing between these options.
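One way to get that measurement is to serialize a sample of bugs and extrapolate from the average size. The sketch below assumes each bug is stored as one UTF-8 JSON object per line; the function name estimate_corpus_size and the sample bugs are hypothetical.

```python
import json
from typing import Iterable, Mapping


def estimate_corpus_size(bugs: Iterable[Mapping]) -> dict:
    """Estimate corpus size, assuming one JSON object per line per bug."""
    count = 0
    total_bytes = 0
    for bug in bugs:
        # Bytes the bug would occupy on disk: UTF-8 JSON plus a newline.
        total_bytes += len(json.dumps(bug).encode("utf-8")) + 1
        count += 1
    return {
        "bugs": count,
        "total_bytes": total_bytes,
        "avg_bytes_per_bug": total_bytes / count if count else 0.0,
    }


# Hypothetical sample; in practice, feed a real dump of bugs.
sample = [
    {"id": 1, "summary": "Crash on startup"},
    {"id": 2, "summary": "Tab freezes when scrolling"},
]
print(estimate_corpus_size(sample))
```

Multiplying avg_bytes_per_bug by the expected total number of bugs gives a rough upper bound for the raw corpus; model-specific representations (e.g. vectors) would need to be measured separately.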

2) How do we add new bugs to the pre-existing corpus? This has to happen continuously, since new bugs are filed every day.

Implement a process that:
- Retrieves new bugs from the bug tracking system (e.g., Bugzilla).
- Processes and transforms the new bugs into the format required by the similarity model.
- Updates the corpus with the new bug data.
This process can be scheduled to run at regular intervals (e.g., daily) or triggered by an event (e.g., a new bug is filed).
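For the scheduled variant, the retrieval step needs to remember when it last ran. A minimal sketch, assuming bugs arrive as dicts with a Bugzilla-style ISO 8601 last_change_time field and that the watermark is a timezone-aware datetime; select_new_bugs is a hypothetical helper name.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple


def select_new_bugs(
    bugs: Iterable[dict], last_update: Optional[datetime]
) -> Tuple[List[dict], Optional[datetime]]:
    """Return bugs changed after last_update, plus the new watermark.

    last_update must be timezone-aware (or None for the first run).
    """
    selected = []
    newest = last_update
    for bug in bugs:
        # Normalize a trailing "Z" so fromisoformat() accepts it before Python 3.11.
        changed = datetime.fromisoformat(bug["last_change_time"].replace("Z", "+00:00"))
        if last_update is None or changed > last_update:
            selected.append(bug)
            if newest is None or changed > newest:
                newest = changed
    return selected, newest


bugs = [
    {"id": 1, "last_change_time": "2023-12-20T08:00:00Z"},
    {"id": 2, "last_change_time": "2023-12-21T09:30:00Z"},
]
new_bugs, watermark = select_new_bugs(bugs, datetime(2023, 12, 21, tzinfo=timezone.utc))
print([bug["id"] for bug in new_bugs])  # [2]
```

Filtering on last_change_time rather than creation_time also picks up edits to bugs already in the corpus; switching the field restricts the job to newly filed bugs. The returned watermark should be persisted between runs.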

Relevant files for implementing the new API endpoint:

http_service/bugbug_http/app.py: Extend this Flask application to include a new API endpoint for the similarity model.
http_service/bugbug_http/models.py: Add a new function that looks up similar bugs using the similarity model.
infra/data-pipeline.yml: Add a new task for training the similarity model and updating the corpus.

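Sketch of the new API endpoint (some_similarity_model is a placeholder for the module that wraps the trained model):

from flask import Flask, jsonify, request

# Placeholder import: replace with the module that wraps the trained similarity model.
from some_similarity_model import find_similar_bugs

app = Flask(__name__)


@app.route('/find_similar_bugs', methods=['POST'])
def find_similar_bugs_endpoint():
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    data = request.get_json(silent=True) or {}
    bug_id = data.get('bug_id')
    if bug_id is None:
        # Reject requests that do not identify a bug.
        return jsonify({'error': 'bug_id is required'}), 400
    similar_bugs = find_similar_bugs(bug_id)
    return jsonify(similar_bugs)


if __name__ == '__main__':
    app.run()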
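To make the shape of the lookup concrete, here is a dependency-free stand-in for what a find_similar_bugs function could compute. It uses bag-of-words cosine similarity over bug text; this is an illustrative substitute, not the similarity model the service would actually deploy, and rank_similar_bugs (which takes the corpus explicitly) is a hypothetical name.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def _vectorize(text: str) -> Counter:
    # Lowercased word counts: a stand-in for the real model's features.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank_similar_bugs(bug_id: int, corpus: Dict[int, str], k: int = 5) -> List[Tuple[int, float]]:
    """Return the k other bugs whose text is most similar to bug_id's text."""
    query = _vectorize(corpus[bug_id])
    scores = [
        (other_id, _cosine(query, _vectorize(text)))
        for other_id, text in corpus.items()
        if other_id != bug_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


corpus = {
    1: "Crash when opening a PDF",
    2: "Browser crash when opening PDF files",
    3: "Typo in the preferences dialog",
}
print(rank_similar_bugs(1, corpus, k=2))  # bug 2 first, bug 3 last with score 0.0
```

Scoring every bug on each request is linear in the corpus size; for a large corpus, precomputed vectors and an approximate nearest-neighbour index would likely be needed.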

Sketch for updating the corpus with new bugs (transform_bug_data, load_corpus, save_corpus and get_new_bugs_since_last_update are placeholders):

def update_corpus_with_new_bug(corpus, bug_data):
    # Transform bug_data into the format required by the similarity model
    transformed_bug = transform_bug_data(bug_data)
    # Add (or overwrite) this bug's entry, keyed by its Bugzilla ID
    corpus[bug_data['id']] = transformed_bug

def daily_corpus_update_job():
    corpus = load_corpus()
    new_bugs = get_new_bugs_since_last_update()
    for bug in new_bugs:
        update_corpus_with_new_bug(corpus, bug)
    # Persist the updated corpus once per run rather than once per bug
    save_corpus(corpus)
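One possible backing for save_corpus, plus a matching load_corpus, assuming the corpus is a dict keyed by bug ID that fits in memory and lives in a local JSON file (the path parameter is an addition; for S3 or Google Cloud Storage the file would be uploaded after writing).

```python
import json
import os
import tempfile
from typing import Dict


def load_corpus(path: str) -> Dict[int, dict]:
    """Load the corpus, or start empty if it has not been created yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        # JSON object keys are strings, so convert them back to integer bug IDs.
        return {int(k): v for k, v in json.load(f).items()}


def save_corpus(corpus: Dict[int, dict], path: str) -> None:
    """Write to a temporary file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(corpus, f)
        # os.replace is atomic when both paths are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

Writing to a temporary file first means a job that crashes mid-write leaves the previous corpus intact, so the HTTP service never reads a half-written file.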

Note: Actual implementation details will depend on the specific technologies and architecture of the existing system.
