Implement a testing framework

We need to automate our testing, which currently happens through Google Sheets.

An admin can create a TestSuite and TestQuestions under it. Each TestSuite can be configured with its own temperature and topk.

A TestQuestion contains a question that will be asked to Ayushma and a human-entered answer to that question. We need to test whether Ayushma's response is similar to the human answer, and how similar.

Once the admin triggers a test run, a new TestRun instance is created, linked to a Project and the TestSuite. The suite runs asynchronously through Celery. The run asks each associated TestQuestion and creates a TestResult that holds the question and human_answer copied from the TestQuestion (do not link the models, because the questions can change), the answer returned by Ayushma, the time taken, cosine_sim, and bleu_score.
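The run loop described above can be sketched in plain Python. This is only an illustration, not the actual implementation: `QuestionSnapshot`, `ResultRecord`, `run_suite`, and the `ask` callable (standing in for the call to Ayushma) are hypothetical names, and a real version would live inside a Celery task and write Django model instances instead of dataclasses.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class QuestionSnapshot:
    """Hypothetical stand-in for a TestQuestion row."""
    question: str
    human_answer: str


@dataclass
class ResultRecord:
    """Hypothetical stand-in for a TestResult row (scores are filled in later)."""
    question: str
    human_answer: str
    answer: str
    time_taken: float  # seconds spent waiting for the answer


def run_suite(
    questions: Iterable[QuestionSnapshot],
    ask: Callable[[str], str],
) -> List[ResultRecord]:
    """Ask every question and copy its text into the result.

    The question and human_answer are copied by value, so later edits to a
    question do not change results that were already recorded.
    """
    results = []
    for q in questions:
        started = time.perf_counter()
        answer = ask(q.question)  # assumption: one blocking call per question
        elapsed = time.perf_counter() - started
        results.append(
            ResultRecord(
                question=q.question,
                human_answer=q.human_answer,
                answer=answer,
                time_taken=elapsed,
            )
        )
    return results
```

Copying the text instead of linking to the TestQuestion is what keeps old runs stable when questions are edited afterwards.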

Once all questions have been answered, we need to calculate the cosine similarity and BLEU score for each result on a scale of 0 to 1. After this, the test run is over and the admin can see the results.
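A minimal sketch of the two scores, using only the standard library. Several things here are assumptions rather than part of this issue: the cosine similarity is assumed to be computed over embedding vectors of the two answers (how those vectors are obtained is not shown), negative values are clamped so the score stays in the 0 to 1 range, and `bleu_score` is a simplified BLEU (whitespace tokens, n-grams up to `max_n`, no smoothing) rather than a reference implementation. In practice a library such as NLTK may be a better fit.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors, clamped to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    # Assumption: negative similarity is reported as 0 to fit the 0..1 scale.
    return min(1.0, max(0.0, dot / (norm_a * norm_b)))


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_score(reference: str, candidate: str, max_n: int = 2) -> float:
    """Simplified BLEU: clipped n-gram precision with a brevity penalty."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Each candidate n-gram counts at most as often as it occurs in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any empty precision gives a score of 0
        log_precisions.append(math.log(clipped / total))
    # Penalize candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)
```

Both functions return a float between 0 and 1, so they can be stored directly in `cosine_sim` and `bleu_score`.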

Now, update the user model to have an is_reviewer field. If is_reviewer is true for a user, they can access the TestRuns and their TestResults and add feedback to the results. They can create a new Feedback for a TestResult by entering a rating (Excellent, Good, Satisfactory, Unsatisfactory, Wrong or Hallucinating) and a note.

Reviewers can only see their own feedback and cannot edit it later. Only admins can see the feedback of all reviewers.
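The visibility rule above could be expressed roughly as follows. This is a sketch under assumptions: `User`, `FeedbackEntry`, `is_admin`, and `author_id` are hypothetical names, and filtering by reviewer presumes that each Feedback records which user created it, which the field list in this issue does not yet include.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class User:
    id: int
    is_admin: bool = False      # hypothetical flag for admins
    is_reviewer: bool = False   # the new field proposed in this issue


@dataclass(frozen=True)
class FeedbackEntry:
    id: int
    author_id: int  # assumption: feedback stores the reviewer who wrote it
    rating: str
    note: str


def visible_feedback(user: User, all_feedback: Iterable[FeedbackEntry]) -> List[FeedbackEntry]:
    """Return the feedback entries this user is allowed to see."""
    if user.is_admin:
        return list(all_feedback)  # admins see feedback from every reviewer
    if user.is_reviewer:
        # Reviewers only see the entries they created themselves.
        return [f for f in all_feedback if f.author_id == user.id]
    return []  # everyone else sees nothing
```

In a Django REST view, the same rule would typically go into the queryset filter, with the update and delete actions left out for reviewers.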

In the end, your new models should look like this (all models extend the base model class):

`TestSuite`

name
temperature
topk

`TestQuestion`

test_suite (fk)
question
human_answer

`TestRun`

test_suite (fk)
project
complete (default false)

`TestResult`

test_run (fk)
test_question (fk)
question
human_answer
answer
cosine_sim
bleu_score

`Feedback`

test_result (fk)
rating (integer choice field)
notes
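One possible shape for the integer rating choices, shown with the standard library `IntEnum` so it runs without Django. The class name `FeedbackRating`, the numeric values, and the reading of "Wrong or Hallucinating" as two separate ratings are all assumptions; in the actual model, Django's `models.IntegerChoices` would be the likely equivalent.

```python
from enum import IntEnum
from typing import List, Tuple


class FeedbackRating(IntEnum):
    """Hypothetical rating scale; higher numbers mean better answers."""
    HALLUCINATING = 1
    WRONG = 2
    UNSATISFACTORY = 3
    SATISFACTORY = 4
    GOOD = 5
    EXCELLENT = 6

    @property
    def label(self) -> str:
        # Human-readable label derived from the member name, e.g. "Excellent".
        return self.name.capitalize()


def rating_choices() -> List[Tuple[int, str]]:
    """(value, label) pairs in the form an integer choice field expects."""
    return [(member.value, member.label) for member in FeedbackRating]
```

Storing the rating as an integer keeps the results sortable and makes it straightforward to average ratings per TestRun later.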

cc. @bodhish
