Dijital-Twin / model


feat: Finetune QA Model #1

Closed emirsoyturk closed 6 months ago

emirsoyturk commented 6 months ago

Goal

Since the desired result could not be achieved by fine-tuning language models, QA models will be tried instead. In this context, models such as "roberta-base-squad2" will be tested, and the answers they produce for a given question will be combined with the MASK model to produce an output (see https://github.com/Dijital-Twin/model/issues/2).

Steps

MGurcan commented 6 months ago

Commit Url

Enhancing QA Models

Objective

The primary goal is to explore the effectiveness of different QA models in generating accurate answers to questions. This exploration includes testing models like roberta-base-squad2 and others. Given that fine-tuning language models did not yield the desired outcomes, the focus will shift towards QA models. The responses from these models will be augmented using a MASK model to produce refined outputs.
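
To illustrate the intended combination, the sketch below chains a question-answering model with a fill-mask model using the Hugging Face transformers pipelines. The masking template and the fill-mask model choice are assumptions for illustration only; the issue does not specify how the MASK model consumes the QA answer.

from transformers import pipeline

# Extractive QA model returns a span from the context as the answer
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
answer = qa(question="What does the fox jump over?",
            context="The quick brown fox jumps over the lazy dog.")["answer"]

# Fill-mask model rewrites the answer into a fuller sentence (hypothetical template)
fill_mask = pipeline("fill-mask", model="roberta-base")
sentence = fill_mask(f"The fox jumps over {answer}, a very <mask> animal.")[0]["sequence"]
print(answer, "->", sentence)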

Methodology

As a result of the research, it was determined that QA models can be fine-tuned with Haystack, and that this can both increase accuracy and decrease response times.

Fine-Tuning Code for "deepset/roberta-base-squad2"

from haystack.nodes import FARMReader

# Load the pre-trained extractive QA model
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
# Fine-tune on the SQuAD-formatted FriendsQA data for 3 epochs and save the result
reader.train(data_dir="../data", train_filename="squad_formatted_friendsqa_data.json", use_gpu=True, n_epochs=3, save_dir="my_model_3epoch")
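
Once training finishes, the fine-tuned reader can be reloaded from the save directory and queried directly. A minimal sketch (the question and document below are illustrative):

from haystack.nodes import FARMReader
from haystack.schema import Document

# Reload the model saved by the training run above
reader = FARMReader(model_name_or_path="my_model_3epoch", use_gpu=True)

# Run extractive QA over an in-memory document
result = reader.predict(
    query="What does the fox jump over?",
    documents=[Document(content="The quick brown fox jumps over the lazy dog.")],
    top_k=1,
)
print(result["answers"][0].answer)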

It was concluded that the fine-tuning process for QA models requires data in SQuAD format.

SQuAD Data Format

The Stanford Question Answering Dataset (SQuAD) is a collection of question-answer pairs derived from Wikipedia articles. In SQuAD, the correct answers of questions can be any sequence of tokens in the given text. [1]

Example format

{
"data": [
    {
        "paragraphs": [
            {
                "context": "The quick brown fox jumps over the lazy dog.",
                "qas": [
                    {
                        "question": "What does the fox jump over?",
                        "id": "q1",
                        "answers": [
                            {
                                "text": "the lazy dog",
                                "answer_start": 31
                            }
                        ]
                    }
                ]
            }
        ],
        "title": "Example"
    }
],
"version": "2.0"
}
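
Note that answer_start is the character offset of the answer span inside context (31 for "the lazy dog" above). A quick sanity check in Python:

context = "The quick brown fox jumps over the lazy dog."
answer = "the lazy dog"

# answer_start must point at the first character of the answer inside the context
answer_start = context.find(answer)
print(answer_start)  # 31
assert context[answer_start:answer_start + len(answer)] == answer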

The FriendsQA dataset, obtained from the emorynlp/FriendsQA project, was converted to SQuAD format, and fine-tuning experiments were performed on the models.

During this conversion phase, Chinmay Bhalerao's Medium article and Haystack [2] were used.
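
A rough sketch of the conversion logic: each FriendsQA scene's utterances are joined into a single SQuAD-style context and the answer offsets are recomputed against that context. The field names used below are assumptions for illustration and do not necessarily match the actual FriendsQA schema.

import json

def scene_to_paragraph(scene):
    # Join the scene's utterances into one context string (field names assumed)
    context = " ".join(f"{u['speaker']}: {u['text']}" for u in scene["utterances"])
    qas = []
    for qa in scene["questions"]:
        answer_text = qa["answer_text"]
        start = context.find(answer_text)  # recompute the character offset in the joined context
        if start == -1:
            continue  # skip answers that cannot be located verbatim
        qas.append({"question": qa["question"], "id": qa["id"],
                    "answers": [{"text": answer_text, "answer_start": start}]})
    return {"context": context, "qas": qas}

with open("friendsqa.json") as f:
    friendsqa = json.load(f)

squad = {"version": "2.0",
         "data": [{"title": scene.get("title", "scene"),
                   "paragraphs": [scene_to_paragraph(scene)]}
                  for scene in friendsqa["data"]]}

with open("squad_formatted_friendsqa_data.json", "w") as f:
    json.dump(squad, f, ensure_ascii=False, indent=2)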

Below you can see the parameters and response times of some of the models that were tried.

Models Used: Parameters and Response Times

Below are the analysis graphs for a Friends quiz of 142 questions: the models' outputs are compared against the expected answers with difflib.SequenceMatcher, alongside each QA model's own confidence scores.

[Analysis graphs: SequenceMatcher similarity and QA model scores for each tested model]
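
The similarity metric behind these graphs can be reproduced with difflib.SequenceMatcher. A minimal sketch comparing a model's answer to the expected quiz answer (the sample strings are illustrative):

from difflib import SequenceMatcher

def answer_similarity(predicted: str, expected: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized strings match exactly
    return SequenceMatcher(None, predicted.lower().strip(), expected.lower().strip()).ratio()

print(answer_similarity("the lazy dog", "lazy dog"))  # 0.8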

These evaluations showed that fine-tuning reduced response time and had a positive effect on the accuracy of the produced answers. Among the models tried, the best is considered to be the deepset/roberta-base-squad2 model trained for 3 epochs, producing output with 3 retrievers and 5 readers.
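
For context, "3 retrievers and 5 readers" is read here as the retriever and reader top_k values of a Haystack extractive QA pipeline; that interpretation, as well as the in-memory document store and TF-IDF retriever below, are assumptions not confirmed by this issue.

from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import FARMReader, TfidfRetriever
from haystack.pipelines import ExtractiveQAPipeline

# Index the dialogue corpus (a single illustrative document here)
document_store = InMemoryDocumentStore()
document_store.write_documents([{"content": "Joey: How you doin'?"}])

retriever = TfidfRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="my_model_3epoch", use_gpu=True)
pipe = ExtractiveQAPipeline(reader=reader, retriever=retriever)

# top_k=3 retrieved passages, top_k=5 candidate answers (assumed mapping of "3 retrievers / 5 readers")
result = pipe.run(query="What does Joey say?",
                  params={"Retriever": {"top_k": 3}, "Reader": {"top_k": 5}})
print(result["answers"][0].answer)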