cdqa-suite / cdQA

⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.
https://cdqa-suite.github.io/cdQA-website/
Apache License 2.0

Getting wrong predicted answer! #319

Closed SandhyaSuryanarayana closed 4 years ago

SandhyaSuryanarayana commented 4 years ago

I'm using a customized CSV file (created using the pdf_converter) and fine-tuned the model on a SQuAD-like dataset (created using the annotator) to build a QA model.

I'm getting wrong answers for most of the questions. I read a similar issue that was raised here and tried varying the retriever_score_weight argument of the .predict() method, but that does not seem to help. However, the correct answer does appear in the top 5 predictions when I set n_predictions to 5.
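For context on what varying that argument does: the pipeline scores each candidate answer with both a retriever score (document relevance) and a reader score (span confidence), and retriever_score_weight controls how the two are blended when ranking. Below is a minimal, self-contained sketch of that idea using a linear blend and made-up candidate scores; the exact formula inside cdQA may differ.

```python
# Hypothetical candidates, each with a separate retriever and reader score.
candidates = [
    {"answer": "A", "retriever_score": 0.9, "reader_score": 0.2},
    {"answer": "B", "retriever_score": 0.4, "reader_score": 0.8},
    {"answer": "C", "retriever_score": 0.6, "reader_score": 0.6},
]

def rank(candidates, retriever_score_weight=0.35, n_predictions=5):
    """Blend the two scores linearly and return the top-n candidates."""
    def final_score(c):
        return (retriever_score_weight * c["retriever_score"]
                + (1 - retriever_score_weight) * c["reader_score"])
    return sorted(candidates, key=final_score, reverse=True)[:n_predictions]

# A low weight lets the reader dominate; a high weight favors the retriever.
print([c["answer"] for c in rank(candidates, retriever_score_weight=0.2)])  # ['B', 'C', 'A']
print([c["answer"] for c in rank(candidates, retriever_score_weight=0.9)])  # ['A', 'C', 'B']
```

This is why tuning the weight reshuffles the top answers rather than fixing a reader that is wrong on its own: if the correct answer only shows up at n_predictions=5, the reader is ranking it low regardless of the blend.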

This is the code that I'm executing.

[screenshots of the code attached]

Any help is really appreciated.

Thanks & regards, Sandhya

nehabharambe commented 4 years ago

I am also facing a similar issue: when I fine-tune the model on custom domain data, the accuracy decreases. For fine-tuning, I am using the JSON file generated by the cdqa-annotator. Is the annotated dataset messing with the model?

andrelmfarias commented 4 years ago

Hi, could you please share the size of your datasets (i.e. the number of question-answer pairs)? Did you fine-tune the model already trained on SQuAD 1.1, or did you use the pre-trained BERT model (with no fine-tuning on SQuAD)?

nehabharambe commented 4 years ago

I used the model trained on SQuAD 1.1, and for each paragraph there were around 3-4 question-answer pairs. I have attached my JSON file: sample_cdqa-v1.1-2.zip

nehabharambe commented 4 years ago

Awaiting your response. I would appreciate it if you could help me with this issue.

andrelmfarias commented 4 years ago

I think your training / fine-tuning data is too small. There are only 10 paragraphs and 38 QA pairs. Compare that with SQuAD, which has about 80k QA pairs in its training set. The model is clearly over-fitting.
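Since the annotator emits SQuAD-v1.1-style JSON (paragraphs nested under `data[*].paragraphs[*]`, QA pairs under each paragraph's `qas`), you can sanity-check your dataset size before training with a short stdlib-only script like this (the file path is hypothetical):

```python
import json

def count_squad(path):
    """Count paragraphs and QA pairs in a SQuAD-v1.1-style JSON file."""
    with open(path) as f:
        dataset = json.load(f)["data"]
    n_paragraphs = sum(len(article["paragraphs"]) for article in dataset)
    n_qas = sum(len(p["qas"])
                for article in dataset
                for p in article["paragraphs"])
    return n_paragraphs, n_qas

# Example (hypothetical path):
# n_paragraphs, n_qas = count_squad("sample_cdqa-v1.1-2.json")
# print(n_paragraphs, n_qas)
```

If the count comes back in the tens rather than the thousands, fine-tuning is likely to hurt rather than help, which matches the over-fitting diagnosis above.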