The demo actually does this, so you could look at qa_system.py to see how. Basically you can use prediction.get_span_scores() to get a score for every possible span, and then post-process that however you want to get the top-k answers.
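For example, something along these lines (a rough sketch only; the exact shape returned by get_span_scores() may differ, so check qa_system.py):

```python
# Sketch of top-k answer extraction, assuming get_span_scores() returns
# a matrix of scores indexed by (start_token, end_token) positions.
def top_k_answers(prediction, tokens, k=5, max_span_len=8):
    scores = prediction.get_span_scores()  # shape assumed; see qa_system.py
    n = len(tokens)
    candidates = []
    for start in range(n):
        for end in range(start, min(start + max_span_len, n)):
            candidates.append((scores[start, end], start, end))
    # Keep the k highest-scoring spans
    candidates.sort(key=lambda x: x[0], reverse=True)
    return [(" ".join(tokens[s:e + 1]), score)
            for score, s, e in candidates[:k]]
```

You may also want to filter out overlapping spans before taking the top k, depending on how you want the answer list to look.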
Thank you Chris. I increased the span size from 8 (as hard-coded in run_on_user_documents.py) to values between 20 and 100, and it lengthened the answer span somewhat, but not enough for a full answer list of up to 6 bullets (I even changed the text in the documents to avoid punctuation, line feeds and the like). Is there some limitation in the pre-trained models (I found the SQuAD one the best performing, by the way), or do I have to change some other hyperparameter?
Second question: do you have any insights on how the confidence is calculated? My reasoning: if the confidence is too low, I would like to fall back to a different AI algorithm, but the confidence values I sometimes get back are quite high even though the answer is not (fully) correct.
The TriviaQA data tends to have only short answers, while SQuAD has slightly longer ones, but not up to length 100. As a result, all of these models have only been trained to extract short answers, and it's unlikely they will work very well if you want to extract long answers. The only sure-fire way to teach the model to extract longer answer spans would be to train it on a QA dataset with examples of longer answers.
The confidence score is a number optimized during training to ensure the correct answer span has a higher score than any other span within the document. It will not be perfect, especially if you are using the model on data that is very different from the kind the model was trained on.
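For intuition, here is a rough sketch of how this kind of extractive model typically produces such a score (the names below are illustrative, not the actual code): the model emits a start score and an end score per token, a span's score is their sum, and the reported confidence is the best span's score, optionally softmax-normalized over all candidate spans.

```python
import numpy as np

# Illustrative sketch: turning per-token start/end scores into a span
# confidence. start_scores and end_scores are 1-D arrays over tokens.
def span_confidence(start_scores, end_scores, max_span_len=8):
    n = len(start_scores)
    # Raw score of span (i, j) is start_scores[i] + end_scores[j]
    span_scores = np.array([start_scores[i] + end_scores[j]
                            for i in range(n)
                            for j in range(i, min(i + max_span_len, n))])
    # Softmax turns raw scores into a probability-like confidence; training
    # pushes the correct span's score above the rest.
    probs = np.exp(span_scores - span_scores.max())
    probs /= probs.sum()
    return span_scores.max(), probs.max()
```

Note that the softmax only compares spans against each other within the given document, which is one reason the reported confidence can look high even when the answer itself is wrong, especially on out-of-domain text.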
Thank you very much for your first explanation - absolutely clear (I feel so stupid right now ...)
The confidence score explanation remains a bit vague ... could you provide a more in-depth explanation (provided that's not too much effort for you)?
I circumvented that by providing both short answers (your result) and long answers (the whole paragraph), plus an ensemble with a Support Vector Machine and cosine similarity ;-) It works quite OK, by the way.
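In case it helps others, the cosine-similarity part looks roughly like this with scikit-learn (illustrative only; the SVM ensembling step is omitted):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Rank candidate long answers (whole paragraphs) by cosine similarity
# to the question, as a fallback when the extractive model's confidence
# is low.
def rank_paragraphs(question, paragraphs):
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform([question] + paragraphs)
    sims = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return sorted(zip(paragraphs, sims), key=lambda x: x[1], reverse=True)
```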
I am glad to hear that it's working. For the longer answer on the confidence numbers I can refer you to the paper:
https://arxiv.org/abs/1710.10723; in particular, look at Section 3, which is about how we designed the confidence scores the model uses.
Whenever I ask a question that should produce a longer list of items as the answer, it mostly/only returns the first item. How can that be extended?