google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

How to use run_squad.py to produce multiple answers for a question? #657

Open mushro00om opened 5 years ago

mushro00om commented 5 years ago

Hello, I am using run_squad.py to build my own question answering system. I want the system to be able to output multiple answers for a question: zero, one, or several, depending on the question. How can I change the code to achieve this? Thank you.

mushro00om commented 5 years ago

solved

kshitij12345 commented 5 years ago

@mushro00om can you please elaborate on how you did that?

mushro00om commented 5 years ago

@kshitij12345 Sure! Some questions in my dataset have multiple answers, some have one answer, and some have no answer.

First, I added a for loop in the "read_squad_examples" method so that the code reads all answers for each question and builds N SquadExamples per question, where N is the number of answers. (This is for my case and you don't have to do it. I need to use all the answers, whereas the original SQuAD code only reads the first answer of each question even if the question has multiple answers.)
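A minimal sketch of that expansion step, written as a standalone function over SQuAD-format JSON rather than as the actual change to run_squad.py (which was not posted in this thread). The function name `expand_answers`, the record fields, and the `_<index>` suffix added to each question id are assumptions made for illustration:

```python
def expand_answers(squad_data):
    """Return one record per (question, answer) pair from SQuAD-format data.

    Hypothetical helper: it mirrors the idea of looping over every answer
    instead of reading only the first one. It is not code from run_squad.py.
    """
    records = []
    for article in squad_data["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                if not answers:
                    # Question with no answer: keep a single "impossible" record.
                    records.append({
                        "qas_id": qa["id"],
                        "question": qa["question"],
                        "context": context,
                        "answer_text": "",
                        "answer_start": -1,
                        "is_impossible": True,
                    })
                    continue
                for i, answer in enumerate(answers):
                    # Assumed convention: suffix the id so each record stays unique.
                    records.append({
                        "qas_id": "{}_{}".format(qa["id"], i),
                        "question": qa["question"],
                        "context": context,
                        "answer_text": answer["text"],
                        "answer_start": answer["answer_start"],
                        "is_impossible": False,
                    })
    return records
```

Each returned record could then be turned into one training example, so a question with three answers contributes three examples.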

run_squad.py produces an "nbest_predictions.json" file, in which the model lists the top 20 candidate answers for each question together with their probabilities, so I simply pick some of those answers according to their probabilities.

However, I have to admit that in the end the performance isn't that good. It works, just not very well, but I think it can be improved in some way.

kshitij12345 commented 5 years ago

Oh, I see. Thank you.

Got it. I was wondering whether you were changing the head of the model to predict multiple answers.

Thank you again.

kmandhaniya commented 4 years ago

@mushro00om Hey, could you please share the code for getting more than one answer using BERT? It'll be really helpful for my application.

mushro00om commented 4 years ago

Hi,

Hahaha, actually I didn't do anything special. Once you have finished training, a file is created in the path (sorry, I forgot the name of the file; I think it's a text file). The file contains the candidate answers for each question and their probabilities (BERT selects the answer with the highest probability as THE answer). What I did was simply set a threshold and extract every answer whose probability is greater than the threshold (for example 30%) as the multiple answers. Just a very simple solution, hope it helps.
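A minimal sketch of that thresholding step, assuming the file in question is the "nbest_predictions.json" mentioned earlier in this thread and that it maps each question id to a list of candidates with "text" and "probability" fields. The function names and the choice to skip empty-text candidates (treated here as "no answer") are assumptions for illustration, not the original poster's code:

```python
import json


def load_nbest(path):
    """Load an n-best predictions file (assumed: JSON dict of id -> candidates)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def select_answers(nbest, threshold=0.3):
    """Keep every candidate whose probability is greater than the threshold.

    Returns a dict mapping each question id to a list of answer strings.
    An empty list means no candidate passed, i.e. "no answer".
    """
    selected = {}
    for qas_id, candidates in nbest.items():
        kept = []
        for candidate in candidates:
            text = candidate["text"]
            # Assumption: an empty string stands for the "no answer" prediction.
            if text and candidate["probability"] > threshold:
                kept.append(text)
        selected[qas_id] = kept
    return selected
```

Something like `select_answers(load_nbest("nbest_predictions.json"), threshold=0.3)` would then give zero, one, or several answers per question. If the probabilities are normalized over the n-best list, as they appear to be, a threshold of 0.3 can return at most three answers per question, so the threshold also caps how many answers can come back.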

cheers


kmandhaniya commented 4 years ago

Oh, got it. Really helped me understand your logic. Thank you so much!