ngoel17 closed this issue 3 years ago
Hi @ngoel17, could you tell me a bit more about your use case? I'm assuming that you want to use the DPR Retriever model to retrieve relevant documents and then use a Reader model to highlight the answer. If so, you will want to set top_k_reader=5 or top_k_reader=20. Where this parameter goes depends on which object / method you are using. Here are a few examples:
Finder.eval(top_k_reader=20)
Finder.get_answers(top_k_reader=20)
ExtractiveQAPipeline.run(top_k_reader=20)
Reader.predict(top_k=20)
Is this what you were looking for?
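For reference, a rough end-to-end sketch of where those parameters go in an ExtractiveQAPipeline could look like the snippet below. It is written against the Haystack ~0.x API of that time, so import paths and exact signatures may differ in your version, and the in-memory document store, toy document and model names are just placeholders:

# Rough sketch, assuming Haystack ~0.x; import paths and signatures may differ by version.
from haystack.document_store.memory import InMemoryDocumentStore
from haystack.retriever.dense import DensePassageRetriever
from haystack.reader.farm import FARMReader
from haystack.pipeline import ExtractiveQAPipeline

# Placeholder document store holding a single toy document.
document_store = InMemoryDocumentStore()
document_store.write_documents([{"text": "Python was created by Guido van Rossum.",
                                 "meta": {"name": "example"}}])

retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base")
document_store.update_embeddings(retriever)

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
# top_k_retriever: how many documents DPR returns; top_k_reader: how many answer candidates the reader returns.
prediction = pipeline.run(query="Who created Python?", top_k_retriever=20, top_k_reader=5)
print(prediction["answers"])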
Hi Branden, I am actually interested in verifying the DPR top-n accuracies on NQ. Here is the code I had, but it doesn't work.
from pathlib import Path

from farm.data_handler.data_silo import DataSilo
from farm.data_handler.processor import TextSimilarityProcessor
from farm.eval import Evaluator
from farm.modeling.biadaptive_model import BiAdaptiveModel
from farm.modeling.language_model import LanguageModel
from farm.modeling.prediction_head import TextSimilarityHead
from farm.modeling.tokenization import Tokenizer
from farm.utils import initialize_device_settings


def evaluate_question_answering():
    ##########################
    ########## Settings
    ##########################
    device, n_gpu = initialize_device_settings(use_cuda=True)
    question_lang_model = "facebook/dpr-question_encoder-single-nq-base"
    passage_lang_model = "facebook/dpr-ctx_encoder-single-nq-base"
    do_lower_case = True
    use_fast = True
    embed_title = True
    similarity_function = "dot_product"
    num_hard_negatives = 0
    data_dir = Path("../data/retriever")
    evaluation_filename = "nq-dev.json"
    train_filename = "nq-train.json"
    dev_filename = "nq-dev.json"
    test_filename = "nq-dev.json"
    batch_size = 50
    no_ans_boost = 0
    accuracy_at = 20  # accuracy at n is useful for answers inside long documents

    # 1. Create the query and passage tokenizers
    query_tokenizer = Tokenizer.load(pretrained_model_name_or_path=question_lang_model,
                                     do_lower_case=do_lower_case, use_fast=use_fast)
    passage_tokenizer = Tokenizer.load(pretrained_model_name_or_path=passage_lang_model,
                                       do_lower_case=do_lower_case, use_fast=use_fast)

    # 2. Create a DataProcessor that handles the conversion from raw text into a PyTorch Dataset
    label_list = ["hard_negative", "positive"]
    metric = "top_n_accuracy"
    max_samples = None
    processor = TextSimilarityProcessor(query_tokenizer=query_tokenizer,
                                        passage_tokenizer=passage_tokenizer,
                                        max_seq_len_query=64,
                                        max_seq_len_passage=256,
                                        label_list=label_list,
                                        metric=metric,
                                        data_dir=data_dir,
                                        train_filename=train_filename,
                                        dev_filename=dev_filename,
                                        test_filename=test_filename,
                                        embed_title=embed_title,
                                        num_hard_negatives=num_hard_negatives,
                                        max_samples=max_samples)

    # 3. Create a DataSilo that loads the datasets, provides DataLoaders for them
    #    and calculates a few descriptive statistics
    data_silo = DataSilo(processor=processor, batch_size=batch_size)

    # 4. Create an Evaluator on the test split
    evaluator = Evaluator(
        data_loader=data_silo.get_data_loader("test"),
        tasks=data_silo.processor.tasks,
        device=device
    )

    # 5. Load the two DPR encoders and assemble the bi-encoder model
    # (alternatively, load from a local checkpoint, e.g.
    #  LanguageModel.load("../saved_models/dpr-facebook/lm1", language_model_class="DPRQuestionEncoder")
    #  LanguageModel.load("../saved_models/dpr-facebook/lm2", language_model_class="DPRContextEncoder"))
    question_language_model = LanguageModel.load(
        pretrained_model_name_or_path="facebook/dpr-question_encoder-single-nq-base",
        language_model_class="DPRQuestionEncoder")
    passage_language_model = LanguageModel.load(
        pretrained_model_name_or_path="facebook/dpr-ctx_encoder-single-nq-base",
        language_model_class="DPRContextEncoder")

    prediction_head = TextSimilarityHead(similarity_function=similarity_function, n_best=5)
    model = BiAdaptiveModel(
        language_model1=question_language_model,
        language_model2=passage_language_model,
        prediction_heads=[prediction_head],
        embeds_dropout_prob=0.1,
        lm1_output_types=["per_sequence"],
        lm2_output_types=["per_sequence"],
        device=device)
    # model = AdaptiveModel.load(lang_model, device=device)

    model.prediction_heads[0].no_ans_boost = no_ans_boost
    model.prediction_heads[0].n_best = accuracy_at
    model.connect_heads_with_processor(data_silo.processor.tasks, require_labels=True)

    # 6. Run the Evaluator
    results = evaluator.eval(model)
    print(results[0])
    tnacc = results[0]["top_n_accuracy"]
    print(f"top_{accuracy_at}_accuracy:", tnacc)
Ah, so I see you are using a lot of code from the FARM repository to work with the DPR model. Much of that code is now encapsulated in the Haystack classes, and I would recommend using those instead. For example, if you're only interested in the retrieval step, without the Reader (i.e. the QA model), you can do everything with the DensePassageRetriever class. This class also has a DensePassageRetriever.eval() method that lets you evaluate its performance.
Have a look at our Tutorial 5, which walks you through evaluating models in Haystack. It can perform Retriever evaluation, Reader evaluation, or full pipeline evaluation. Let me know if that helps. I'd be happy to guide you through any confusing parts!
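To make that concrete, a minimal sketch along the lines of Tutorial 5 could look roughly like this. It assumes the Haystack ~0.x API of that time and a running Elasticsearch instance; the index names and the eval file path are placeholders, so adjust them to your setup:

# Rough sketch following Tutorial 5, assuming Haystack ~0.x and a local Elasticsearch instance.
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
from haystack.retriever.dense import DensePassageRetriever

doc_index = "eval_document"
label_index = "label"

document_store = ElasticsearchDocumentStore(host="localhost", username="", password="",
                                            index=doc_index,
                                            embedding_field="emb", embedding_dim=768)

# Load a SQuAD-style eval file (e.g. an NQ dev subset) as documents + labels.
document_store.add_eval_data(filename="../data/nq/nq_dev_subset_v2.json",  # placeholder path
                             doc_index=doc_index, label_index=label_index)

retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base")
document_store.update_embeddings(retriever, index=doc_index)

# recall@k plays the role of top-k retrieval accuracy; run once with top_k=5 and once with top_k=20.
retriever_eval_results = retriever.eval(top_k=20, label_index=label_index, doc_index=doc_index)
print(retriever_eval_results["recall"], retriever_eval_results["map"])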
Seems to be solved for now, so I am closing this. Feel free to reopen if you have a follow-up question.
Question: I want to retrieve the Facebook DPR Natural Questions model from Hugging Face and then calculate the top_n_accuracy for N=5 and N=20. However, I am unable to set the metric correctly. Can someone please provide a template?