deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
17.7k stars 1.92k forks

Computing top_n_accuracy for retriever #847

Closed ngoel17 closed 3 years ago

ngoel17 commented 3 years ago

Question: I want to load the facebook DPR Natural Questions model from huggingface and then calculate the top_n_accuracy for N=5 and N=20. However, I am unable to set the metric correctly. Can someone please provide a template?

brandenchan commented 3 years ago

Hi @ngoel17 , could you tell me a bit more about your use case? I'm assuming that you want to use the DPR Retriever model to retrieve relevant documents and then use a Reader model to highlight the answer. If so, you will want to set top_k_reader=5 or top_k_reader=20. Where this param goes depends on which object / method you are using. Here are a few examples:

Finder.eval(top_k_reader=20)
Finder.get_answers(top_k_reader=20)
ExtractiveQAPipeline.run(top_k_reader=20)
Reader.predict(top_k=20)
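
In the full pipeline the call looks roughly like this (a sketch only; the query string and top_k values are placeholders, and top_k_retriever controls how many documents the retriever passes on):

pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
prediction = pipeline.run(query="Who wrote Hamlet?",  # placeholder query
                          top_k_retriever=20,         # documents fetched by the (DPR) retriever
                          top_k_reader=5)             # answers returned by the reader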

Is this what you were looking for?

ngoel17 commented 3 years ago

Hi Branden, I am actually interested in verifying the DPR top-n accuracies on NQ. So here is the code I had, but it doesn't work.

# (imports added for completeness -- assuming the standard FARM module layout of that time)
from pathlib import Path

from farm.data_handler.data_silo import DataSilo
from farm.data_handler.processor import TextSimilarityProcessor
from farm.eval import Evaluator
from farm.modeling.adaptive_model import AdaptiveModel
from farm.modeling.biadaptive_model import BiAdaptiveModel
from farm.modeling.language_model import LanguageModel
from farm.modeling.prediction_head import TextSimilarityHead
from farm.modeling.tokenization import Tokenizer
from farm.utils import initialize_device_settings


def evaluate_question_answering():
    ##########################
    ########## Settings
    ##########################
    device, n_gpu = initialize_device_settings(use_cuda=True)

    lang_model = "deepset/roberta-base-squad2"
    do_lower_case = True

    question_lang_model = "facebook/dpr-question_encoder-single-nq-base"
    passage_lang_model = "facebook/dpr-ctx_encoder-single-nq-base"
    do_lower_case = True
    use_fast = True
    embed_title = True
    similarity_function = "dot_product"
    num_hard_negatives = 0
    data_dir = Path("../data/retriever")
    evaluation_filename = "nq-dev.json"
    train_filename = "nq-train.json"
    dev_filename = "nq-dev.json"
    test_filename = "nq-dev.json"

    batch_size = 50
    no_ans_boost = 0
    accuracy_at = 20  # accuracy at n is useful for answers inside long documents

    # 1. Create a tokenizer
    query_tokenizer = Tokenizer.load(pretrained_model_name_or_path=question_lang_model,
                                     do_lower_case=do_lower_case, use_fast=use_fast)
    passage_tokenizer = Tokenizer.load(pretrained_model_name_or_path=passage_lang_model,
                                       do_lower_case=do_lower_case, use_fast=use_fast)
    tokenizer = Tokenizer.load(pretrained_model_name_or_path=lang_model,
                               do_lower_case=do_lower_case)

    # 2. Create a DataProcessor that handles all the conversion from raw text into a pytorch Dataset
    label_list = ["hard_negative", "positive"]
    label_list = ["hard_negative_ctxs", "nagative_ctxs", "positive_ctxs"]

    metric = ["text_similarity_metric", "top_n_accuracy"]
    metric = "top_n_accuracy"
    max_samples = None
    processor = TextSimilarityProcessor(query_tokenizer=query_tokenizer,
                                        passage_tokenizer=passage_tokenizer,
                                        max_seq_len_query=64,
                                        max_seq_len_passage=256,
                                        label_list=label_list,
                                        metric=metric,
                                        data_dir=data_dir,
                                        train_filename=train_filename,
                                        dev_filename=dev_filename,
                                        test_filename=test_filename,
                                        embed_title=embed_title,
                                        num_hard_negatives=num_hard_negatives,
                                        max_samples=max_samples)

    # 3. Create a DataSilo that loads the dataset, provides DataLoaders for it
    #    and calculates a few descriptive statistics of our datasets
    data_silo = DataSilo(processor=processor, batch_size=batch_size)

    # 4. Create an Evaluator
    evaluator = Evaluator(
        data_loader=data_silo.get_data_loader("test"),
        tasks=data_silo.processor.tasks,
        device=device
    )

    question_language_model = LanguageModel.load(
        pretrained_model_name_or_path="../saved_models/dpr-facebook/lm1",
        language_model_class="DPRQuestionEncoder")
    passage_language_model = LanguageModel.load(
        pretrained_model_name_or_path="../saved_models/dpr-facebook/lm2",
        language_model_class="DPRContextEncoder")
    question_language_model = LanguageModel.load(
        pretrained_model_name_or_path="facebook/dpr-question_encoder-single-nq-base",
        language_model_class="DPRQuestionEncoder")
    passage_language_model = LanguageModel.load(
        pretrained_model_name_or_path="facebook/dpr-ctx_encoder-single-nq-base",
        language_model_class="DPRContextEncoder")

    # 5. Load model
    similarity_function = "dot_product"
    prediction_head = TextSimilarityHead(similarity_function=similarity_function, n_best=5)
    model = BiAdaptiveModel(
        language_model1=question_language_model,
        language_model2=passage_language_model,
        prediction_heads=[prediction_head],
        embeds_dropout_prob=0.1,
        lm1_output_types=["per_sequence"],
        lm2_output_types=["per_sequence"],
        device=device)

    model = AdaptiveModel.convert_from_transformers(lang_model,
                                                    device=device,
                                                    task_type="question_answering")
    # use "load" if you want to use a local model that was trained with FARM
    # model = AdaptiveModel.load(lang_model, device=device)
    model.prediction_heads[0].no_ans_boost = no_ans_boost
    model.prediction_heads[0].n_best = accuracy_at
    model.connect_heads_with_processor(data_silo.processor.tasks,
                                       require_labels=True)

    # 6. Run the Evaluator
    results = evaluator.eval(model)
    print(results[0])
    tnacc = results[0]["top_n_accuracy"]
    print(f"top_{accuracy_at}_accuracy:", tnacc)


brandenchan commented 3 years ago

Ah, so I see you are using a lot of code from the FARM repository to work with the DPR model. A lot of that code is now encapsulated in the Haystack classes, and I would recommend using those instead. For example, if you're only interested in the retrieval step, without the Reader (i.e. QA model), you could do everything using the DensePassageRetriever class. This class also has a DensePassageRetriever.eval() method that will let you evaluate its performance.

Have a look at our Tutorial 5, which walks you through evaluating models in Haystack. It can perform Retriever evaluation, Reader evaluation, or full pipeline evaluation. Let me know if that helps. I'd be happy to guide you through any confusing parts!
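
For reference, here is a rough sketch of that retriever-only evaluation flow, based on what Tutorial 5 looked like at the time (the Elasticsearch document store, file path and index names below are placeholders, not something from this issue):

from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
from haystack.retriever.dense import DensePassageRetriever

# Document store that will hold the evaluation passages and their embeddings
document_store = ElasticsearchDocumentStore(index="eval_document", create_index=False,
                                            embedding_field="emb", embedding_dim=768)

# Load gold question / answer / passage annotations (SQuAD-style file; path is a placeholder)
document_store.add_eval_data(filename="../data/nq/nq_dev_subset_v2.json",
                             doc_index="eval_document", label_index="label")

retriever = DensePassageRetriever(document_store=document_store,
                                  query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
                                  passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
                                  use_gpu=True,
                                  embed_title=True)

# Embed all evaluation passages with the DPR context encoder
document_store.update_embeddings(retriever, index="eval_document")

# recall@k answers the same question as top_n_accuracy:
# was a relevant passage among the top k retrieved documents?
eval_results = retriever.eval(top_k=20, label_index="label", doc_index="eval_document")
print("Retriever Recall:", eval_results["recall"])
print("Retriever Mean Avg Precision:", eval_results["map"])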

brandenchan commented 3 years ago

Seems to be solved for now, so I'm closing this. Feel free to reopen if you have a follow-up question.