Closed psorianom closed 3 years ago
Well, actually it is a little more complex than the fix proposed in https://github.com/etalab-ia/piaf-ml/pull/22 :)

We have the `k_reader` parameter, which is only used when instantiating the Transformers model. And we have `k_display`, which is used to select the top k answers. But it was not being passed to the predict method, so @Rob192 had to add a line to keep only `k_display` results:

```python
predicted_answers['answers'] = predicted_answers['answers'][:k_display]
```

We can improve this by passing `k_display` directly to https://github.com/etalab-ia/piaf-ml/blob/3da5818b2ad88b2f195479177fff0cd051349156/src/evaluation/utils/utils_eval.py#L143:

```python
predicted_answers_list = [pipeline.run(query=q, top_k_retriever=k_retriever) for q in questions]
```
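The proposed change could look roughly like the sketch below. `FakePipeline` is a hypothetical stand-in (not Haystack's actual class); it only mimics a `run` method taking `top_k_retriever`/`top_k_reader` keywords, so the point of the fix can be shown end to end: forward `k_display` to the call instead of slicing the result afterwards.

```python
class FakePipeline:
    """Hypothetical stand-in for the QA pipeline, for illustration only."""

    def run(self, query, top_k_retriever=10, top_k_reader=10):
        # Pretend each retrieved doc yields one answer candidate,
        # then keep only the top_k_reader best ones.
        candidates = [f"{query}-answer-{i}" for i in range(top_k_retriever)]
        return {"query": query, "answers": candidates[:top_k_reader]}


k_retriever, k_display = 5, 3
pipeline = FakePipeline()
questions = ["q1", "q2"]

# The fix: pass k_display down as top_k_reader, so the pipeline trims
# the answers itself and no post-hoc slicing is needed.
predicted_answers_list = [
    pipeline.run(query=q, top_k_retriever=k_retriever, top_k_reader=k_display)
    for q in questions
]
```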
In any case, I started this investigation because I wanted to know what each parameter affects. Here is a short summary:

- `k_retriever` = number of docs retrieved from ES (e.g., 5)
- `k_reader_per_candidate` (old `k_reader`) = number of answers to extract from each document (e.g., k_retriever=5, k_reader=5, therefore 25 candidate answers)
- `k_reader_total` (old `k_display`) = number of answers actually used for evaluation, taken from the previous set (e.g., 25 candidates, k_display=5, we keep the top 5 answers)
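The way the three parameters compose can be written as a small counting sketch (variable names taken from the summary above; the arithmetic is purely illustrative, not the repo's code):

```python
k_retriever = 5             # docs retrieved from ES
k_reader_per_candidate = 5  # answers extracted per retrieved doc
k_reader_total = 5          # answers kept for evaluation

# Each retrieved doc contributes k_reader_per_candidate candidates...
candidates = k_retriever * k_reader_per_candidate  # 25 candidate answers

# ...and only the top k_reader_total of those are actually evaluated.
kept = min(k_reader_total, candidates)
```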
In an upcoming PR I will rename these variables so they make a little more sense, and fix the passing of the argument to the `run` function.
Hey @Rob192,

I found that the pipeline we are using in `retriever_reader_eval_squad.py` is not actually taking the `k_reader` parameter into account :(

Our pipeline, created in `single_run()`, is not using `k_reader` because of (what I would call) a bug in haystack's code. When the `reader` node of the pipeline is run, it uses a function in `haystack/reader/base.py`. This function sets `top_k_reader` (our `k_reader`) as `None`, i.e., it never passes the `k_reader` we set when instantiating the reader before.

This is kinda bad. Will check for a solution :+1: