Closed: psorianom closed this issue 3 years ago
I understand your doubts come from the small drop in performance you observed on tiny.json
when you updated the tests as part of #86.
We should evaluate on a larger dataset to determine which model is best, then make the change you proposed.
In addition, we should update the tests for TitleQAPipeline.
I just tested the performance of distilbert-base-multilingual-cased
(current model) vs distiluse-base-multilingual-cased
(previously used model) with the DILA dataset. It is quite a manual process to choose one or the other; basically I have to:

- set the model_format parameter (in the instantiation of EmbeddingRetriever) to sentence_transformers if using distiluse-base-multilingual-cased;
- set the embedding_model parameter to distiluse-base-multilingual-cased.
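To make that switch less error-prone, the two coupled settings could be grouped in a small helper. A minimal sketch (retriever_config is a hypothetical helper, not existing project code, and the transformers fallback format is an assumption):

```python
# Hypothetical helper grouping the two settings that must change together
# when switching between the two models (a sketch, not existing project code).
def retriever_config(use_distiluse: bool) -> dict:
    """Return the EmbeddingRetriever kwargs for the chosen model."""
    if use_distiluse:
        return {
            "embedding_model": "distiluse-base-multilingual-cased",
            # distiluse is a sentence-transformers model, so the format
            # must be set accordingly (as described above)
            "model_format": "sentence_transformers",
        }
    return {
        "embedding_model": "distilbert-base-multilingual-cased",
        # assumed default format for the currently used model
        "model_format": "transformers",
    }
```

It could then be spliced into the instantiation, e.g. EmbeddingRetriever(document_store=store, **retriever_config(use_distiluse=True)).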
The results are quite surprising (quite a quick-and-dirty procedure):
```python
parameters = {
    "k_retriever": [1],
    "k_title_retriever": [1],  # must be present, but only used when retriever_type == title_bm25
    "k_reader_per_candidate": [20],
    "k_reader_total": [5],
    "reader_model_version": ["053b085d851196110d7a83d8e0f077d0a18470be"],
    "retriever_model_version": ["1a01b38498875d45f69b2a6721bf6fe87425da39"],
    "retriever_type": ["sbert"],  # can be bm25, sbert, dpr, title or title_bm25
    "squad_dataset": ["./clients/dila/knowledge_base/squad.json"],
    "filter_level": [None],
    "preprocessing": [False],
    "boosting": [1],  # defaults to 1
    "split_by": ["word"],  # can be "word", "sentence", or "passage"
    "split_length": [1000],
    "experiment_name": ["test"],
}
```
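Since every value in parameters is a list, the dict reads like a hyperparameter grid. A small sketch of how such a grid could be expanded into individual runs (expand_grid is a hypothetical helper, not part of the repo):

```python
from itertools import product

def expand_grid(parameters: dict) -> list:
    """Expand a dict of lists into one dict per parameter combination."""
    keys = list(parameters)
    return [dict(zip(keys, combo))
            for combo in product(*(parameters[k] for k in keys))]

# With the dict above, every list has length 1, so exactly one run is
# produced; e.g. setting "k_retriever": [1, 3] would double the runs.
```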
Reader Accuracy: 0.29545454545454547 with distiluse-base-multilingual-cased (previously used)
Reader Accuracy: 0.00909090909090909 with distilbert-base-multilingual-cased (currently used)
Based on these results (to explore further...), I suggest going back to the previously used model (distiluse), checking the sentence-transformers library to see if it handles model versions, and exposing that in haystack. We could also try out this model already in HF's hub but fine-tuned for Estonian :shrug: https://huggingface.co/kiri-ai/distiluse-base-multilingual-cased-et
(It is also quite horrible with this Estonian model: ~0.06 Acc)
I will check if we can version distiluse on sentence-bert. If not, let's go back to sbert without versioning.
@Rob192 @psorianom I noticed that the sbert retriever in retriever_eval_squad.py uses distiluse, while retriever_reader_eval_squad.py uses distilbert. Should we keep one over the other, or keep them different in each file?
Hi @guillaumecherel, I am on this. Should be fixed soon :)

@guillim, this code is run at deployment, right? https://github.com/etalab-ia/piaf-ml/blob/854b09515f80bbf966ff756a3789fc91cc107929/deployment/roles/prepare_data/templates/insert_data.py#L114
yes, when we want to inject the knowledge base into ES
Is it possible to deploy a PR? I have to modify this file and I don't want to mess up production :)
What do you mean by deploy a PR? Like deploy to production from a branch? That's possible, yes.
yes, I mean that. I believe this PR https://github.com/etalab-ia/piaf-ml/pull/149 is close to being merged, but I guess it would be wise to check that the changes made to insert_data.py do not break the deployment.
Sounds good to me. Tried it for DILA and it works perfectly fine.
Problem description

When doing the cache-like operation, the EmbeddingRetriever model was changed from distiluse-base-multilingual-cased (the "use" in distiluse stands for Universal Sentence Encoder) to distilbert-base-multilingual-cased, in order to be able to track the model version with HF's model hub. The problem is that this model seems less performant and, more importantly, it seems that distilbert is not a sentence-bert model.

Solution description

We should either find a work-around to keep using the distiluse language model, or keep going with distilbert, accept the small downgrade in perf, and change the nomenclature used in our code.

Who can help? @Rob192
Note: distiluse has 512-dimensional embeddings and distilbert has 768-dimensional embeddings. See https://github.com/UKPLab/sentence-transformers/issues/247. Also, I am actually not aware of the difference between distiluse and distilbert in terms of perf :/
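The dimension difference matters in practice: vectors already indexed with one model cannot be scored against queries embedded with the other, so switching models implies re-embedding and reindexing. A minimal illustration of the failure mode (plain-Python cosine similarity, not project code):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity that fails fast on mismatched embedding sizes."""
    if len(u) != len(v):
        raise ValueError(
            f"dimension mismatch ({len(u)} vs {len(v)}): "
            "the index must be rebuilt when switching models"
        )
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# A 512-d distiluse vector can never be compared to a 768-d distilbert one.
```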