etalab-ia / piaf-ml

PIAF v2.0 repo for ML development. The main purpose of this repo is to automatically find the best configuration for a QA pipeline of a partner organisation.
MIT License

computing useless embeddings in bm25 #9

Closed: psorianom closed this issue 3 years ago

psorianom commented 3 years ago

Hi @Rob192 ,

Why do we need to compute the embeddings even when we are using bm25? Is there a reason for that? Could we compute them only when we are actually using sbert (or dpr)?

https://github.com/etalab-ia/piaf-ml/blob/ffdd609deaac616787f93943fbde12234e809c72/src/evaluation/retriever/retriever_eval_squad.py#L76

Rob192 commented 3 years ago

We could indeed add a condition, `if retriever_type != 'bm25': update_embeddings`! I just thought that having an embedding field by default would somehow simplify the mappings and the declaration of the document_store. But I have not tried declaring an embedding field in the mapping and the document_store and then running retrievals with the embedding fields left empty...
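A minimal sketch of the proposed condition, assuming the evaluation script holds a `retriever_type` string and a document store exposing an `update_embeddings(retriever)` method. `maybe_update_embeddings` and `FakeDocumentStore` are hypothetical names for illustration only; the stand-in store just counts calls instead of encoding documents, so the sketch runs without Elasticsearch.

```python
class FakeDocumentStore:
    """Hypothetical stand-in for the real document store; it only records calls."""

    def __init__(self):
        self.embedding_updates = 0

    def update_embeddings(self, retriever):
        # A real store would encode every document with `retriever` here.
        self.embedding_updates += 1


def maybe_update_embeddings(document_store, retriever, retriever_type):
    """Compute embeddings only when the retriever actually uses them.

    BM25 ranks documents from the inverted index, so it needs no embeddings.
    Returns True if embeddings were (re)computed, False otherwise.
    """
    if retriever_type == "bm25":
        return False
    # Any other retriever type (e.g. "sbert" or "dpr") is assumed to be dense.
    document_store.update_embeddings(retriever)
    return True


if __name__ == "__main__":
    store = FakeDocumentStore()
    print(maybe_update_embeddings(store, retriever=None, retriever_type="bm25"))   # False
    print(maybe_update_embeddings(store, retriever=None, retriever_type="sbert"))  # True
    print(store.embedding_updates)  # 1
```

This mirrors the `!= 'bm25'` test from the comment, so any non-BM25 retriever still triggers the embedding step; whether an index with an empty embedding field still supports BM25 retrieval is the open question above and is not checked here.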