deepset-ai / haystack-core-integrations

Additional packages (components, document stores and the like) to extend the capabilities of Haystack version 2.0 and onwards
Apache License 2.0

ElasticSearch Retriever is not performing well #598

Closed: Asma-droid closed this issue 2 weeks ago

Asma-droid commented 5 months ago


I am using Elasticsearch as my document store, with the Elasticsearch embedding retriever configured as follows:

          embedding_similarity_function: l2_norm
          hosts: http://elasticsearch:9200
        type: haystack_integrations.document_stores.elasticsearch.document_store.ElasticsearchDocumentStore
      num_candidates: 10
      top_k: 10
    type: haystack_integrations.components.retrievers.elasticsearch.embedding_retriever.ElasticsearchEmbeddingRetriever
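For context on the two retriever parameters above: an embedding retriever like this one runs an approximate k-nearest-neighbour (kNN) search. The sketch below is a hypothetical helper (`build_knn_body` is not Haystack's code) that builds an Elasticsearch kNN search body by hand; the field name `embedding` is an assumption for illustration. It shows `top_k` becoming the number of neighbours `k`, `num_candidates` becoming the number of candidates considered per shard, and that nothing in such a request sets a minimum relevance score.

```python
from typing import Any, Dict, List


def build_knn_body(query_embedding: List[float], top_k: int, num_candidates: int) -> Dict[str, Any]:
    # Hypothetical helper: Elasticsearch rejects kNN searches where num_candidates < k
    if num_candidates < top_k:
        raise ValueError("num_candidates must be >= top_k")
    return {
        "knn": {
            "field": "embedding",  # assumed name of the dense_vector field
            "query_vector": query_embedding,
            "k": top_k,  # number of nearest neighbours to return
            "num_candidates": num_candidates,  # candidates considered per shard
        }
    }


# Same values as the configuration in this issue
body = build_knn_body([0.1, 0.2, 0.3], top_k=10, num_candidates=10)
print(body["knn"]["k"], body["knn"]["num_candidates"])
```

Because the search asks for the `k` nearest vectors, it returns up to `k` hits whether or not any of them is actually relevant to the query.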

Even when the answer is outside the context of the indexed documents, the retriever still returns documents with high scores. Below is an example:

{ "AnswerBuilder": { "answers": [ { "data": " The context provided does not contain information about Langchain.", "query": "WHat is langchain ?", "documents": [ { "id": "b0b39b5c34c63991019b566e34b1ccfb784cf96a461cebc3711611fd5d9b8b38", "content": "general-purpose speech toolkit. arXiv preprint\narXiv:2106.04624 .\nRebai, I., Benhamiche, S., Thompson, K., Sellami, Z.,\nLaine, D., and Lorr ´e, J.-P. (2020). Linto platform: A\nsmart open voice assistant for business environments.\nInProceedings of the 1st International Workshop on\nLanguage Technology Platforms , pages 89–95.\nRNNoise (2023). Github RNNoise.\nxiph/rnnoise.\nSpiller, T. R., Ben-Zion, Z., Korem, N., Harpaz-Rotem, I.,\nand Duek, O. (2023). Efficient and accurate transcrip-\ntion in mental health research-a tutorial on using whis-\nper ai for sound file transcription.Suznjevic, M. and Saldana, J. (2016). Delay limits for real-\ntime services. IETF draft .\nTrabelsi, A., Warichet, S., Aajaoun, Y ., and Soussilane, S.\n(2022). Evaluation of the efficiency of state-of-the-\nart speech recognition engines. Procedia Computer\nScience , 207:2242–2252.\nUnion, I. T. (2016). Mean opinion score interpretation and\nreporting. Standard, International Telecommunication\nUnion, Geneva, CH.\nValin, J.-M. (2018). A hybrid dsp/deep learning approach\nto real-time full-band speech enhancement. In 2018\nIEEE 20th international workshop on multimedia sig-\nnal processing (MMSP) , pages 1–5. IEEE.\nVaseghi, S. V . (2008). Advanced digital ", "dataframe": null, "blob": null, "meta": { "source": "default/ICAART24.pdf", "page": 7, "source_id": "74d29100e8daffb446d9d6e1c7185e096e3a51cf9332fc6c421cd9ca467648d6" }, "score": 0.67131597,

Best regards

DemirTonchev commented 5 months ago

Elasticsearch uses the BM25 algorithm; why do you think a score of 0.67 is high?

Asma-droid commented 5 months ago

@DemirTonchev I am using the ES embedding retriever. For queries that do match the retrieved documents, I also get scores between 0.60 and 0.82. So I would expect that when the query does not match the retrieved documents, the scores should be very small.

DemirTonchev commented 5 months ago

So for me if the query does not match with retrieved documents, scores should be very small.

In my experience, a score of 0.6 to 0.82 is usually negligibly small. How large is your corpus, and what is the average IDF? Looking at the query "WHat is langchain ?" and the returned document, I would expect the score to be small, since "langchain" does not appear in the returned text. How many documents in the corpus contain at least one occurrence of "langchain"? I also suspect that " " (whitespace) is present in your ES document store, which is not ideal.
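To make the BM25 side of this argument concrete, here is a minimal sketch of classic Okapi BM25 over lowercase whitespace tokens. Lucene's implementation inside Elasticsearch differs in details such as text analysis and constant factors, so the numbers are only illustrative, and the one-document corpus is a made-up stand-in for the one described in this thread. The sketch shows two things: a query term that never occurs in a document contributes nothing to its score, and BM25 scores are not normalized to [0, 1], so a value like 0.67 has no absolute meaning on its own.

```python
import math
from collections import Counter
from typing import List


def bm25_score(query: str, doc: str, corpus: List[str], k1: float = 1.2, b: float = 0.75) -> float:
    # Classic Okapi BM25 (illustrative, not Lucene's exact formula)
    docs = [d.lower().split() for d in corpus]
    avgdl = sum(len(d) for d in docs) / len(docs)  # average document length
    tokens = doc.lower().split()
    tf = Counter(tokens)  # term frequencies in the scored document
    n_docs = len(docs)
    score = 0.0
    for term in query.lower().split():
        df = sum(1 for d in docs if term in d)  # number of documents containing the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        f = tf[term]  # 0 if the term is absent, so the term adds nothing
        denom = f + k1 * (1 - b + b * len(tokens) / avgdl)
        score += idf * f * (k1 + 1) / denom
    return score


corpus = ["vosk is an offline speech recognition toolkit built on kaldi"]
print(bm25_score("langchain", corpus[0], corpus))  # 0.0
print(round(bm25_score("vosk", corpus[0], corpus), 3))  # 0.288
```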

Asma-droid commented 5 months ago

@DemirTonchev My document store contains just one document, and it is about Vosk and Kaldi. There is no occurrence of "langchain"; I did this on purpose to see how the model behaves.

When I ask a question about Vosk, I get the correct answer with a score of 0.67. Below is a screenshot:


I notice that the scores are between 0 and 1.
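The 0-to-1 range is expected for this configuration. According to the Elasticsearch `dense_vector` documentation, with `l2_norm` similarity the kNN `_score` is computed as `1 / (1 + l2_norm(query, vector)^2)`, which never exceeds 1 and never reaches 0. The sketch below reproduces that conversion; `score_from_cosine` additionally assumes the embedding model outputs unit-length vectors (not stated in the issue), in which case the squared distance equals `2 - 2 * cosine`.

```python
from typing import Sequence


def l2_norm_score(query: Sequence[float], doc: Sequence[float]) -> float:
    # Elasticsearch's documented kNN _score for l2_norm similarity:
    # 1 / (1 + squared Euclidean distance), always in (0, 1]
    sq_dist = sum((q - d) ** 2 for q, d in zip(query, doc))
    return 1.0 / (1.0 + sq_dist)


def score_from_cosine(cos: float) -> float:
    # Assumes unit-length embeddings, where ||q - d||^2 == 2 - 2 * cos(q, d)
    return 1.0 / (3.0 - 2.0 * cos)


# Map a few cosine similarities to the score Elasticsearch would report
for cos in (-1.0, 0.0, 0.5, 0.75, 0.9, 1.0):
    print(cos, round(score_from_cosine(cos), 3))
```

Under that assumption the whole cosine range [-1, 1] is squeezed into [0.2, 1], orthogonal (unrelated) vectors still score about 0.33, and the 0.67 reported above corresponds to a cosine similarity of 0.75. A score in the 0.6 to 0.8 band therefore does not by itself mean the document is relevant.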

So my conclusion is that when we ask an out-of-context question, the retriever still returns results with fairly high scores.
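That behaviour follows from how nearest-neighbour search works: it ranks every candidate by similarity and returns the closest `top_k`, even when all of them are far from the query. The brute-force sketch below illustrates this with toy 2-D vectors. The function `knn_with_threshold`, its `min_score` post-filter and the 0.6 cut-off are hypothetical illustrations, not parameters of the Haystack retriever, and a real threshold would have to be tuned on queries with known answers.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def knn_with_threshold(
    query: Sequence[float],
    docs: Dict[str, Sequence[float]],
    top_k: int,
    min_score: Optional[float] = None,
) -> List[Tuple[str, float]]:
    # Score every document with the l2_norm-style formula 1 / (1 + distance^2)
    scored = [
        (doc_id, 1.0 / (1.0 + sum((q - v) ** 2 for q, v in zip(query, vec))))
        for doc_id, vec in docs.items()
    ]
    # Plain kNN: keep the top_k best, no matter how low their scores are
    scored.sort(key=lambda hit: hit[1], reverse=True)
    hits = scored[:top_k]
    # Hypothetical post-filter: drop hits below a minimum score
    if min_score is not None:
        hits = [hit for hit in hits if hit[1] >= min_score]
    return hits


# Toy "embeddings": two chunks of one Vosk document and an off-topic query
docs = {"vosk_chunk_1": [1.0, 0.0], "vosk_chunk_2": [0.8, 0.6]}
off_topic_query = [0.0, 1.0]

print(knn_with_threshold(off_topic_query, docs, top_k=10))  # both chunks are returned
print(knn_with_threshold(off_topic_query, docs, top_k=10, min_score=0.6))  # []
```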

Could you please explain the whitespace problem in more detail? I don't understand it.

anakin87 commented 5 months ago

Should be investigated.