deepset-ai / haystack

:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

TransformersSimilarityRanker (transformers_similarity.py) runtime error #7521

Closed: kristapsdz-saic closed 5 days ago

kristapsdz-saic commented 3 months ago

Describe the bug

The ranker fails regardless of its input.

Error message

The following code triggers the error, though any invocation of the component will do:

from haystack import Document
from haystack.components.rankers import TransformersSimilarityRanker

ranker = TransformersSimilarityRanker(model="sentence-transformers/all-MiniLM-L6-v2")
ranker.warm_up()

# In the real pipeline these come from a retriever; any Document list and
# query string trigger the error.
question = "What does the ranker do?"
retriever_output = {"documents": [Document(content="The ranker scores documents.")]}

ranker.run(query=question, documents=retriever_output["documents"])

When executed:

Traceback (most recent call last):
  File "xxxxxx", line 130, in <module>
    ranked_output = ranker.run(query=question, documents=retriever_output["documents"])
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxx/lib/python3.12/site-packages/haystack/components/rankers/transformers_similarity.py", line 268, in run
    documents[i].score = similarity_scores[i]
                         ~~~~~~~~~~~~~~~~~^^^
TypeError: list indices must be integers or slices, not list

On inspection, i is bound to a list rather than an integer: sorted_indices, the structure from which i is drawn, is a list of lists.
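
For illustration only (this is not the component's actual code), the sketch below shows how 2D logits, e.g. from a sequence-classification head with two labels, survive squeeze(dim=1) and yield sorted indices that are a list of lists, reproducing the exact TypeError above:

import torch

# Hypothetical shapes: a 2-label classification head over 3 documents
# produces logits of shape (3, 2) instead of the expected (3,) or (3, 1).
logits = torch.randn(3, 2)

scores = logits.squeeze(dim=1)  # no-op here: squeeze only removes size-1 dims
sorted_indices = torch.argsort(scores, descending=True).tolist()  # list of lists

documents = ["doc_a", "doc_b", "doc_c"]
i = sorted_indices[0]  # a list such as [1, 0], not an int
documents[i]  # TypeError: list indices must be integers or slices, not list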

Expected behavior

i should be a scalar index into the documents list.


To Reproduce

Run the snippet above in an environment built from the following Pipfile:

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
haystack-ai = "*"
sentence-transformers = ">=2.2.0"
pypdf = "*"
mdit-plain = "*"
llama-cpp-python = "==0.2.56"
llama-cpp-haystack = "*"
accelerate = "*"

[dev-packages]

[requires]
python_version = "3.12"
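
Installing that environment with pipenv install and running the snippet via pipenv run python should reproduce the traceback above.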


nvenkat94 commented 3 months ago

Hi @kristapsdz-saic, I encountered the same issue; it seems to be caused by the model's output structure: the model produces a 2D array. I tried a reranker model instead (see below) and it worked without errors, though further debugging is required to resolve the issue completely.

ranker = TransformersSimilarityRanker(model="BAAI/bge-reranker-large")

sjrl commented 1 month ago

Hey @kristapsdz-saic, @nvenkat94 is correct: this component only supports models with a cross-encoder architecture (SequenceClassification in Hugging Face terms). Models with reranker or cross-encoder in their name typically use this architecture and will work with this component.

The model in your original example, "sentence-transformers/all-MiniLM-L6-v2", is an embedding model (a bi-encoder), which this component does not support. Sentence Transformers has a nice explanation of the difference between bi-encoders and cross-encoders here.
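
To make the distinction concrete, here is a small sketch using the sentence_transformers library directly (model names and texts are only examples):

from sentence_transformers import CrossEncoder, SentenceTransformer, util

query = "What is Haystack?"
doc = "Haystack is an LLM orchestration framework."

# Cross-encoder: reads the (query, document) pair jointly and emits a single
# relevance score; this is the architecture TransformersSimilarityRanker expects.
cross_encoder = CrossEncoder("BAAI/bge-reranker-large")
score = cross_encoder.predict([(query, doc)])[0]

# Bi-encoder: embeds query and document independently; similarity is computed
# afterwards. In Haystack, bi-encoders belong in embedders plus an embedding retriever.
bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
similarity = util.cos_sim(bi_encoder.encode(query), bi_encoder.encode(doc))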