deepset-ai / haystack

:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

TransformersSimilarityRanker (transformers_similarity.py) runtime error #7521

Closed: kristapsdz-saic closed 5 days ago

kristapsdz-saic commented 3 months ago

Describe the bug

The ranker fails regardless of its input.

Error message

The following code triggers the error, though any invocation of the component will do:

from haystack import Document
from haystack.components.rankers import TransformersSimilarityRanker

ranker = TransformersSimilarityRanker(model="sentence-transformers/all-MiniLM-L6-v2")
ranker.warm_up()

# In the real pipeline these come from a retriever; any Document list and
# query string trigger the error.
question = "What does the ranker do?"
retriever_output = {"documents": [Document(content="The ranker scores documents.")]}

ranker.run(query=question, documents=retriever_output["documents"])

When executed:

Traceback (most recent call last):
  File "xxxxxx", line 130, in <module>
    ranked_output = ranker.run(query=question, documents=retriever_output["documents"])
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxx/lib/python3.12/site-packages/haystack/components/rankers/transformers_similarity.py", line 268, in run
    documents[i].score = similarity_scores[i]
                         ~~~~~~~~~~~~~~~~~^^^
TypeError: list indices must be integers or slices, not list

On inspection, i is bound to a list rather than an integer: sorted_indices, the structure from which i is drawn, is a list of lists.
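
For illustration only (this is not the component's actual code), the sketch below shows how 2D logits, e.g. from a sequence-classification head with two labels, survive squeeze(dim=1) and yield sorted indices that are a list of lists, reproducing the exact TypeError above:

import torch

# Hypothetical shapes: a 2-label classification head over 3 documents
# produces logits of shape (3, 2) instead of the expected (3,) or (3, 1).
logits = torch.randn(3, 2)

scores = logits.squeeze(dim=1)  # no-op here: squeeze only removes size-1 dims
sorted_indices = torch.argsort(scores, descending=True).tolist()  # list of lists

documents = ["doc_a", "doc_b", "doc_c"]
i = sorted_indices[0]  # a list such as [1, 0], not an int
documents[i]  # TypeError: list indices must be integers or slices, not list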

Expected behavior

i should be a scalar index into the documents list.


To Reproduce

Run the snippet above in an environment built from the following Pipfile:

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
haystack-ai = "*"
sentence-transformers = ">=2.2.0"
pypdf = "*"
mdit-plain = "*"
llama-cpp-python = "==0.2.56"
llama-cpp-haystack = "*"
accelerate = "*"

[dev-packages]

[requires]
python_version = "3.12"
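
Installing that environment with pipenv install and running the snippet via pipenv run python should reproduce the traceback above.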


nvenkat94 commented 3 months ago

Hi @kristapsdz-saic, I encountered the same issue; it seems to be caused by the model's output structure: the model produces a 2D array. I tried a reranker model instead (see below) and it worked without errors, though further debugging is required to resolve the issue completely.

ranker = TransformersSimilarityRanker(model="BAAI/bge-reranker-large")

sjrl commented 1 month ago

Hey @kristapsdz-saic, @nvenkat94 is correct: this component only supports models with a cross-encoder architecture (SequenceClassification in Hugging Face terms). Models with reranker or cross-encoder in their name typically use this architecture and will work with this component.

The model in your original example, "sentence-transformers/all-MiniLM-L6-v2", is an embedding model (a bi-encoder), which this component does not support. Sentence Transformers has a nice explanation of the difference between bi-encoders and cross-encoders here.
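
To make the distinction concrete, here is a small sketch using the sentence_transformers library directly (model names and texts are only examples):

from sentence_transformers import CrossEncoder, SentenceTransformer, util

query = "What is Haystack?"
doc = "Haystack is an LLM orchestration framework."

# Cross-encoder: reads the (query, document) pair jointly and emits a single
# relevance score; this is the architecture TransformersSimilarityRanker expects.
cross_encoder = CrossEncoder("BAAI/bge-reranker-large")
score = cross_encoder.predict([(query, doc)])[0]

# Bi-encoder: embeds query and document independently; similarity is computed
# afterwards. In Haystack, bi-encoders belong in embedders plus an embedding retriever.
bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
similarity = util.cos_sim(bi_encoder.encode(query), bi_encoder.encode(doc))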