deepset-ai / FARM

Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
https://farm.deepset.ai
Apache License 2.0

IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed #868

Open ShuhaoZhangTony opened 4 months ago

ShuhaoZhangTony commented 4 months ago

**Describe the bug**

I'm trying to use Haystack's API to build a RAG pipeline with FAISSDocumentStore and EmbeddingRetriever.

The code looks like this:

# Create the document store using the factory
document_store = create_document_store(store_type, **store_config)

documents = []
documents_dir = args.docs_path
for filename in os.listdir(documents_dir):
    file_path = os.path.join(documents_dir, filename)
    if os.path.isfile(file_path):
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
            document = Document(content=content)
            documents.append(document)
document_store.write_documents(documents)

# Ensure the retriever is initialized before updating embeddings
retriever = RetrieverFactory.get_retriever(retriever_type=args.retriever_type,
                                           document_store=document_store,
                                           query_embedding_model=args.query_embedding_model,
                                           passage_embedding_model=args.passage_embedding_model
                                           )

# Update embeddings right after writing documents.
# The hasattr check ensures this block only runs if the document_store
# instance has the update_embeddings method.
if hasattr(document_store, 'update_embeddings'):
    document_store.update_embeddings(retriever=retriever, batch_size=10)

**Error message**

haystack/modeling/model/language_model.py", line 222, in _pool_tokens
    ignore_mask_3d[:, :, :] = ignore_mask_2d[:, :, np.newaxis]
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
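
For reference, the failing line can be reproduced in isolation with plain NumPy. This is a minimal sketch, not the Haystack code itself: the function name `broadcast_ignore_mask` and the array shapes are hypothetical, and only the variable names are taken from the traceback. It assumes the 3-D mask is allocated with the same shape as the model output, in which case a 2-D output (for example, an embedding that is already pooled per sequence) would raise the same IndexError.

```python
import numpy as np


def broadcast_ignore_mask(token_vecs, padding_mask):
    """Hypothetical stand-in for the masking step named in the traceback."""
    # True where a position is padding and should be ignored.
    ignore_mask_2d = padding_mask == 0
    # Assumption: the mask takes the shape of the model output.
    ignore_mask_3d = np.zeros(token_vecs.shape, dtype=bool)
    # Indexing with three slices only works if the target really has 3 dimensions.
    ignore_mask_3d[:, :, :] = ignore_mask_2d[:, :, np.newaxis]
    return ignore_mask_3d


padding_mask = np.array([[1, 1, 0], [1, 0, 0]])

# Token-level output (batch, tokens, hidden): the mask broadcasts as intended.
ok = broadcast_ignore_mask(np.zeros((2, 3, 4)), padding_mask)

# Already pooled output (batch, hidden): the three-slice assignment fails.
try:
    broadcast_ignore_mask(np.zeros((2, 4)), padding_mask)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(ok.shape)
print(error_message)
```

If this assumption holds, printing the shape of the model output for the configured embedding model right before pooling would confirm whether it is 2-D or 3-D.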

**System:**
 - OS: Ubuntu 18.04
 - GPU/CPU: 
 - FARM version: