deepset-ai / haystack

:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Cannot connect to aws opensearch serverless #7989

Open adhikari23 opened 1 month ago

adhikari23 commented 1 month ago

Describe the bug: Cannot connect to AWS OpenSearch Serverless. Here is the code snippet.

from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
from haystack_integrations.components.embedders.amazon_bedrock import AmazonBedrockTextEmbedder
from requests_aws4auth import AWS4Auth
from opensearchpy import OpenSearch, RequestsHttpConnection
from boto3 import Session
from haystack import Pipeline

service = 'aoss'
credentials = Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   "us-east-1", service, session_token=credentials.token)

embedder = AmazonBedrockTextEmbedder(model="amazon.titan-embed-text-v1")
docstore = OpenSearchDocumentStore(
    hosts = <opensearch serverless endpoint>,
    index = "jps-test-index",
    http_auth= awsauth,
    timeout=300,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    engine = "faiss"
    # return_embedding=True

)
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", embedder)
query_pipeline.add_component("retriever", OpenSearchEmbeddingRetriever(document_store=docstore))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result['retriever']['documents'][0])

Error message

  File "/root/bosai/genai/eric-bosaiapps-genai-poc/haystack-demo/venv/lib/python3.10/site-packages/opensearchpy/transport.py", line 416, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/root/bosai/genai/eric-bosaiapps-genai-poc/haystack-demo/venv/lib/python3.10/site-packages/opensearchpy/connection/http_requests.py", line 241, in perform_request
    self._raise_error(
  File "/root/bosai/genai/eric-bosaiapps-genai-poc/haystack-demo/venv/lib/python3.10/site-packages/opensearchpy/connection/base.py", line 315, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
opensearchpy.exceptions.NotFoundError: NotFoundError(404, '')

Expected behavior: Retrieved documents


davidsbatista commented 1 month ago

Hi @adhikari23, I formatted the code in your post to make it easier to read.

First, it seems that you are missing an import:

from haystack_integrations.components.retrievers.opensearch import OpenSearchEmbeddingRetriever

Looking at your error message, it seems that there's a problem connecting to your OpenSearch server. Can you connect to it in isolation, i.e., outside of Haystack?
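
A minimal sketch of what such an isolated check might look like, using opensearch-py directly with the same SigV4 credentials as in the snippet above. The helper names `build_client_kwargs` and `check_index` are hypothetical, and the sketch assumes the endpoint is given as a URL or `host[:port]` string with port 443 as the default. It only asks whether the index is visible to the client and is not a confirmed diagnosis of the 404.

```python
from urllib.parse import urlparse


def build_client_kwargs(endpoint: str, http_auth=None, timeout: int = 300) -> dict:
    """Turn an endpoint string into keyword arguments for opensearchpy.OpenSearch.

    Hypothetical helper: accepts "https://host", "host" or "host:port".
    """
    # Prepend a scheme if none is given so urlparse can split host and port.
    parsed = urlparse(endpoint if "://" in endpoint else f"https://{endpoint}")
    return {
        "hosts": [{"host": parsed.hostname, "port": parsed.port or 443}],
        "http_auth": http_auth,
        "use_ssl": True,
        "verify_certs": True,
        "timeout": timeout,
    }


def check_index(endpoint: str, index: str, region: str = "us-east-1") -> bool:
    """Return True if `index` is visible through a plain opensearch-py client."""
    # Imported lazily so build_client_kwargs works without these packages.
    from boto3 import Session
    from opensearchpy import OpenSearch, RequestsHttpConnection
    from requests_aws4auth import AWS4Auth

    credentials = Session().get_credentials()
    # "aoss" is the SigV4 service name used in the original snippet.
    awsauth = AWS4Auth(
        credentials.access_key,
        credentials.secret_key,
        region,
        "aoss",
        session_token=credentials.token,
    )
    client = OpenSearch(
        connection_class=RequestsHttpConnection,
        **build_client_kwargs(endpoint, http_auth=awsauth),
    )
    return client.indices.exists(index=index)
```

Calling `check_index("<opensearch serverless endpoint>", "jps-test-index")` with the real endpoint would show whether the index can be reached without Haystack in the path.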

adhikari23 commented 1 month ago

Yes, I am able to connect to the OpenSearch server outside Haystack.