deepset-ai / haystack

LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Vector Similarity for AWS Elasticsearch #665

Closed aayush-gupta15 closed 3 years ago

aayush-gupta15 commented 3 years ago

When I connect the Elasticsearch document store to an AWS Elasticsearch index, it shows this error:

RequestError(400, 'mapper_parsing_exception', 'No handler for type [dense_vector] declared on field [embedding]')

I have checked my connection and it connects fine, but it gives me the error when I try to start an index.

Command I used:

import certifi
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    scheme='https',
    ca_certs=certifi.where(),
    host="xxxxxxx.xx-xxxx-x.x.amazonaws.com",
    port="443",
    username="user",
    password="Pass",
    index="test",
)

tholor commented 3 years ago

Hey @aayush-gupta15,

Thanks for raising this issue. Am I assuming correctly that you are using the fully managed Elasticsearch Service from AWS (https://aws.amazon.com/de/elasticsearch-service/)? In that case, there might be some compatibility issues since the open distro of ES is used there (https://opendistro.github.io/for-elasticsearch/). Happy to dig deeper and fix it ...

aayush-gupta15 commented 3 years ago

Thanks for the quick reply @tholor. Yes I am using the fully managed Elasticsearch service from AWS. Is there any solution for this?

tanaysoni commented 3 years ago

Hi @aayush-gupta15, what version of Elasticsearch are you using?

aayush-gupta15 commented 3 years ago

Hi @tanaysoni

Here are my cluster details:

"version" : {
  "number" : "7.9.1",
  "build_flavor" : "oss",
  "build_type" : "tar",
  "build_hash" : "unknown",
  "build_date" : "2020-11-03T09:54:32.349659Z",
  "build_snapshot" : false,
  "lucene_version" : "8.6.2",
  "minimum_wire_compatibility_version" : "6.8.0",
  "minimum_index_compatibility_version" : "6.0.0-beta1"
},

tanaysoni commented 3 years ago

Hi @aayush-gupta15, thank you for the info. AWS Elasticsearch does not support the dense_vector field type, but it provides vector similarity through the KNN plugin. We plan to add support for Open Distro for Elasticsearch in the next few days. I'll update this issue once we have an implementation.
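
For context, here is a rough sketch of what an index body might look like when the Open Distro k-NN plugin is used in place of dense_vector. The field name "embedding" comes from the error message above; the dimension of 768 and the helper name knn_index_body are assumptions for illustration, and this is not necessarily the mapping Haystack itself generates.

```python
import json


def knn_index_body(embedding_field="embedding", dimension=768):
    """Build an index body that stores vectors with the k-NN plugin.

    Hypothetical helper: the dimension must match the embedding model in use.
    """
    return {
        # The k-NN plugin has to be enabled per index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # knn_vector replaces the unsupported dense_vector type.
                embedding_field: {"type": "knn_vector", "dimension": dimension}
            }
        },
    }


print(json.dumps(knn_index_body(), indent=2))
```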

aayush-gupta15 commented 3 years ago

Hi @tanaysoni, I am only using Elasticsearch for retrieval, so I don't need dense vectors. After checking the source code, I saw that the embedding field is added to the mapping afterwards, so I set embedding_field=None when initializing the document store. That worked: retrieval and writing documents now work fine. Thanks for the help.
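
In case it helps others, the workaround could look roughly like the sketch below. The connection values are placeholders, and the helper names keyword_only_store_kwargs and connect are made up for this example; only embedding_field=None and the import path are taken from this thread. The port is passed as an integer here, whereas the original command used the string "443".

```python
def keyword_only_store_kwargs(host, username, password, index="test"):
    """Collect connection settings with the embedding field disabled."""
    return {
        "scheme": "https",
        "host": host,
        "port": 443,
        "username": username,
        "password": password,
        "index": index,
        # No embedding field, so no dense_vector entry is added to the mapping.
        "embedding_field": None,
    }


def connect(**store_kwargs):
    """Create the document store; imports are deferred until it is called."""
    import certifi
    from haystack.document_store.elasticsearch import ElasticsearchDocumentStore

    return ElasticsearchDocumentStore(ca_certs=certifi.where(), **store_kwargs)
```

Calling connect(**keyword_only_store_kwargs(host, username, password)) should then give a document store that can be used for keyword retrieval, assuming the credentials are valid.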

tanmaylaud commented 3 years ago

@tholor @tanaysoni I faced this issue a few weeks ago. I tried all the versions of ES available on AWS, but in vain. Finally, I had to spin up ES on EC2, which worked fine.

tanaysoni commented 3 years ago

Hi @aayush-gupta15 @tanmaylaud, with #673, there's now a new OpenDistroElasticsearchDocumentStore that should be compatible with AWS Elasticsearch Service. It has the same method signatures as the ElasticsearchDocumentStore, so in theory, only the document store class needs to be updated in the code.
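
A minimal sketch of that swap, assuming the new class is importable from the same module as ElasticsearchDocumentStore (please check your installed version). The helper names load_store_class and make_document_store are hypothetical and only illustrate that the constructor arguments stay the same.

```python
def load_store_class(use_open_distro):
    """Pick the document store class by name; import deferred until called."""
    # Assumption: both classes live in this module.
    import haystack.document_store.elasticsearch as es_module

    name = (
        "OpenDistroElasticsearchDocumentStore"
        if use_open_distro
        else "ElasticsearchDocumentStore"
    )
    return getattr(es_module, name)


def make_document_store(store_cls, **connection_kwargs):
    """Instantiate whichever store class is given with unchanged arguments."""
    return store_cls(**connection_kwargs)
```

With this, switching to AWS would only mean calling load_store_class(True) and passing the result to make_document_store along with the existing connection arguments.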

Let me know if it works for your use case.