deepset-ai / haystack-core-integrations

Additional packages (components, document stores and the likes) to extend the capabilities of Haystack version 2.0 and onwards
https://haystack.deepset.ai
Apache License 2.0

Show how to use elasticsearch integration with security enabled #661

Open taborzbislaw opened 6 months ago

taborzbislaw commented 6 months ago

Following https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html I have installed Elasticsearch locally and run it as a daemon. I have also exported the ELASTIC_PASSWORD and ES_HOME environment variables.

When I run document_store = ElasticsearchDocumentStore(hosts = "http://localhost:9200"), I get the following error:

ConnectionError                           Traceback (most recent call last)
Cell In[2], line 1
----> 1 document_store = ElasticsearchDocumentStore(hosts = "http://localhost:9200")

File ~/anaconda3/envs/NLP/lib/python3.10/site-packages/haystack_integrations/document_stores/elasticsearch/document_store.py:104, in ElasticsearchDocumentStore.__init__(self, hosts, index, embedding_similarity_function, **kwargs)
    101 self._kwargs = kwargs
    103 # Check client connection, this will raise if not connected
--> 104 self._client.info()
    106 # configure mapping for the embedding field
    107 mappings = {
    108     "properties": {
    109         "embedding": {"type": "dense_vector", "index": True, "similarity": embedding_similarity_function},
   (...)
    122     ],
    123 }

File ~/anaconda3/envs/NLP/lib/python3.10/site-packages/elasticsearch/_sync/client/utils.py:446, in _rewrite_parameters.<locals>.wrapper.<locals>.wrapped(*args, **kwargs)
    443 except KeyError:
    444     pass
--> 446 return api(*args, **kwargs)

File ~/anaconda3/envs/NLP/lib/python3.10/site-packages/elasticsearch/_sync/client/__init__.py:2453, in Elasticsearch.info(self, error_trace, filter_path, human, pretty)
   2451 query["pretty"] = pretty
   2452 headers = {"accept": "application/json"}
-> 2453 return self.perform_request(  # type: ignore[return-value]
   2454     "GET",
   2455     path,
   2456     params=query,
   2457     headers=headers,
   2458     endpoint_id="info",
   2459     path_parts=path_parts,
   2460 )

File ~/anaconda3/envs/NLP/lib/python3.10/site-packages/elasticsearch/_sync/client/_base.py:271, in BaseClient.perform_request(self, method, path, params, headers, body, endpoint_id, path_parts)
    255 def perform_request(
    256     self,
    257     method: str,
   (...)
    264     path_parts: Optional[Mapping[str, Any]] = None,
    265 ) -> ApiResponse[Any]:
    266     with self._otel.span(
    267         method,
    268         endpoint_id=endpoint_id,
    269         path_parts=path_parts or {},
    270     ) as otel_span:
--> 271         response = self._perform_request(
    272             method,
    273             path,
    274             params=params,
    275             headers=headers,
    276             body=body,
    277             otel_span=otel_span,
    278         )
    279         otel_span.set_elastic_cloud_metadata(response.meta.headers)
    280         return response

File ~/anaconda3/envs/NLP/lib/python3.10/site-packages/elasticsearch/_sync/client/_base.py:316, in BaseClient._perform_request(self, method, path, params, headers, body, otel_span)
    313 else:
    314     target = path
--> 316 meta, resp_body = self.transport.perform_request(
    317     method,
    318     target,
    319     headers=request_headers,
    320     body=body,
    321     request_timeout=self._request_timeout,
    322     max_retries=self._max_retries,
    323     retry_on_status=self._retry_on_status,
    324     retry_on_timeout=self._retry_on_timeout,
    325     client_meta=self._client_meta,
    326     otel_span=otel_span,
    327 )
    329 # HEAD with a 404 is returned as a normal response
    330 # since this is used as an 'exists' functionality.
    331 if not (method == "HEAD" and meta.status == 404) and (
    332     not 200 <= meta.status < 299
    333     and (
   (...)
    337     )
    338 ):

File ~/anaconda3/envs/NLP/lib/python3.10/site-packages/elastic_transport/_transport.py:342, in Transport.perform_request(self, method, target, body, headers, max_retries, retry_on_status, retry_on_timeout, request_timeout, client_meta, otel_span)
    340 try:
    341     otel_span.set_node_metadata(node.host, node.port, node.base_url, target)
--> 342     resp = node.perform_request(
    343         method,
    344         target,
    345         body=request_body,
    346         headers=request_headers,
    347         request_timeout=request_timeout,
    348     )
    349     _logger.info(
    350         "%s %s%s [status:%s duration:%.3fs]"
    351         % (
   (...)
    357         )
    358     )
    360     if method != "HEAD":

File ~/anaconda3/envs/NLP/lib/python3.10/site-packages/elastic_transport/_node/_http_urllib3.py:202, in Urllib3HttpNode.perform_request(self, method, target, body, headers, request_timeout)
    194 err = ConnectionError(str(e), errors=(e,))
    195 self._log_request(
    196     method=method,
    197     target=target,
   (...)
    200     exception=err,
    201 )
--> 202 raise err from None
    204 meta = ApiResponseMeta(
    205     node=self.config,
    206     duration=duration,
   (...)
    209     headers=response_headers,
    210 )
    211 self._log_request(
    212     method=method,
    213     target=target,
   (...)
    217     response=data,
    218 )

ConnectionError: Connection error caused by: ConnectionError(Connection error caused by: ProtocolError(('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))))

Please help me resolve this issue.

taborzbislaw commented 6 months ago

When I start Elasticsearch in Docker with security disabled, using

sudo docker run -p 9200:9200 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms1024m -Xmx1024m" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.13.2

examples from https://docs.haystack.deepset.ai/docs/elasticsearchbm25retriever work correctly

But when the container is started with security enabled (the default):

sudo docker run -p 9200:9200 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms1024m -Xmx1024m" docker.elastic.co/elasticsearch/elasticsearch:8.13.2

I get the error described in the previous comment: ConnectionError: Connection error caused by: ConnectionError(Connection error caused by: ProtocolError(('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))))

It would be nice if the examples showed how to use the Elasticsearch integration with security enabled, as the sentence-transformers example does: https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/semantic-search/semantic_search_quora_elasticsearch.py

best
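
For reference, a minimal sketch of what such an example might look like. It rests on several assumptions that the thread does not confirm: that the security-enabled node serves HTTPS rather than HTTP on port 9200 (which would explain the "Remote end closed connection without response" error when connecting over http://), that the built-in elastic user and the exported ELASTIC_PASSWORD are the credentials to use, and that ElasticsearchDocumentStore forwards extra keyword arguments to the underlying Elasticsearch client, as the **kwargs in the traceback suggests. The helper names build_client_kwargs and make_document_store are hypothetical, and the ca_certs path is whatever location the node's HTTP CA certificate was copied to.

```python
import os


def build_client_kwargs(password=None, ca_certs=None, host="https://localhost:9200"):
    """Build keyword arguments for a security-enabled node (hypothetical helper).

    basic_auth, ca_certs and verify_certs are options of the elasticsearch-py
    8.x client; whether the document store forwards them is an assumption.
    """
    password = password or os.environ.get("ELASTIC_PASSWORD")
    if not password:
        raise ValueError("Set ELASTIC_PASSWORD or pass password= explicitly")

    kwargs = {
        "hosts": host,                         # note https://, not http://
        "basic_auth": ("elastic", password),   # built-in superuser (assumed)
    }
    if ca_certs:
        # Verify the server certificate against the node's CA certificate.
        kwargs["ca_certs"] = ca_certs
    else:
        # Skips certificate verification; acceptable only for local testing.
        kwargs["verify_certs"] = False
    return kwargs


def make_document_store(**overrides):
    """Create the document store; imported lazily so the helper above
    can be used without the integration installed."""
    from haystack_integrations.document_stores.elasticsearch import (
        ElasticsearchDocumentStore,
    )

    return ElasticsearchDocumentStore(**build_client_kwargs(**overrides))
```

With a running node, make_document_store(ca_certs="http_ca.crt") would attempt a verified connection, while make_document_store() falls back to an unverified one using the password from the environment.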