elastic / elasticsearch-py

Official Python client for Elasticsearch
https://ela.st/es-python
Apache License 2.0

EnrichClient times out when using execute_policy #2622

Closed davehouser1 closed 1 month ago

davehouser1 commented 3 months ago

I am trying to use EnrichClient to call execute_policy. I am seeing urllib3 timeout errors when doing this for some of my policies.

Here is my code:

def execute_enrichment(config):
    try:
        es = Elasticsearch([config.logjam_endpoint],
                           api_key=config.logjam_key )
                           # request_timeout=config.logjam_request_timeout,
                           # connections_per_node=50)
    except Exception as e:
        logger.error(f"Failed to execute enrichment to {config.logjam_endpoint}")
        raise ConnectionError(f"Failed to execute enrichment: {str(e)}")
    logger.debug(f"Connection successful.")

    # Perform enrichment
    logger.debug(f"Performing enrichment execution on the following policies {config.enrich_policy_names}.")  # noqa: 501
    responses = []
    for policy in config.enrich_policy_names:
        if policy == "enrichment-test":
            logger.info(f"[!!!] Sending enrich for {policy}")
            response = es.enrich.execute_policy(name=policy,
                                                wait_for_completion=True )
            logger.info(response)
        else:
            logger.info(f"skipped {policy}")

Here is the output I am seeing:

2024-08-01T15:56:00+0000 [+] [logger] - INFO - main.py:44 - skipped test-2
2024-08-01T15:56:00+0000 [+] [logger] - INFO - main.py:44 - skipped test-3
2024-08-01T15:56:00+0000 [+] [logger] - INFO - main.py:33 - [!!!] Sending enrich for enrichment-test
Traceback (most recent call last):
  File "/scripts/main.py", line 76, in <module>
    execute_enrichment(config)
  File "/scripts/main.py", line 35, in execute_enrichment
    response = es.enrich.execute_policy(name=policy,
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_sync/client/utils.py", line 446, in wrapped
    return api(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_sync/client/enrich.py", line 104, in execute_policy
    return self.perform_request(  # type: ignore[return-value]
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_sync/client/_base.py", line 423, in perform_request
    return self._client.perform_request(
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_sync/client/_base.py", line 271, in perform_request
    response = self._perform_request(
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_sync/client/_base.py", line 316, in _perform_request
    meta, resp_body = self.transport.perform_request(
  File "/usr/local/lib/python3.10/site-packages/elastic_transport/_transport.py", line 342, in perform_request
    resp = node.perform_request(
  File "/usr/local/lib/python3.10/site-packages/elastic_transport/_node/_http_urllib3.py", line 202, in perform_request
    raise err from None
elastic_transport.ConnectionTimeout: Connection timed out

I tried setting request_timeout in the Elasticsearch() instance, but that causes a different problem: I see nothing but the following error when sending requests:

 elastic_transport.ConnectionError: Connection error caused by: ConnectionError(Connection error caused by: FullPoolError(HTTPConnectionPool(host='our.elastic.server', port=9200): Pool reached maximum size and no more connections are allowed.))

I tried reading the documentation for the EnrichClient and Elasticsearch classes, however it's not clear to me what a lot of these parameters really do. I tried reading the source code, but I didn't see many comments explaining what the parameters actually do. Am I doing something wrong here? Why is this failing?

I forgot to mention. I do not see this behavior when using curl, requests library, or the dev console. The response is an async task number, which I can check status on.

pquentin commented 3 months ago

Hey @davehouser1, sorry that this is giving you trouble.

> I tried reading the documentation for the EnrichClient and Elasticsearch classes, however it's not clear to me what a lot of these parameters really do. I tried reading the source code, but I didn't see many comments explaining what the parameters actually do.

The parameters are documented in the Elasticsearch docs, which are linked from the client docs. For execute_policy, the link is https://www.elastic.co/guide/en/elasticsearch/reference/8.14/execute-enrich-policy-api.html, which should explain what the parameters do.

> I forgot to mention. I do not see this behavior when using curl, requests library, or the dev console. The response is an async task number, which I can check status on.

I believe the difference is that in Python you're setting wait_for_completion to True, which makes the request wait for the policy to finish executing and can easily time out. Can you please try setting wait_for_completion to False instead?
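A minimal sketch of that suggestion, not taken from the thread: the helper name start_enrich_policy is hypothetical, and es is assumed to be an already-configured Elasticsearch client. With wait_for_completion=False the call returns once the server has accepted the request, so it is much less likely to run into the client's request timeout.

```python
def start_enrich_policy(es, policy_name):
    """Kick off an enrich policy without blocking until it finishes.

    `es` is assumed to be an elasticsearch.Elasticsearch instance.
    The helper name is hypothetical; execute_policy and its
    wait_for_completion parameter are the client calls used in this thread.
    """
    # The server's response body is returned unchanged so the caller
    # can read the task information from it.
    return es.enrich.execute_policy(name=policy_name, wait_for_completion=False)
```

Usage would look like start_enrich_policy(es, "enrichment-test"), reusing the policy name from the report above.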

> I tried setting request_timeout in the Elasticsearch() instance, but that causes a different problem: I see nothing but the following error when sending requests

I think this is because the timeout was not high enough, meaning that the connections were discarded for not working? I'm not sure here, honestly, and would welcome a script that reproduces this, as we're considering disabling timeouts in future versions of the client.
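If blocking until completion is still wanted, one option not discussed above is to raise the timeout for that single call rather than for the whole client. This is only a sketch, assuming the 8.x client's options() method; the helper name and the 300-second default are illustrative, not recommendations.

```python
def execute_policy_blocking(es, policy_name, timeout_s=300):
    """Run an enrich policy and wait for it, with a longer per-request timeout.

    `es` is assumed to be an elasticsearch.Elasticsearch (8.x) instance.
    The helper name and default timeout are hypothetical.
    """
    # options() returns a copy of the client with per-request settings applied,
    # so the timeout configured on the original client is left untouched.
    patient_client = es.options(request_timeout=timeout_s)
    return patient_client.enrich.execute_policy(
        name=policy_name, wait_for_completion=True
    )
```

The timeout still has to exceed the time the policy actually needs, so this does not remove the failure mode, it only moves the limit.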

davehouser1 commented 3 months ago

Thanks for the response @pquentin.

> The parameters are documented in the Elasticsearch docs, which are linked from the client docs. For execute_policy, the link is https://www.elastic.co/guide/en/elasticsearch/reference/8.14/execute-enrich-policy-api.html, which should explain what the parameters do.

I checked out the link. It only details one parameter, wait_for_completion. It does not explain the other parameters (error_trace, filter_path, human, pretty). Also, what would the link be for details on all the parameters of the Elasticsearch class?

> I believe the difference is that in Python you're setting wait_for_completion to True, which makes the request wait for the policy to finish executing and can easily time out. Can you please try setting wait_for_completion to False instead?

I set wait_for_completion=False, and disabled request_timeout in the Elasticsearch instance. This did the trick. Now I get a task ID. Very good. So it seems that the session was timing out because the client was waiting for completion. With waiting disabled, the request runs asynchronously and I have to check the status of the task myself. You are good to close this issue. However, could you please read the following before doing so?

A somewhat unrelated problem: see https://github.com/elastic/elasticsearch/issues/70554.

There does not seem to be a good way to check the status of an enrichment task using asyncio, as Elasticsearch does not show the status of a task after enrichment is complete. So async will always return a 404 after waiting for status.

Do you know a way around this using AsyncElasticsearch?

I found a way to use requests to fetch the .enrichment index value and then check whether the _count has increased. This seems to be the only workaround I can find. Thoughts?
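A rough sketch of that count-based workaround using the Python client instead of requests. Everything here is an assumption for illustration: the helper name, the index pattern supplied by the caller, and the premise that an increased document count means the policy has finished. Only the count API and the "count" key in its response come from the client itself.

```python
import time


def wait_for_count_increase(es, index_pattern, baseline,
                            timeout_s=600, interval_s=5, sleep=time.sleep):
    """Poll the document count until it exceeds `baseline`.

    Returns the new count, or None if `timeout_s` elapses first.
    `es` is assumed to be an elasticsearch.Elasticsearch instance and
    `index_pattern` a caller-chosen pattern for the enrich index (hypothetical).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # The count API returns a body containing a "count" field.
        current = es.count(index=index_pattern)["count"]
        if current > baseline:
            return current
        sleep(interval_s)
    return None
```

The caller would record the baseline count before starting the policy with wait_for_completion=False and then call the helper. The sleep parameter exists only so the wait can be replaced in tests.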

pquentin commented 3 months ago

> I checked out the link. It only details one parameter, wait_for_completion. It does not explain the other parameters (error_trace, filter_path, human, pretty). Also, what would the link be for details on all the parameters of the Elasticsearch class?

The other parameters work for every single API, so they're not documented on every page. They're documented here: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html.

Regarding the Elasticsearch class, most parameters are defined in https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html and https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/config.html. Yes, having the docs in two places is annoying, and a short description should be added in the reference docs too. Is there anything specific you're missing?

> A somewhat unrelated problem: see elastic/elasticsearch#70554.

> There does not seem to be a good way to check the status of an enrichment task using asyncio, as Elasticsearch does not show the status of a task after enrichment is complete. So async will always return a 404 after waiting for status.

> Do you know a way around this using AsyncElasticsearch?

> I found a way to use requests to fetch the .enrichment index value and then check whether the _count has increased. This seems to be the only workaround I can find. Thoughts?

Correct me if I'm wrong, but there seems to be some confusion between AsyncElasticsearch (which is a way to use Python's asyncio module with the Python client) and async APIs in Elasticsearch (which only relates to the Elasticsearch server, independently of the client). Calling async Elasticsearch APIs can be done with both Elasticsearch and AsyncElasticsearch.
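To make the distinction concrete, here is a small sketch under stated assumptions (both function names are hypothetical). The request sent to the server is the same in both cases, and wait_for_completion=False is what makes the server-side API asynchronous. The only difference is whether the Python call is awaited.

```python
def start_policy_blocking_client(es, policy_name):
    # `es` is assumed to be an elasticsearch.Elasticsearch instance.
    # The Python call blocks until the HTTP response arrives.
    return es.enrich.execute_policy(name=policy_name, wait_for_completion=False)


async def start_policy_asyncio_client(es, policy_name):
    # `es` is assumed to be an elasticsearch.AsyncElasticsearch instance.
    # Same method and parameters, but the call is a coroutine and must be awaited.
    return await es.enrich.execute_policy(name=policy_name, wait_for_completion=False)
```

Either way, the server decides when the task finishes. The choice of client only affects how the Python program waits for the HTTP response.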

Can you please show me your requests code? I can help you translate it to the equivalent using the client, be it AsyncElasticsearch or Elasticsearch.

pquentin commented 1 month ago

Closing, as I haven't heard back from you. I will reopen if there are additional questions. Thank you!