elastic / curator

Curator: Tending your Elasticsearch indices

Exit code behavior pip install vs. docker/k8s #1726

Open boernd opened 1 month ago

boernd commented 1 month ago

Curator version: 8.0.16

We run curator as a CronJob in Kubernetes. If, for instance, the pod cannot reach Elasticsearch during client creation, it logs errors, but the job ends with status Completed rather than Error.

I tested a pip install against k8s and got different exit codes.

pip install:

> curator --config ./config.yml ./action_file.yml
2024-09-11 10:27:51,147 INFO      Preparing Action ID: 1, "delete_indices"
2024-09-11 10:27:51,147 INFO      Creating client object and testing connection
2024-09-11 10:27:51,211 CRITICAL  Unable to establish client connection to Elasticsearch!
2024-09-11 10:27:51,212 CRITICAL  Exception encountered: Connection error caused by: ConnectionError(Connection error caused by: NameResolutionError(<urllib3.connection.HTTPConnection object at 0x7f718f4241d0>: Failed to resolve 'xyz' ([Errno -2] Name or service not known)))
Traceback (most recent call last):
  File "/home/bernd/.local/bin/curator", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/bernd/.local/curator/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bernd/.local/curator/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/bernd/.local/curator/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bernd/.local/curator/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bernd/.local/curator/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bernd/.local/curator/lib/python3.11/site-packages/curator/cli.py", line 299, in cli
    run(ctx)
  File "/home/bernd/.local/curator/lib/python3.11/site-packages/curator/cli.py", line 223, in run
    if ilm_action_skip(client, action_def):
                       ^^^^^^
UnboundLocalError: cannot access local variable 'client' where it is not associated with a value

> echo $?
1

Triggering the command within a k8s pod (official docker image):

> k exec -ti curator-elasticsearch-curator-onetime-klrml sh

/ $ /curator/curator --config /etc/es-curator/config.yml /etc/es-curator/action_file.yml
2024-09-11 08:35:27,652 INFO      Preparing Action ID: 1, "delete_indices"
2024-09-11 08:35:27,743 INFO      Creating client object and testing connection
2024-09-11 08:35:27,826 CRITICAL  Unable to establish client connection to Elasticsearch!
2024-09-11 08:35:27,826 CRITICAL  Exception encountered: Connection error caused by: ConnectionError(Connection error caused by: NameResolutionError(<urllib3.connection.HTTPConnection object at 0x7f55e814dad0>: Failed to resolve 'xyz' ([Errno -2] Name does not resolve)))

/ $ echo $?
0

The Dockerfile builds an executable; maybe that causes a difference in behavior?

I also had a look at the code. If I read it correctly, the get_client function in the es_client library raises an ESClientException, but curator only catches a ClientException.
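
The UnboundLocalError in the pip traceback above is consistent with a handler that logs the connection failure but neither re-raises nor exits, so `client` is never assigned. Below is a minimal Python sketch of that pattern next to an explicit-exit alternative. All names here (`FakeConnectionError`, `connect`, `run_buggy`, `run_fixed`) are hypothetical and for illustration only; this is not curator's or es_client's actual code.

```python
import logging
import sys

logger = logging.getLogger("sketch")


class FakeConnectionError(Exception):
    """Hypothetical stand-in for the library's connection exception."""


def connect(host):
    # Hypothetical helper: always fails, to mimic an unreachable cluster.
    raise FakeConnectionError(f"Failed to resolve {host!r}")


def run_buggy(host):
    try:
        client = connect(host)
    except FakeConnectionError as exc:
        logger.critical("Unable to establish client connection: %s", exc)
        # No raise and no exit here, so execution continues past the handler.
    # 'client' was never assigned, so this line raises UnboundLocalError.
    return client


def run_fixed(host):
    try:
        client = connect(host)
    except FakeConnectionError as exc:
        logger.critical("Unable to establish client connection: %s", exc)
        # Exit explicitly with a non-zero code instead of falling through.
        sys.exit(1)
    return client
```

With an explicit `sys.exit(1)`, the exit code no longer depends on an uncaught exception reaching the interpreter. That matters for the CronJob case, since Kubernetes only marks the pod as failed when the container exits non-zero.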

untergeek commented 1 month ago

Oh, this is fascinating. Thank you for raising this issue. I will definitely see if I can make the frozen binary exit with a 1 error code when it should.