elastic / curator

Curator: Tending your Elasticsearch indices

Unexpected 'NoneType' error when running Curator #1713

Closed mmolinac closed 1 month ago

mmolinac commented 2 months ago

Error received: argument should be a bytes-like object or ASCII string, not 'NoneType'

We have been running Curator 5.8.4 successfully in another cluster for years. However, when running Curator 8 with an updated configuration against a new ES 8.x cluster, it fails at startup with the error reported above.

Expected Behavior

With the current configuration, we should be able to run Curator without errors and see the logs.

Actual Behavior

When we run the CronJob object in Kubernetes, the following log is reported:

2024-06-13 05:00:02,911 INFO      Preparing Action ID: 0, "delete_indices"
2024-06-13 05:00:02,911 INFO      Creating client object and testing connection
argument should be a bytes-like object or ASCII string, not 'NoneType'

Wrapper for running curator from source.

When used with Python 3 Curator requires the locale to be unicode. Any unicode
definitions are acceptable.

To set the locale to be unicode, try:

$ export LC_ALL=en_US.utf8
$ curator [ARGS]

Alternately, you should be able to specify the locale on the command-line:

$ LC_ALL=en_US.utf8 curator [ARGS]

Be sure to substitute your unicode variant for en_US.utf8

Steps to Reproduce the Problem

  1. The current CronJob description is as follows:
    apiVersion: batch/v1
    kind: CronJob
    metadata:
      creationTimestamp: "2024-06-06T18:35:03Z"
      generation: 2
      name: curator
      namespace: elastic-system
      resourceVersion: "426338733"
      uid: bf0cee67-1c06-4bd8-8342-01c3a33c9b9f
    spec:
      concurrencyPolicy: Forbid
      failedJobsHistoryLimit: 2
      jobTemplate:
        metadata:
          creationTimestamp: null
        spec:
          template:
            metadata:
              creationTimestamp: null
            spec:
              containers:
              - args:
                - --config
                - /.curator/config.yml
                - --no-verify_certs
                - /.curator/actionfile.yml
                command:
                - /curator/curator
                env:
                - name: ES_HOSTS
                  value: https://xxxxxxxxxxxxx-es-http.elastic-system.svc:9200
                - name: ES_USER
                  value: elastic
                - name: ES_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      key: elastic
                      name: xxxxxxxxxxxxx-es-elastic-user
                image: untergeek/curator:8.0.15
                imagePullPolicy: IfNotPresent
                name: curator
                resources:
                  limits:
                    cpu: 200m
                    memory: 512Mi
                  requests:
                    cpu: 200m
                    memory: 512Mi
                securityContext: {}
                terminationMessagePath: /dev/termination-log
                terminationMessagePolicy: File
                volumeMounts:
                - mountPath: /.curator
                  name: commonconfig-volume
                  readOnly: true
              dnsPolicy: ClusterFirst
              restartPolicy: OnFailure
              schedulerName: default-scheduler
              securityContext: {}
              terminationGracePeriodSeconds: 30
              volumes:
              - name: commonconfig-volume
                projected:
                  defaultMode: 420
                  sources:
                  - configMap:
                      items:
                      - key: config.yml
                        path: config.yml
                      - key: actionfile.yml
                        path: actionfile.yml
                      name: curator
      schedule: 0 5 * * *
      successfulJobsHistoryLimit: 5
      suspend: false
    status:
      lastScheduleTime: "2024-06-13T05:00:00Z"
      lastSuccessfulTime: "2024-06-13T05:00:06Z"

The mentioned configMap specification is:

apiVersion: v1
data:
  actionfile.yml: |
    ---
    actions:

      0:
        action: delete_indices
        description: >-

          Delete indices older than 100 days (based on index name), for all

          indices. Ignore the error if the filter does not result in an
          actionable list of indices (ignore_empty_list) and exit cleanly.
        options:
          ignore_empty_list: True
          timeout_override: 300
          continue_if_exception: False
          disable_action: False
        filters:
        - filtertype: age
          source: name
          direction: older
          timestring: '%Y.%m.%d'
          unit: days
          unit_count: 100

      1:
        action: delete_indices
        description: >-

          Delete indices older than 10 days (based on index name), for dev-clickhouselog

          indices. Ignore the error if the filter does not result in an
          actionable list of indices (ignore_empty_list) and exit cleanly.
        options:
          ignore_empty_list: True
          timeout_override: 300
          continue_if_exception: False
          disable_action: False
        filters:
        - filtertype: age
          source: name
          direction: older
          timestring: '%Y.%m.%d'
          unit: days
          unit_count: 10

        - filtertype: pattern
          kind: prefix
          value: dev-clickhouselog
  config.yml: |
    ---
    # Remember, leave a key empty if there is no value.  None will be a string,
    # not a Python "NoneType"
    elasticsearch:
      client:
        hosts:
          - ${ES_HOSTS}
        cloud_id:
        ca_certs:
        client_cert:
        client_key:
        verify_certs: False
        request_timeout: 30
      other_settings:
        master_only: False
        username: ${ES_USER}
        password: ${ES_PASSWORD}
        api_key:
          id:
          api_key:
          token:
    logging:
      loglevel: INFO
      logfile:
      logformat: default
      blacklist: ['elastic_transport', 'urllib3']
immutable: false
kind: ConfigMap
metadata:
  creationTimestamp: "2024-06-06T18:34:59Z"
  name: curator
  namespace: elastic-system
  resourceVersion: "418110754"
  uid: a2221ec8-826d-44e5-927a-6d31a1e9f2d5

Specifications

Context (Environment)

As we can't use ILM right now, we rely on Curator to stay within our storage constraints. It has worked great so far in our previous cluster and versions.

We adapted our current actions and flows to the new format (version 8).

Current CronJob, Elastic cluster and other elements are being managed by Terraform and Helm Charts (ECK)

Detailed Description

A better description of the exception would be great. The current message gives few hints about what went wrong.

untergeek commented 2 months ago

Acknowledging your issue

I'm sorry to hear you've encountered a blocking issue with Curator.

In reading over the configuration, I suspect that because you included an empty key (logfile:), that might be what's tripping the error you encountered. That would be a bug, so I'll investigate and confirm my hypothesis. While I try to replicate this scenario to figure out exactly where things went wrong (or if it's something else), let's see if we can't find a work-around that might actually be better for you anyway.
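As a side note for anyone searching on this message: the wording matches, character for character, the TypeError that Python's standard `base64` module raises when it is handed `None`. The sketch below only demonstrates that match. Which call inside Curator or es_client actually received the `None` is not established here, and `decode_credential` is a hypothetical stand-in, not a function from either project.

```python
import base64


def decode_credential(value):
    """Hypothetical stand-in for any code path that base64-decodes a setting."""
    return base64.b64decode(value)


try:
    # An empty YAML key (e.g. `logfile:`) loads as Python None.
    decode_credential(None)
except TypeError as exc:
    message = str(exc)

print(message)
```

If the same text shows up in your own logs, looking for empty keys that end up in a decoding step is a reasonable first check, though it is only one possible cause.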

Potential Workaround

It appears that there are issues building the client object. There have been some improvements since Curator 5.8, namely a separate client builder library called es_client. I have not yet added the documentation to Curator showing this feature, so you are finding out about it through a back-channel.

ENV vars revisited

If you read the es_client documentation regarding ENV vars, you'll find that Curator should be able to use these variables without having to even specify a command-line flag. The singleton interface help output actually shows what these variables are for Curator (in addition to what appears in the es_client documentation). In the interest of brevity, I will include only the lines that match your supplied configuration:

Subset of supported ENV vars

  --config PATH                   Path to configuration file.  [env var: ESCLIENT_CONFIG]
  --hosts TEXT                    Elasticsearch URL to connect to.  [env var: ESCLIENT_HOSTS]
  --username TEXT                 Elasticsearch username  [env var: ESCLIENT_USERNAME]
  --password TEXT                 Elasticsearch password  [env var: ESCLIENT_PASSWORD]
  --verify_certs / --no-verify_certs
                                  Verify SSL/TLS certificate(s)  [env var: ESCLIENT_VERIFY_CERTS]
  --loglevel [DEBUG|INFO|WARNING|ERROR|CRITICAL]
                                  Log level  [env var: ESCLIENT_LOGLEVEL]
  --logfile TEXT                  Log file  [env var: ESCLIENT_LOGFILE]
  --logformat [default|json|ecs]  Log output format  [env var: ESCLIENT_LOGFORMAT]
  --blacklist TEXT                Named entities will not be logged  [env var: ESCLIENT_BLACKLIST]
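The naming pattern in the help output above is regular enough to sketch in a few lines: take the long flag name, upper-case it, and prepend `ESCLIENT_`. The helper `env_var_for` is hypothetical and only illustrates the pattern visible in the listing; it is not how es_client itself is implemented.

```python
ENV_PREFIX = "ESCLIENT"  # prefix as shown in the help output


def env_var_for(flag: str) -> str:
    """Map a long flag such as '--verify_certs' to 'ESCLIENT_VERIFY_CERTS'."""
    name = flag.lstrip("-").replace("-", "_")
    return f"{ENV_PREFIX}_{name.upper()}"


flags = [
    "--config", "--hosts", "--username", "--password", "--verify_certs",
    "--loglevel", "--logfile", "--logformat", "--blacklist",
]
env_vars = {flag: env_var_for(flag) for flag in flags}

for flag, var in env_vars.items():
    print(f"{flag:16} -> {var}")
```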

Proposed configuration revision

I note that your client config file only has a few options in it, ergo, you can safely omit using a config file altogether and just use ENV vars as follows:

        spec:
          containers:
          - args:
            - /.curator/actionfile.yml
            command:
            - /curator/curator
            env:
            - name: ESCLIENT_HOSTS
              value: https://xxxxxxxxxxxxx-es-http.elastic-system.svc:9200
            - name: ESCLIENT_USERNAME
              value: elastic
            - name: ESCLIENT_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: elastic
                  name: xxxxxxxxxxxxx-es-elastic-user
            - name: ESCLIENT_VERIFY_CERTS
              value: "false"
            - name: ESCLIENT_LOGLEVEL
              value: INFO
            - name: ESCLIENT_LOGFORMAT
              value: default
            - name: ESCLIENT_BLACKLIST
              value: "elastic_transport urllib3"
            image: untergeek/curator:8.0.15

Breakdown and Explanation

I included the logging values for instructional purposes only. They are unnecessary here because everything you specified is a default value, so ESCLIENT_LOGLEVEL, ESCLIENT_LOGFORMAT, and ESCLIENT_BLACKLIST can each be safely omitted from the configuration example above. I have still included them in the following breakdown.

Action file.

With ENV vars set, the only arg necessary is the action file, which you have appropriately volume mapped to /.curator/

ESCLIENT_HOSTS

This is the hosts value in a config file or the --hosts flag at the command-line

ESCLIENT_USERNAME

This is the --username command-line flag, or elasticsearch.other_settings.username config file value.

ESCLIENT_PASSWORD

This is the --password command-line flag, or elasticsearch.other_settings.password config file value.

ESCLIENT_VERIFY_CERTS

This is the --verify_certs / --no-verify_certs command-line flag, or elasticsearch.client.verify_certs config file value.

This value must be set to a recognized boolean value, as shown here: 1, T, or true for True, or 0, F, or false for False. These values are case insensitive.
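To make the accepted values concrete, here is a minimal sketch of a parser that follows the rule exactly as stated above. `parse_bool` is a hypothetical helper written for illustration; the real parsing happens inside es_client and its CLI layer and may accept more spellings than these.

```python
TRUE_VALUES = {"1", "t", "true"}
FALSE_VALUES = {"0", "f", "false"}


def parse_bool(raw: str) -> bool:
    """Interpret an env var string as a boolean, ignoring case."""
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognized boolean value: {raw!r}")


print(parse_bool("false"), parse_bool("T"))
```

Note that an environment variable is always a string, so in a Kubernetes manifest the value has to be written as a quoted string rather than a bare YAML boolean.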

ESCLIENT_LOGLEVEL

This is the --loglevel command-line flag, or logging.loglevel config file value.

This value must be set to one of DEBUG|INFO|WARNING|ERROR|CRITICAL and is case-sensitive. If unset, the default value is INFO

ESCLIENT_LOGFILE

This is the --logfile command-line flag, or logging.logfile config file value.

This should be either unset/unspecified, or set to a file in a volume-mapped path, e.g. /.curator/curator.log.

When unset, and running in a containerized or Docker environment, it should write in a way that docker logs CONTAINERNAME will show the output. Regardless, if unset it will log to STDOUT.

ESCLIENT_LOGFORMAT

This is the --logformat command-line flag, or logging.logformat config file value.

This value must be set to one of default|json|ecs and is case-sensitive. If unset, the default value is default.

ESCLIENT_BLACKLIST

This is the --blacklist command-line flag, or logging.blacklist config file value.

At the command-line, this flag can be specified multiple times to add multiple values. In the config file, this can be a list or array type.

This ENV var also supports multiple values by space separation. Simply encapsulate multiple values in double-quotes and separate with spaces, e.g. "elastic_transport urllib3"
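The space-separated form can be sketched as a plain whitespace split. `read_blacklist` is a hypothetical helper and an assumption about the general idea, not es_client's actual code. The double quotes belong to the shell or YAML layer and never reach the program.

```python
import os


def read_blacklist(environ=os.environ):
    """Return the blacklist entries from ESCLIENT_BLACKLIST as a list."""
    raw = environ.get("ESCLIENT_BLACKLIST", "")
    # str.split() with no argument splits on any run of whitespace
    # and yields an empty list for an empty string.
    return raw.split()


example = read_blacklist({"ESCLIENT_BLACKLIST": "elastic_transport urllib3"})
print(example)
```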

Conclusion

Using these ENV vars in this way fully eliminates the need to provide config.yml as a ConfigMap. For a non-containerized execution, the resulting command-line might look something like this:

$ ESCLIENT_HOSTS="https://xxxxxxxxxxxxx-es-http.elastic-system.svc:9200" \
 ESCLIENT_USERNAME="elastic" \
 ESCLIENT_PASSWORD="REDACTED" \
 ESCLIENT_VERIFY_CERTS="false" \
 curator /.curator/actionfile.yml

But with a containerized execution where the ENV vars are all pre-set, it would just be:

curator /.curator/actionfile.yml

Hopefully this work-around gets you functional while I investigate the bug you've submitted.

mmolinac commented 2 months ago

Thank you very much! The workaround worked like a charm:

% kubectl -n elastic-system logs curator-28639350-gjfbj | grep -v InsecureRequestWarning
2024-06-14 10:30:03,111 INFO      Preparing Action ID: 0, "delete_indices"
2024-06-14 10:30:03,111 INFO      Creating client object and testing connection
/usr/local/lib/python3.11/site-packages/elastic_transport/_node/_http_urllib3.py:119: SecurityWarning: Connecting to 'https://xxxxxxxxxxxxx-es-http.elastic-system.svc:9200' using TLS with verify_certs=False is insecure
2024-06-14 10:30:03,343 INFO      Trying Action ID: 0, "delete_indices": 
Delete indices older than 100 days (based on index name), for all
indices. Ignore the error if the filter does not result in an actionable list of indices (ignore_empty_list) and exit cleanly.
2024-06-14 10:30:03,343 CRITICAL  INITIAL Action kwargs: {}
2024-06-14 10:30:03,343 CRITICAL  Post search_pattern Action kwargs: {'master_timeout': 30}
2024-06-14 10:30:04,930 CRITICAL  Pre Instantiation Action kwargs: {'master_timeout': 30}
2024-06-14 10:30:04,931 INFO      Deleting 13 selected indices: ['.kibana-observability-ai-assistant-conversations-000001', 'metrics-endpoint.metadata_current_default', 'redactedlog-2023.05.18', 'redactedlog-2023.05.25', 'redactedlog-2023.03.14', 'redactedlog-2023.05.31', 'redactedlog-2023.03.15', 'redactedlog-2023.11.15', 'redactedlog-2023.11.21', 'redactedlog-2023.11.24', 'redactedlog-2024.01.16', '.kibana-observability-ai-assistant-kb-000001', 'redactedlog-2023.09.25']
2024-06-14 10:30:04,931 INFO      ---deleting index .kibana-observability-ai-assistant-conversations-000001
2024-06-14 10:30:04,931 INFO      ---deleting index metrics-endpoint.metadata_current_default
2024-06-14 10:30:04,931 INFO      ---deleting index redactedlog-2023.05.18
2024-06-14 10:30:04,931 INFO      ---deleting index redactedlog-2023.05.25
2024-06-14 10:30:04,931 INFO      ---deleting index redactedlog-2023.03.14
2024-06-14 10:30:04,931 INFO      ---deleting index redactedlog-2023.05.31
2024-06-14 10:30:04,931 INFO      ---deleting index redactedlog-2023.03.15
2024-06-14 10:30:04,931 INFO      ---deleting index redactedlog-2023.11.15
2024-06-14 10:30:04,931 INFO      ---deleting index redactedlog-2023.11.21
2024-06-14 10:30:04,931 INFO      ---deleting index redactedlog-2023.11.24
2024-06-14 10:30:04,931 INFO      ---deleting index redactedlog-2024.01.16
2024-06-14 10:30:04,931 INFO      ---deleting index .kibana-observability-ai-assistant-kb-000001
2024-06-14 10:30:04,931 INFO      ---deleting index redactedlog-2023.09.25
2024-06-14 10:30:06,582 INFO      Action ID: 0, "delete_indices" completed.
2024-06-14 10:30:06,582 INFO      Preparing Action ID: 1, "delete_indices"
2024-06-14 10:30:06,582 INFO      Creating client object and testing connection
2024-06-14 10:30:06,633 INFO      Trying Action ID: 1, "delete_indices": 
Delete indices older than 10 days (based on index name), for dev-redactedlog
indices. Ignore the error if the filter does not result in an actionable list of indices (ignore_empty_list) and exit cleanly.
2024-06-14 10:30:06,633 CRITICAL  INITIAL Action kwargs: {}
2024-06-14 10:30:06,633 CRITICAL  Post search_pattern Action kwargs: {'master_timeout': 30}
2024-06-14 10:30:07,933 CRITICAL  Pre Instantiation Action kwargs: {'master_timeout': 30}
2024-06-14 10:30:07,933 INFO      Deleting 1 selected indices: ['dev-redactedlog-2024.05.24']
2024-06-14 10:30:07,933 INFO      ---deleting index dev-redactedlog-2024.05.24
2024-06-14 10:30:08,251 INFO      Action ID: 1, "delete_indices" completed.
2024-06-14 10:30:08,251 INFO      All actions completed.

I'm afraid I have to exclude some indices, like .kibana-observability-ai-assistant-kb-000001 or metrics-endpoint.metadata_current_default.
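One way to approach the exclusion is an extra `pattern` filter with `kind: regex` and `exclude: True` in the action file. Before touching the action file, a candidate regular expression can be checked against the index names from the log. The sketch below does that with Python's `re` module. The pattern is an assumption chosen to fit the two examples named above, and the use of `search` rather than `match` is also an assumption about how the expression would be applied, so the pattern is anchored with `^` to behave the same either way.

```python
import re

# Assumed pattern: protect Kibana system indices and endpoint metrics.
EXCLUDE = re.compile(r"^(\.kibana-|metrics-endpoint\.)")

# A subset of the index names selected in the log output above.
selected = [
    ".kibana-observability-ai-assistant-conversations-000001",
    "metrics-endpoint.metadata_current_default",
    "redactedlog-2023.05.18",
    ".kibana-observability-ai-assistant-kb-000001",
    "redactedlog-2023.09.25",
]

protected = [name for name in selected if EXCLUDE.search(name)]
remaining = [name for name in selected if not EXCLUDE.search(name)]

print("protected:", protected)
print("remaining:", remaining)
```

Running the real action with Curator's `--dry-run` flag afterwards is a sensible way to confirm that only the intended indices remain selected.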

Let us know any news about the final fix, please!

untergeek commented 2 months ago

I'm glad it's working now!

I shouldn't have called it a workaround as this is fully supported and not some sort of sneaky way to get around using a client configuration file. It is only a workaround from the way you were using Curator. This containerized approach is fully supported, and the automatic use of environment variables is designed to make your life easier. You should be using environment variables rather than a client configuration file.

That said, I will still verify the bug (with your confirmation that makes me even more certain it's the empty logfile entry) and fix it, though I'm pretty sure that will be upstream in the es_client module.

untergeek commented 2 months ago

Finally found it: https://github.com/untergeek/es_client/issues/66

untergeek commented 1 month ago

This is now closed with the release of 8.0.16