camunda / zeebe

Distributed Workflow Engine for Microservices Orchestration
https://zeebe.io

Elasticsearch/Opensearch Exporter creates indices with inconsistent number of shards #18181

Open · falko opened this issue 2 weeks ago

falko commented 2 weeks ago

Describe the bug

The Elasticsearch Exporter creates some of its indexes with three primary shards and many with only one. This leads to uneven CPU utilization on multi-node Elasticsearch clusters, since a one-shard index cannot spread its indexing load across nodes.
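To see the imbalance directly, one can count how many primary shards each Elasticsearch node holds. This is a minimal sketch, assuming the `_cat/shards` API with the `h` column filter and the local endpoint used in this report; the function name `primaries_per_node` is hypothetical. With `rep` at 0, as in the tables in this report, every document of a one-shard index such as `zeebe-record_variable` is indexed on a single node.

```sh
#!/usr/bin/env bash
# Hypothetical helper: count primary shards per node.
# Input: output of `_cat/shards?h=index,prirep,node`, one shard per line
# (index name, "p" or "r", node name). Unassigned shards have no node
# column and are skipped.
primaries_per_node() {
  awk '$2 == "p" && NF >= 3 { count[$3]++ }
       END { for (node in count) print node, count[node] }' | sort
}

# Usage, assuming Elasticsearch is reachable at the address used in this report:
#   curl -s 'http://127.0.0.1:9200/_cat/shards/zeebe*?h=index,prirep,node' | primaries_per_node
```

Nodes that appear with a much larger count than the others are the ones expected to show higher CPU usage.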

To Reproduce

Install Zeebe and Elasticsearch with the Helm chart and let the exporter create its indexes.

Expected behavior

All indexes have the same number of shards, and the default value is correctly documented in the configuration file template: https://github.com/camunda/zeebe/blob/c4b3a8745718bfd58959db6c684e0060a4baa455/dist/src/main/config/broker.yaml.template#L665

I'm unsure about the default value:

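Getting this wrong at the start may cause difficult migrations later, because the number of primary shards of an existing index cannot be changed in place; changing it requires reindexing or the split/shrink index APIs. Therefore, this should be much better documented, and/or the Helm chart should proactively set the environment variable ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_ARGS_INDEX_NUMBEROFSHARDS to match the number of Elasticsearch replicas (nodes).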
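One way to follow this suggestion is to derive the value from the number of Elasticsearch data nodes before the broker starts. This is a minimal sketch, not part of Zeebe or the Helm chart: the helper `count_data_nodes` is hypothetical, it assumes the `_cat/nodes?h=node.role` output, and it only counts nodes that have the generic data role (`d` in the role string), not nodes with only tier-specific roles such as `data_hot`. Because the shard count is fixed when an index is created, a new value only affects indexes created afterwards.

```sh
#!/usr/bin/env bash
# Hypothetical helper: count nodes with the generic data role.
# Input: output of `_cat/nodes?h=node.role`, one role string per node
# (for example "cdfhilmrstw"). A node has the data role when its
# role string contains "d".
count_data_nodes() {
  awk 'index($1, "d") > 0 { n++ } END { print n + 0 }'
}

# Usage, assuming Elasticsearch on 127.0.0.1:9200 and that the broker
# picks up this variable from its environment at startup:
#   export ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_ARGS_INDEX_NUMBEROFSHARDS="$(
#     curl -s 'http://127.0.0.1:9200/_cat/nodes?h=node.role' | count_data_nodes)"
```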

Log/Stacktrace

However, some indexes are created with 3 primary shards and many with only 1, as the pri column of the Elasticsearch index listing shows:


```sh
$ curl --location 'http://127.0.0.1:9200/_cat/indices/zeebe*?v=true&s=index&pretty'
health status index                                                          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   zeebe-record_command-distribution_8.4.6_2024-05-02             gXkrrBVXQyu6KBd2eGw6OQ   1   0        504            0    128.7kb        128.7kb
green  open   zeebe-record_deployment_8.4.6_2024-05-02                       c8JaLs4xQoWX0AwnY06o4w   1   0        252           15     93.7kb         93.7kb
green  open   zeebe-record_incident_8.4.6_2024-05-02                         qNMm-0THStGynaXAsuSMvg   1   0     188129            0     33.4mb         33.4mb
green  open   zeebe-record_job_8.4.6_2024-05-02                              kOAHq69ATGC_bhjLkVt9jA   3   0    1064676            0    178.2mb        178.2mb
green  open   zeebe-record_message-start-event-subscription_8.4.6_2024-05-02 0t2FxezQSOODZfk5BhkTfw   1   0     123367         1491       21mb           21mb
green  open   zeebe-record_message_8.4.6_2024-05-02                          6NRoNIDTSTqL-IuMges4cg   1   0     125918            0     18.8mb         18.8mb
green  open   zeebe-record_process-instance_8.4.6_2024-05-02                 8IkFn6y6R6GyWUpktsJYvQ   3   0    9628052       268819      1.5gb          1.5gb
green  open   zeebe-record_process_8.4.6_2024-05-02                          fmw8mu7CQGOqCPsvk4Cxkg   1   0        252           29    989.4kb        989.4kb
green  open   zeebe-record_variable_8.4.6_2024-05-02                         Pw86QbpQRmm94ljLq7QZEA   1   0    7753027       116033        1gb            1gb
```
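Checking for this inconsistency can be automated by grouping the listed indexes by their `pri` value. This is a minimal sketch, assuming `_cat/indices` output requested with `v=true` (so line 1 is the header and column 5 is `pri`); the function name `check_uniform_shards` is hypothetical. It prints one line per distinct shard count and exits with status 1 when more than one count is present.

```sh
#!/usr/bin/env bash
# Hypothetical check: group indices by primary shard count and fail
# when the counts are not all equal.
# Input: `_cat/indices?v=true` output; line 1 is the header and
# column 5 holds the "pri" value.
check_uniform_shards() {
  awk 'NR == 1 { next }
       { seen[$5]++ }
       END {
         distinct = 0
         for (pri in seen) {
           distinct++
           print pri " primary shard(s): " seen[pri] " index(es)"
         }
         if (distinct > 1) exit 1
       }'
}

# Usage, assuming the same endpoint as in the command above:
#   curl -s 'http://127.0.0.1:9200/_cat/indices/zeebe*?v=true&s=index' | check_uniform_shards
```

Applied to the listing above, it reports 7 indexes with 1 primary shard and 2 with 3 (lines in no guaranteed order) and exits with status 1.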

Environment:

#1561 set number_of_shards to 1 in several index templates, while others may already have had the value 3.

Setting the value explicitly (here to 3) through the undocumented environment variable ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_ARGS_INDEX_NUMBEROFSHARDS works; all indexes are then created with 3 primary shards:

```
health status index                                                          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   zeebe-record_command-distribution_8.4.6_2024-05-02             -qfmOofCTourrMDZjLFpmw   3   0        648            0    146.4kb        146.4kb
green  open   zeebe-record_deployment_8.4.6_2024-05-02                       6DrQDw6LQqWFYw1xJRjt0g   3   0        319            0    366.4kb        366.4kb
green  open   zeebe-record_incident_8.4.6_2024-05-02                         ABF2piFuTtumHQEjlMD8Cw   3   0        496            0    707.1kb        707.1kb
green  open   zeebe-record_job_8.4.6_2024-05-02                              IroIBBZpRLunJyX3IeE_ww   3   0      78951            0     23.2mb         23.2mb
green  open   zeebe-record_message-start-event-subscription_8.4.6_2024-05-02 rt0d40zNTfyekrriQj6tkQ   3   0       1933            0        1mb            1mb
green  open   zeebe-record_message_8.4.6_2024-05-02                          5Nvi8yRUQ16Bqm7y618jTA   3   0       5994            0      1.5mb          1.5mb
green  open   zeebe-record_process-instance_8.4.6_2024-05-02                 Ifda3NJQSAWal2fRoMP1Og   3   0     684610            0    130.8mb        130.8mb
green  open   zeebe-record_process_8.4.6_2024-05-02                          VkC-ZM-xSHS5-nQ_Ma2Y4g   3   0        288            0      1.2mb          1.2mb
green  open   zeebe-record_variable_8.4.6_2024-05-02                         Z_6upQgwTJiwETHMOihAFw   3   0     555300            0     96.1mb         96.1mb
```

falko commented 2 weeks ago

FYI @jothikiruthika

falko commented 2 weeks ago

The Opensearch Exporter is likely affected by the same issue, as its index templates also have inconsistent numbers of shards.

megglos commented 1 week ago

ZPA-Triage: