elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.28k stars 24.86k forks source link

`EnrichPolicyMaintenanceService` can delete new enrichment index before it's promoted #87580

Open n0othing opened 2 years ago

n0othing commented 2 years ago

Elasticsearch Version

7.17.4

Installed Plugins

No response

Java Version

bundled

OS Version

RHEL 7

Problem Description

During a subsequent enrichment policy execution, EnrichPolicyMaintenanceService can flag the new enrichment index as stale while it's being indexed to. This can lead to the new index being created a second time (due to cause [auto(bulk api)]), which will also get deleted during the next enrich.cleanup_period.

Steps to Reproduce

In the below logs, a second execution of my_policy was executing during a maintenance cycle. .enrich-my_policy-1654635601016 was still receiving data so it hadn't been promoted, resulting in it getting deleted:

"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:38,232][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Checking if should remove enrich index [.enrich-my_policy-1654635601016]"
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:38,232][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Enrich index [.enrich-my_policy-1654635601016] is not marked as a live index since it has no alias information"
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:38,233][INFO ][org.elasticsearch.cluster.metadata.MetadataDeleteIndexService] [instance-0000000002] [.enrich-my_policy-1654635601016/EcveiRTjT9WsK0h8E5xROg] deleting index"
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:38,568][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Completed deletion of stale enrich indices [[.enrich-my_policy-1654635601016]]"

A second .enrich-my_policy-1654635601016 was created due to the on-going reindex process:

"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:38,570][INFO ][org.elasticsearch.cluster.metadata.MetadataCreateIndexService] [instance-0000000002] [.enrich-my_policy-1654635601016] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]"

Reindex continues and the index is promoted:

"Jun 7, 2022 @ 17:03:12.264","[2022-06-07T21:03:09,891][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Promoting new enrich index [.enrich-my_policy-1654635601016] to alias [.enrich-my_policy]"
"Jun 7, 2022 @ 17:03:12.264","[2022-06-07T21:03:09,973][INFO ][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Policy execution complete"

The second .enrich-my_policy-1654635601016 gets deleted because it it doesn't have a policy name in the index settings:

"Jun 7, 2022 @ 17:15:43.383","[2022-06-07T21:15:38,235][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Enrich index [.enrich-my_policy-1654635601016] does not correspond to any existing policy. Found policy name [null]"
"Jun 7, 2022 @ 17:15:43.384","[2022-06-07T21:15:38,236][INFO ][org.elasticsearch.cluster.metadata.MetadataDeleteIndexService] [instance-0000000002] [.enrich-my_policy-1654529338711/FkyJRz_lS6eiK0TzIvj24A] deleting index"

Logs (if relevant)

"Jun 7, 2022 @ 17:00:04.247","[2022-06-07T21:00:01,013][INFO ][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Running enrich policy"
"Jun 7, 2022 @ 17:00:04.247","[2022-06-07T21:00:01,013][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Checking source indices [[my_source_index]]"
"Jun 7, 2022 @ 17:00:04.247","[2022-06-07T21:00:01,015][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Validating [[my_source_index]] source mappings"
"Jun 7, 2022 @ 17:00:04.247","[2022-06-07T21:00:01,016][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Creating new enrich index [.enrich-my_policy-1654635601016]"
"Jun 7, 2022 @ 17:00:04.247","[2022-06-07T21:00:01,135][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Transferring source data to new enrich index [.enrich-my_policy-1654635601016]"
"Jun 7, 2022 @ 17:00:08.111","[2022-06-07T21:00:01,020][INFO ][org.elasticsearch.cluster.metadata.MetadataCreateIndexService] [instance-0000000002] [.enrich-my_policy-1654635601016] creating index, cause [api], templates [], shards [1]/[0]"
"Jun 7, 2022 @ 17:00:08.111","[2022-06-07T21:00:01,092][INFO ][org.elasticsearch.cluster.routing.allocation.AllocationService] [instance-0000000002] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.enrich-my_policy-1654635601016][0]]])."
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:42,323][INFO ][org.elasticsearch.cluster.routing.allocation.AllocationService] [instance-0000000002] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.enrich-my_policy-1654635601016][0]]])."
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:38,232][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Checking if should remove enrich index [.enrich-my_policy-1654529338711]"
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:38,232][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Enrich index [.enrich-my_policy-1654529338711] was spared since it is associated with the valid policy [my_policy] and references alias [.enrich-my_policy]"
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:38,232][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Checking if should remove enrich index [.enrich-my_policy-1654635601016]"
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:38,232][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Enrich index [.enrich-my_policy-1654635601016] is not marked as a live index since it has no alias information"
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:38,233][INFO ][org.elasticsearch.cluster.metadata.MetadataDeleteIndexService] [instance-0000000002] [.enrich-my_policy-1654635601016/EcveiRTjT9WsK0h8E5xROg] deleting index"
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:38,568][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Completed deletion of stale enrich indices [[.enrich-my_policy-1654635601016]]"
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:38,570][INFO ][org.elasticsearch.cluster.metadata.MetadataCreateIndexService] [instance-0000000002] [.enrich-my_policy-1654635601016] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]"
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:39,398][INFO ][org.elasticsearch.cluster.metadata.MetadataMappingService] [instance-0000000002] [.enrich-my_policy-1654635601016/stBKskt-RHex2kEUKe4wdg] create_mapping [_doc]"
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:39,743][INFO ][org.elasticsearch.cluster.metadata.MetadataMappingService] [instance-0000000002] [.enrich-my_policy-1654635601016/stBKskt-RHex2kEUKe4wdg] update_mapping [_doc]"
"Jun 7, 2022 @ 17:00:43.113","[2022-06-07T21:00:40,753][INFO ][org.elasticsearch.cluster.metadata.MetadataMappingService] [instance-0000000002] [.enrich-my_policy-1654635601016/stBKskt-RHex2kEUKe4wdg] update_mapping [_doc]"
"Jun 7, 2022 @ 17:02:22.261","[2022-06-07T21:02:17,214][INFO ][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Transferred [####] documents to enrich index [.enrich-my_policy-1654635601016]"
"Jun 7, 2022 @ 17:02:22.261","[2022-06-07T21:02:17,214][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Force merging newly created enrich index [.enrich-my_policy-1654635601016] (Attempt 1/3)"
"Jun 7, 2022 @ 17:02:47.262","[2022-06-07T21:02:39,808][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Refreshing enrich index [.enrich-my_policy-1654635601016]"
"Jun 7, 2022 @ 17:02:47.262","[2022-06-07T21:02:39,815][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Setting new enrich index [.enrich-my_policy-1654635601016] to be read only"
"Jun 7, 2022 @ 17:02:48.118","[2022-06-07T21:02:39,820][INFO ][org.elasticsearch.cluster.routing.allocation.AllocationService] [instance-0000000002] updating number_of_replicas to [n] for indices [.enrich-my_policy-1654635601016]"
"Jun 7, 2022 @ 17:03:12.264","[2022-06-07T21:03:09,891][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Promoting new enrich index [.enrich-my_policy-1654635601016] to alias [.enrich-my_policy]"
"Jun 7, 2022 @ 17:03:12.264","[2022-06-07T21:03:09,973][INFO ][org.elasticsearch.xpack.enrich.EnrichPolicyRunner] [instance-0000000012] Policy [my_policy]: Policy execution complete"
"Jun 7, 2022 @ 17:03:12.264","[2022-06-07T21:03:09,973][DEBUG][org.elasticsearch.xpack.enrich.action.InternalExecutePolicyAction] [instance-0000000012] successfully executed policy [my_policy]"
"Jun 7, 2022 @ 17:03:23.120","[2022-06-07T21:03:14,287][INFO ][org.elasticsearch.cluster.routing.allocation.AllocationService] [instance-0000000002] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.enrich-my_policy-1654635601016][0]]])."
"Jun 7, 2022 @ 17:15:43.383","[2022-06-07T21:15:38,235][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Checking if should remove enrich index [.enrich-my_policy-1654529338711]"
"Jun 7, 2022 @ 17:15:43.383","[2022-06-07T21:15:38,235][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Enrich index [.enrich-my_policy-1654529338711] is not marked as a live index since it has no alias information"
"Jun 7, 2022 @ 17:15:43.383","[2022-06-07T21:15:38,235][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Checking if should remove enrich index [.enrich-my_policy-1654635601016]"
"Jun 7, 2022 @ 17:15:43.383","[2022-06-07T21:15:38,235][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Enrich index [.enrich-my_policy-1654635601016] does not correspond to any existing policy. Found policy name [null]"
"Jun 7, 2022 @ 17:15:43.384","[2022-06-07T21:15:38,236][INFO ][org.elasticsearch.cluster.metadata.MetadataDeleteIndexService] [instance-0000000002] [.enrich-my_policy-1654529338711/FkyJRz_lS6eiK0TzIvj24A] deleting index"
"Jun 7, 2022 @ 17:15:43.384","[2022-06-07T21:15:38,236][INFO ][org.elasticsearch.cluster.metadata.MetadataDeleteIndexService] [instance-0000000002] [.enrich-my_policy-1654635601016/stBKskt-RHex2kEUKe4wdg] deleting index"
"Jun 7, 2022 @ 17:15:43.384","[2022-06-07T21:15:38,428][DEBUG][org.elasticsearch.xpack.enrich.EnrichPolicyMaintenanceService] [instance-0000000002] Completed deletion of stale enrich indices [[.enrich-my_policy-1654529338711, .enrich-my_policy-1654635601016]]
elasticmachine commented 2 years ago

Pinging @elastic/es-data-management (Team:Data Management)

martijnvg commented 2 years ago

The EnrichPolicyMaintenanceService shouldn't remove any .enrich-* index if any policy is currently being executed. It looks like here a policy is executing during a maintenance cycle. I think we should avoid this from happening in the same way the maintenance cycle doesn't start if enrich policy are currently executing. The enrich execute policy api should wait or fail if maintenance cycle is running.