elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Duplicate ILM Cluster State Updates when policy is deleted #89831

Open luyuncheng opened 2 years ago

luyuncheng commented 2 years ago

Elasticsearch Version

master

Installed Plugins

null

Java Version

bundled

OS Version

Unix

Problem Description

PR: #89832

#78427 and #78246 covered cases where ILM would call submitStateUpdateTask more than once for the same task. A Set<IndexLifecycleClusterStateUpdateTask> was introduced to prevent duplicate submissions.

However, on rare occasions an exception is thrown on every run, for example IllegalStateException("unable to parse steps for policy [" + policy + "] as it doesn't exist") from PolicyStepsRegistry#parseStepsFromPhase. Each time this happens, IndexLifecycleRunner#markPolicyRetrievalError wraps the exception in a new wrapper instance.

As a result, when the Set<IndexLifecycleClusterStateUpdateTask> checks a task such as SetStepInfoUpdateTask, the task is added again as a duplicate, because SetStepInfoUpdateTask#ExceptionWrapper, which is used as the stepInfo, has no equals method to compare two wrappers.
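The effect can be sketched in isolation. The following is a minimal, self-contained illustration and not the actual ILM code: Task, IdentityWrapper and ValueWrapper are hypothetical stand-ins, and the sketch assumes the task's equality includes its stepInfo field. Comparing wrappers by exception class and message is only one possible way to make them equal; it is not necessarily what the linked PR does.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class DuplicateTaskSketch {

    // Hypothetical stand-in for the exception wrapper: no equals/hashCode,
    // so two instances are equal only if they are the same object.
    static final class IdentityWrapper {
        final Throwable exception;
        IdentityWrapper(Throwable exception) { this.exception = exception; }
    }

    // Hypothetical variant that compares by exception class and message.
    static final class ValueWrapper {
        final Throwable exception;
        ValueWrapper(Throwable exception) { this.exception = exception; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            ValueWrapper other = (ValueWrapper) o;
            return exception.getClass() == other.exception.getClass()
                && Objects.equals(exception.getMessage(), other.exception.getMessage());
        }

        @Override
        public int hashCode() {
            return Objects.hash(exception.getClass(), exception.getMessage());
        }
    }

    // Simplified task whose equality includes stepInfo (an assumption of this sketch).
    static final class Task {
        final String index;
        final String policy;
        final Object stepInfo;

        Task(String index, String policy, Object stepInfo) {
            this.index = index;
            this.policy = policy;
            this.stepInfo = stepInfo;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Task other = (Task) o;
            return index.equals(other.index)
                && policy.equals(other.policy)
                && Objects.equals(stepInfo, other.stepInfo);
        }

        @Override
        public int hashCode() {
            return Objects.hash(index, policy, stepInfo);
        }
    }

    // Every ILM run hits the same error but produces a fresh exception object.
    static IllegalStateException missingPolicy(String policy) {
        return new IllegalStateException(
            "unable to parse steps for policy [" + policy + "] as it doesn't exist");
    }

    // Number of tasks left in the de-duplication set after `attempts` runs.
    static int countWithIdentityWrapper(int attempts) {
        Set<Task> executing = new HashSet<>();
        for (int i = 0; i < attempts; i++) {
            executing.add(new Task("my-index", "my-policy", new IdentityWrapper(missingPolicy("my-policy"))));
        }
        return executing.size();
    }

    static int countWithValueWrapper(int attempts) {
        Set<Task> executing = new HashSet<>();
        for (int i = 0; i < attempts; i++) {
            executing.add(new Task("my-index", "my-policy", new ValueWrapper(missingPolicy("my-policy"))));
        }
        return executing.size();
    }

    public static void main(String[] args) {
        System.out.println(countWithIdentityWrapper(3)); // prints 3: every task counts as new
        System.out.println(countWithValueWrapper(3));    // prints 1: duplicates are rejected
    }
}
```

With the identity-based wrapper the set grows by one entry per run, which matches the growing list of pending tasks described in this report.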

Steps to Reproduce

1. Stop ILM
2. Delete some policy
3. Start ILM
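For reference, these steps map onto the ILM REST endpoints roughly as follows. This is only a sketch: the base URL and policy name are placeholders, it assumes a local node without security enabled, and it only builds and prints the requests instead of sending them. The last request lists the pending cluster tasks.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

public class IlmReproSketch {

    // Builds the requests in the order of the reproduction steps.
    // baseUrl and policy are placeholders supplied by the caller.
    static List<HttpRequest> reproRequests(String baseUrl, String policy) {
        HttpRequest.BodyPublisher empty = HttpRequest.BodyPublishers.noBody();
        return List.of(
            // 1. Stop ILM
            HttpRequest.newBuilder(URI.create(baseUrl + "/_ilm/stop")).POST(empty).build(),
            // 2. Delete the policy
            HttpRequest.newBuilder(URI.create(baseUrl + "/_ilm/policy/" + policy)).DELETE().build(),
            // 3. Start ILM
            HttpRequest.newBuilder(URI.create(baseUrl + "/_ilm/start")).POST(empty).build(),
            // Then list the pending cluster tasks
            HttpRequest.newBuilder(URI.create(baseUrl + "/_cluster/pending_tasks")).GET().build()
        );
    }

    public static void main(String[] args) {
        for (HttpRequest request : reproRequests("http://localhost:9200", "hot_120d_delete_policy")) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

To actually run the steps, each request could be passed to java.net.http.HttpClient#send with a string body handler.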

Getting the pending tasks then shows many duplicated tasks for the step ilm-set-step-info XL6gvNOnfP

Inspecting the tasks with Arthas shows the following: z71joAmNy9

The tasks are identical except that each one has a different stepInfo object:

    @SetStepInfoUpdateTask[
        logger=@Logger[org.elasticsearch.xpack.ilm.SetStepInfoUpdateTask:INFO in 5bc2b487],
        index=@Index[[service_shop_stats_offline-20220327/_1kgBxAfTiiu9KSTQV55mg]],
        policy=@String[hot_120d_delete_policy],
        currentStepKey=@StepKey[{"phase":"hot","action":"complete","name":"complete"}],
        stepInfo=@ExceptionWrapper[org.elasticsearch.xpack.ilm.SetStepInfoUpdateTask$ExceptionWrapper@65fd64a6],
        listener=@ListenableFuture[org.elasticsearch.common.util.concurrent.ListenableFuture@1ae056ea],
        priority=@Priority[NORMAL],
    ],

    @SetStepInfoUpdateTask[
        logger=@Logger[org.elasticsearch.xpack.ilm.SetStepInfoUpdateTask:INFO in 5bc2b487],
        index=@Index[[service_shop_stats_offline-20220327/_1kgBxAfTiiu9KSTQV55mg]],
        policy=@String[hot_120d_delete_policy],
        currentStepKey=@StepKey[{"phase":"hot","action":"complete","name":"complete"}],
        stepInfo=@ExceptionWrapper[org.elasticsearch.xpack.ilm.SetStepInfoUpdateTask$ExceptionWrapper@606d0dbf],
        listener=@ListenableFuture[org.elasticsearch.common.util.concurrent.ListenableFuture@3145cde8],
        priority=@Priority[NORMAL],
    ],

Logs (if relevant)

No response

elasticsearchmachine commented 2 years ago

Pinging @elastic/es-data-management (Team:Data Management)