elastic / elasticsearch

Improve .security index migration resiliency #110532

Open albertzaharovits opened 3 months ago

albertzaharovits commented 3 months ago

Today, we don't handle the case where the Security migration's update-by-query reports partial success, for example when it returns update conflicts because roles are continuously created or updated while the migration is taking place: https://github.com/elastic/elasticsearch/blob/6abef3a2f0d4acf8df315d6676402cd4fb6a7238/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/support/SecurityMigrations.java#L107

My thinking is that the persistent migration task that runs the update-by-query should retry it on a schedule until all docs have been successfully updated. We can't simply fail the migration and expect it to be retriggered, because retriggering currently depends on cluster state events affecting the index, and there isn't an unlimited, constant stream of those: https://github.com/elastic/elasticsearch/blob/6abef3a2f0d4acf8df315d6676402cd4fb6a7238/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/support/SecurityIndexManager.java#L327
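
To make the proposed retry behavior concrete, here is a minimal sketch in plain Java. It does not use any Elasticsearch classes: `MigrationRetrySketch`, `AttemptResult`, and `retryUntilComplete` are hypothetical names, the `Supplier` stands in for one update-by-query run, and the `maxAttempts` cap is an added assumption (the proposal above only says to retry until all docs are updated). The idea is simply that a partial success reschedules the same work after a delay instead of waiting for a cluster state event.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class MigrationRetrySketch {

    /** Hypothetical summary of one update-by-query run (not an Elasticsearch type). */
    public static final class AttemptResult {
        public final long updated;
        public final long versionConflicts;

        public AttemptResult(long updated, long versionConflicts) {
            this.updated = updated;
            this.versionConflicts = versionConflicts;
        }

        /** The run counts as complete only when no document was skipped due to a conflict. */
        public boolean complete() {
            return versionConflicts == 0;
        }
    }

    /**
     * Runs the attempt now and, on partial success, reschedules it after delayMillis.
     * The returned future completes with the number of attempts that were needed,
     * or fails if maxAttempts is exhausted or the attempt itself throws.
     */
    public static CompletableFuture<Integer> retryUntilComplete(
            ScheduledExecutorService scheduler,
            Supplier<AttemptResult> attempt,
            long delayMillis,
            int maxAttempts) {
        CompletableFuture<Integer> done = new CompletableFuture<>();
        runAttempt(scheduler, attempt, delayMillis, maxAttempts, 1, done);
        return done;
    }

    private static void runAttempt(
            ScheduledExecutorService scheduler,
            Supplier<AttemptResult> attempt,
            long delayMillis,
            int maxAttempts,
            int attemptNumber,
            CompletableFuture<Integer> done) {
        try {
            AttemptResult result = attempt.get();
            if (result.complete()) {
                done.complete(attemptNumber);
            } else if (attemptNumber >= maxAttempts) {
                done.completeExceptionally(new IllegalStateException(
                        "still " + result.versionConflicts + " conflicts after " + attemptNumber + " attempts"));
            } else {
                // Partial success: try again later rather than waiting for an external trigger.
                scheduler.schedule(
                        () -> runAttempt(scheduler, attempt, delayMillis, maxAttempts, attemptNumber + 1, done),
                        delayMillis,
                        TimeUnit.MILLISECONDS);
            }
        } catch (RuntimeException e) {
            done.completeExceptionally(e);
        }
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            // Fake migration: the first two runs report one conflict, the third is clean.
            AtomicInteger calls = new AtomicInteger();
            Supplier<AttemptResult> fakeMigration =
                    () -> new AttemptResult(10, calls.incrementAndGet() < 3 ? 1 : 0);
            int attempts = retryUntilComplete(scheduler, fakeMigration, 10, 5).get();
            System.out.println("completed after " + attempts + " attempts");
        } finally {
            scheduler.shutdown();
        }
    }
}
```

In the real task, the delay and any attempt cap would be design decisions for the persistent task itself; a fixed delay is used here only to keep the sketch short.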

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-security (Team:Security)