elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Re-indexing security system index results in broken index state #85072

Open n1v0lg opened 2 years ago

n1v0lg commented 2 years ago

Elasticsearch Version

local cluster running master commit 22824e47bc894e7d06e27a6fa7b221488516cb35, 8.2.0-SNAPSHOT

Installed Plugins

No response

Java Version

17

OS Version

Darwin Kernel Version 21.3.0

Problem Description

In ES 8.x, re-indexing .security-tokens into a new index within its reserved namespace, e.g., .security-tokens-8, creates a second, "duplicate" index. Any subsequent attempt to use the .security-tokens index then ends in an error state, effectively breaking it. The underlying issue is that the .security-tokens alias now points to two indices.

This likely affects other system indices as well, not just .security-tokens.

Instead of allowing the re-index to proceed, we should reject it, similar to how we already prevent creating a new .security-tokens-* index within the namespace when one already exists.
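A minimal sketch of the proposed check, assuming the guard only needs the destination name, the names of existing indices, and the descriptor's pattern. `check_can_create` and `TOKENS_PATTERN` are hypothetical names, and shell-style matching via `fnmatchcase` stands in for however Elasticsearch actually matches system index patterns.

```python
from fnmatch import fnmatchcase

# Pattern taken from this issue's example; a real guard would read it from the
# system index descriptor instead of hard-coding it (assumption of this sketch).
TOKENS_PATTERN = ".security-tokens-*"


def check_can_create(new_index: str, existing: list[str],
                     pattern: str = TOKENS_PATTERN) -> None:
    """Raise ValueError if new_index would become a second index in `pattern`."""
    if not fnmatchcase(new_index, pattern):
        return  # outside the reserved namespace: nothing to guard
    others = [name for name in existing
              if name != new_index and fnmatchcase(name, pattern)]
    if others:
        raise ValueError(
            f"cannot create [{new_index}]: namespace [{pattern}] "
            f"is already served by {others}"
        )


# The reindex destination from the reproduction steps would be rejected:
try:
    check_can_create(".security-tokens-42", [".security-tokens-7"])
except ValueError as err:
    print(err)
```

Failing before any documents are copied keeps the alias untouched, which is the behaviour the report asks for.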

Steps to Reproduce

Trigger .security-tokens index creation:

POST /_security/oauth2/token
{
  "grant_type": "client_credentials"
}

Re-index:

POST _reindex
{
  "source": {
    "index": ".security-tokens"
  },
  "dest": {
    "index": ".security-tokens-42"
  }
}

Attempt to use index again:

POST /_security/oauth2/token
{
  "grant_type": "client_credentials"
}

This returns a 400 response and logs the following error:

java.lang.IllegalStateException: Alias [.security-tokens] points to more than one index: [[.security-tokens-42/g6y3TELtQBCpNPo2ILrSww], [.security-tokens-7/vooe5E9lRvKz3AuqZbza7g]]

Running GET /_cat/indices?expand_wildcards=all shows that there are now two indices behind the .security-tokens alias: .security-tokens-42 and .security-tokens-7.
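To spot this state from a client, it can help to list which indices carry the alias. The sketch below assumes a response body shaped like that of GET /_alias/<alias> (index name mapped to an `aliases` object); the sample data is made up from the index names in the log above, and whether a regular user request can read system-index aliases in 8.x without extra privileges or warnings is not covered here. `indices_for_alias` is a hypothetical helper.

```python
import json


def indices_for_alias(alias_response: dict, alias: str) -> list[str]:
    """Return the sorted names of indices whose alias map contains `alias`."""
    return sorted(
        index
        for index, body in alias_response.items()
        if alias in body.get("aliases", {})
    )


# Made-up response body modelled on the broken state reported in this issue.
sample = json.loads("""
{
  ".security-tokens-7":  {"aliases": {".security-tokens": {}}},
  ".security-tokens-42": {"aliases": {".security-tokens": {}}}
}
""")

holders = indices_for_alias(sample, ".security-tokens")
if len(holders) > 1:
    print(f"alias points to {len(holders)} indices: {holders}")
```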

elasticmachine commented 2 years ago

Pinging @elastic/es-core-infra (Team:Core/Infra)

gwbrown commented 2 years ago

To add a bit of context to this, when we see a new index get created in the .security-tokens* pattern, we grab that descriptor and apply it to the newly-created index - including the alias from the descriptor, if it has one. That alias now points at two indices, so writes to the alias fail.

This only happens if users are messing around in system-index-reserved namespaces, but it's still surprising that this can happen and possibly we should fail earlier, or handle the case of a second index being created for a descriptor that has an alias configured differently.
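The sequence described here can be modelled with a toy in-memory metadata store. This is a simplified sketch, not Elasticsearch code: `Descriptor`, `Cluster`, `on_index_created`, and `resolve_single_index` are hypothetical names, and the error text only imitates the log line in the report.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Descriptor:
    pattern: str              # namespace the descriptor owns, e.g. ".security-tokens-*"
    alias: Optional[str]      # alias applied to matching indices, if any


@dataclass
class Cluster:
    descriptors: list
    aliases: dict = field(default_factory=dict)  # alias -> set of index names

    def on_index_created(self, index: str) -> None:
        # Every new index matching a descriptor gets the descriptor applied,
        # alias included, without checking who already holds that alias.
        for d in self.descriptors:
            if d.alias and fnmatchcase(index, d.pattern):
                self.aliases.setdefault(d.alias, set()).add(index)

    def resolve_single_index(self, alias: str) -> str:
        # Stand-in for code paths that expect an alias to name exactly one index.
        targets = sorted(self.aliases.get(alias, set()))
        if not targets:
            raise KeyError(f"no such alias [{alias}]")
        if len(targets) > 1:
            # The log in this issue shows an IllegalStateException; RuntimeError models it.
            raise RuntimeError(
                f"Alias [{alias}] points to more than one index: {targets}"
            )
        return targets[0]


cluster = Cluster(descriptors=[Descriptor(".security-tokens-*", ".security-tokens")])
cluster.on_index_created(".security-tokens-7")   # created by the first token request
print(cluster.resolve_single_index(".security-tokens"))
cluster.on_index_created(".security-tokens-42")  # the _reindex destination
try:
    cluster.resolve_single_index(".security-tokens")
except RuntimeError as err:
    print(err)
```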

williamrandolph commented 2 years ago

In the original implementation, I think we could only autocreate an index with the primary index descriptor, and any attempt to autocreate a different index matching the system index descriptor would write to the primary index. https://github.com/elastic/elasticsearch/pull/77045
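As a rough illustration of that original behaviour, where auto-creating any name in a descriptor's namespace was redirected to the descriptor's primary index, the sketch below maps a requested index name to the index that would actually be used. `PATTERN`, `PRIMARY_INDEX`, and `autocreate_target` are hypothetical; the real logic lives in the linked PR and may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical values taken from the example in this issue.
PATTERN = ".security-tokens-*"
PRIMARY_INDEX = ".security-tokens-7"


def autocreate_target(requested: str) -> str:
    """Return the index that an auto-create request would actually resolve to."""
    if fnmatchcase(requested, PATTERN):
        # Inside the system namespace: always use the primary index, so a
        # second concrete index is never created for the same descriptor.
        return PRIMARY_INDEX
    return requested  # ordinary index: create it under the requested name


for name in [".security-tokens-42", ".security-tokens-7", "logs-2024"]:
    print(name, "->", autocreate_target(name))
```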

I'm going to see if I can isolate this behavior in the autocreate logic, for simplicity. Since we want to be able to reindex system indices for migration, I don't think we should catch this in the reindex code.

williamrandolph commented 2 years ago

I've reproduced this with the logic to auto-create system indices:

  1. Create a system index descriptor with a defined alias.
  2. Autocreate an index matching the system index descriptor pattern.
  3. Autocreate a different index matching the system index descriptor pattern.
  4. Write to the system alias. This will fail with an IllegalArgumentException: no write index is defined...
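The failure in step 4 follows from how writes through an alias are resolved: an alias on a single index writes to that index, but an alias spanning several indices needs exactly one of them marked `is_write_index: true`. The function below is a simplified model of that rule, not Elasticsearch's implementation; `resolve_write_index` and the use of `None` for an unset flag are assumptions of the sketch, as is the premise that neither index has the flag set (consistent with the error in step 4, but not stated in the thread).

```python
from typing import Optional


def resolve_write_index(alias: str, members: dict[str, Optional[bool]]) -> str:
    """Return the index that writes to `alias` should go to.

    `members` maps each index holding the alias to its is_write_index flag,
    with None meaning the flag was never set.
    """
    flagged = [index for index, flag in members.items() if flag is True]
    if flagged:
        return flagged[0]  # an explicitly designated write index wins
    if len(members) == 1:
        only, flag = next(iter(members.items()))
        if flag is None:
            return only  # a lone index acts as the write index by default
    raise ValueError(f"no write index is defined for alias [{alias}]")


# Steps 2-4 from the list above, with no index flagged as the write index.
members = {".security-tokens-7": None}                     # step 2
print(resolve_write_index(".security-tokens", members))
members[".security-tokens-42"] = None                      # step 3
try:
    resolve_write_index(".security-tokens", members)       # step 4
except ValueError as err:
    print(err)
```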

williamrandolph commented 2 years ago

We need to make sure we propagate changes to existing indices and aliases too. Right now the logic for doing that would go in the SystemIndexMetadataUpgradeService class.
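One possible shape for that propagation, purely as an illustration: given the indices currently holding a descriptor's alias and the index that should keep it, compute the alias actions needed to restore a single holder. The output mirrors the `actions` body of the `_aliases` API. `alias_reconcile_actions` is hypothetical, choosing the primary index is left to the caller, and the actual upgrade service operates on cluster-state metadata in Java rather than through REST requests.

```python
import json


def alias_reconcile_actions(alias: str, primary: str, holders: list[str]) -> dict:
    """Build an _aliases-style body that leaves `alias` on `primary` only."""
    actions = [
        {"remove": {"index": index, "alias": alias}}
        for index in sorted(holders)
        if index != primary
    ]
    if primary not in holders:
        actions.append({"add": {"index": primary, "alias": alias}})
    return {"actions": actions}


body = alias_reconcile_actions(
    ".security-tokens", ".security-tokens-7",
    [".security-tokens-7", ".security-tokens-42"],
)
print(json.dumps(body, indent=2))
```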