Closed davidkyle closed 5 days ago
Pinging @elastic/es-core-infra (Team:Core/Infra)
We discussed this with @dan-rubinstein over Slack. I understand this bug is blocking https://github.com/elastic/elasticsearch/issues/111856 which is required to allow customers to configure a chunking strategy which would determine how large blocks of text are broken up before being passed to ML models - this is a feature expected to GA in 8.16.
The Core/Infra team has some reservations against the solution of https://github.com/elastic/elasticsearch/pull/112074 as it exposes system index internals that we would like to avoid. We agreed that @elastic/es-core-infra will pick up the bugfix and deliver it before end of September. If we see any risks with delivering the bugfix by end of September, we need to revisit the solution with the ML team.
Thanks for the detailed info @davidkyle and @dan-rubinstein If I understood it correctly, the bug is that we do not fail creation consistently when in a mixed cluster, and the fix would be to do that (fail every time) - so that the index is never auto-created with a mixed cluster. That way SystemIndexMappingUpdateService will update the mappings for the index only when all nodes in the cluster have been upgraded. Is that correct?
Another question: do you have a link to some build scans where you saw this failing?
If I understood it correctly, the bug is that we do not fail creation consistently when in a mixed cluster, and the fix would be to do that (fail every time) - so that the index is never auto-created with a mixed cluster.
Correct @ldematte, the expectation is that if auto create fails in a mixed cluster it should fail all the time not some of the time.
@arpadkiraly fixing this issue will not unblock https://github.com/elastic/elasticsearch/pull/112074. The problem there is that we cannot create new inference endpoints in a mixed cluster where the index mappings are not up to date. The inference endpoint configurations are stored in the system index which is why we have to write to it when creating a new inference.
We have 2 failure scenarios:
These are the problems we have to solve. In https://github.com/elastic/elasticsearch/pull/112074 the code detects the mixed cluster state and either explicitly creates the index or updates the mappings which addresses both scenarios
@arpadkiraly The bug we really need to fix for GA is the NPE thrown after upgrading from a version where the system index did not exist https://github.com/elastic/elasticsearch/issues/112694
After investigating a bit more, I'm not 100% sure this is a bug. Auto create is a TransportMasterNodeAction, so it will always happen on the master node. Suppose we have 2 nodes, like in some of the failing tests:
test-cluster-0: version=8.16.0 (index mapping v1)
test-cluster-1: version=9.0.0 (index mapping v2)
Now, things change depending on the node that is elected master. In both cases, clusterstate will hold correctly the minimum system index mapping as v1.
If test-cluster-0
gets elected as master, auto-create will try to create the system index with the version it knows (v1); we check against the min, we see it's OK, and create it.
If test-cluster-1
gets elected as master, auto-create will try to create the system index with the version it knows (v2); we check against the min, we see it's not OK, and bail out.
Or better, I'm not sure the bug is where we thought it is; seems like auto create should create the index with a different mapping version in a mixed scenario, but I'm not sure which are the implications. I will keep digging.
auto create should create the index with a different mapping version in a mixed scenario
Yes, it should. I had thought we handled that by keeping around the old mapping, but I'm not finding that. In order to create the mappings in a mixed cluster, the master node needs to be aware of not only the mappings from its code, but also the mappings associated with the minimum mappings version in the cluster.
I had thought we handled that by keeping around the old mapping
I found code that seems to imply this, but it seems to be in the wrong place. I'll keep digging.
So, turns out we are handling this by using the old mapping, but this mapping has to be specified in the system descriptor for us to be able to find it.
For the example given above, this means that we need to declare the system descriptor in this way in InferencePlugin
(or something similar):
@Override
public Collection<SystemIndexDescriptor> getSystemIndexDescriptors(Settings settings) {
var inferenceSecretsDescriptorV1 = SystemIndexDescriptor.builder()
// all the other .setXxx
// various stuff that might be identical or change
.setMappings(InferenceSecretsIndex.mappingsV1()) // this definitely change
// ....
.build();
return List.of(
...
SystemIndexDescriptor.builder()
.setType(SystemIndexDescriptor.Type.INTERNAL_MANAGED)
.setIndexPattern(InferenceSecretsIndex.INDEX_PATTERN)
.setPrimaryIndex(InferenceSecretsIndex.INDEX_NAME)
.setDescription("Contains inference service secrets")
.setMappings(InferenceSecretsIndex.mappings()) // The new (v2) mappings
.setSettings(InferenceSecretsIndex.settings())
.setVersionMetaKey("version")
.setOrigin(ClientHelper.INFERENCE_ORIGIN)
.setPriorSystemIndexDescriptors(List.of(inferenceSecretsDescriptorV1)) // <-- HERE
.setNetNew()
.build()
);
}
where mappingsV1
is something like
public static XContentBuilder mappingsV1() {
try {
return jsonBuilder().startObject()
.startObject(SINGLE_MAPPING_NAME)
.startObject("_meta")
.field(SystemIndexDescriptor.VERSION_META_KEY, 1) // notice the fixed "1" here
This way the 9.0.0 cluster will know about both v1 and v2, and will know how to create v1 when in a mixed cluster (choosing it based on the min cluster version). The logic to do that is already there.
I think we can close this as a non-bug; maybe we could improve the error message to point people in the right direction?
There is one small bug to fix: handle the case for null alias names in prior descriptors. I will have a PR for that. @rjernst let me know what you think about improving the error message. I was thinking about going from this:
[auto-create index] failed - system index [.secrets-inference] requires all data and master nodes to have mappings versions at least of version [MappingsVersion[version=2, hash=-1434574148]]
to this
[auto-create index] failed - requested creation of system index [.secrets-inference] with version [MappingsVersion[version=2, hash=-1434574148]], which is greater than the cluster minimum index mapping version. Ensure that the system index descriptor for [.secrets-inference] includes a prior definition for version [MappingsVersion[version=1, hash=-1434574148]]
Wdyt?
+1 to the improved error message
.setPriorSystemIndexDescriptors(List.of(inferenceSecretsDescriptorV1)) // <-- HERE
Thanks @ldematte, sorry for the confusion I'm really happen you found the problem.
Once the above is implemented we can safely assume that in a mixed cluster the mappings will be the previous version and auto create won't fail. This is good because we can remove the checks we added to prevent the auto create failure.
Once the above is implemented we can safely assume that in a mixed cluster the mappings will be the previous version and auto create won't fail. This is good because we can remove the checks we added to prevent the auto create failure.
Yes exactly 👍
I opened https://github.com/elastic/elasticsearch/pull/113274 adding some detail to the Javadoc on using prior index descriptors
@davidkyle I opened https://github.com/elastic/elasticsearch/pull/113278 to improve messaging and cover the case in which an alias is null in a prior descriptor (which was/is the case for inference service secrets "v1")
Elasticsearch Version
8.15
Installed Plugins
No response
Java Version
bundled
OS Version
any
Problem Description
Auto creating an internal managed system index fails in a mixed cluster where the nodes have different views of the index mapping version. In this example the
.inference
index mappings have been updated in the new version. Auto creating the index in a mixed cluster by writing to it sometimes fails with the message:The message indicates the action is intentional but the failure does not occur every time, sometimes the index is created and other times it is not. If it is intentional to block index creation in a mixed cluster it should do so every time.
SystemIndexMappingUpdateService
will update the mappings for the index once all nodes in the cluster have been upgraded. Given that eventually the mappings will be correct is there a reason for blocking the creation in a mixed cluster.Steps to Reproduce
Any mixed cluster test which create a system index through write to the index could be used. The easiest example I have uses the mixed cluster test suite in the Inference Plugin.
INDEX_MAPPING_VERSION
in for the inference secrets index.https://github.com/elastic/elasticsearch/blob/48ea474953fb365a65dd7f17741ed5d66b6fc269/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceSecretsIndex.java#L29
The test does not fail every time but over multiple runs it does fail reproducibly. When it fails it will fail with a message like:
Logs (if relevant)
No response