elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Upgrade from 7.x to 8.x fails if 7.x node didn't fully start up #109544

Open DaveCTurner opened 5 months ago

DaveCTurner commented 5 months ago

If you create a 7.17.x node, kill it at exactly the wrong time during startup, and then try to upgrade it to 8.x, the upgrade will fail with the following cryptic message:

[2024-06-10T15:30:23,169][ERROR][o.e.b.Elasticsearch      ] [node-0] fatal exception while booting Elasticsearch
org.elasticsearch.ElasticsearchException: Failed to bind service
    at org.elasticsearch.node.NodeConstruction.prepareConstruction(NodeConstruction.java:276) ~[elasticsearch-8.13.2.jar:?]
    at org.elasticsearch.node.Node.<init>(Node.java:192) ~[elasticsearch-8.13.2.jar:?]
    at org.elasticsearch.bootstrap.Elasticsearch$2.<init>(Elasticsearch.java:237) ~[elasticsearch-8.13.2.jar:?]
    at org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:237) ~[elasticsearch-8.13.2.jar:?]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:74) ~[elasticsearch-8.13.2.jar:?]
Caused by: org.elasticsearch.gateway.CorruptStateException: Format version is not supported. Upgrading to [8.13.2] is only supported from version [7.17.0].
    at org.elasticsearch.env.NodeEnvironment.checkForIndexCompatibility(NodeEnvironment.java:517) ~[elasticsearch-8.13.2.jar:?]
    at org.elasticsearch.env.NodeEnvironment.upgradeLegacyNodeFolders(NodeEnvironment.java:416) ~[elasticsearch-8.13.2.jar:?]
    at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:309) ~[elasticsearch-8.13.2.jar:?]
    at org.elasticsearch.node.NodeConstruction.validateSettings(NodeConstruction.java:500) ~[elasticsearch-8.13.2.jar:?]
    at org.elasticsearch.node.NodeConstruction.prepareConstruction(NodeConstruction.java:255) ~[elasticsearch-8.13.2.jar:?]
    ... 4 more

The issue here is that we create ${path.data}/nodes/0/node.lock first, and only then write the node metadata to ${path.data}/nodes/0/_state/node-${N}.st. If the node is killed between these two steps, the data path is nonempty but incomprehensible. We should treat a data path containing only a node.lock file as if it were empty and start up normally.
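
To make the proposal concrete, here is a minimal sketch of the "only a node.lock file" check. It is not the actual Elasticsearch code: the class DataPathCheck and the method isEffectivelyEmpty are hypothetical names, and the sketch assumes that a missing or completely empty node directory should count as empty too.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of Elasticsearch.
final class DataPathCheck {

    // Lock file name as given in the issue text.
    static final String NODE_LOCK = "node.lock";

    /**
     * Returns true if nodePath (e.g. ${path.data}/nodes/0) does not exist,
     * is an empty directory, or contains nothing but a regular node.lock file.
     * Any other entry (such as a _state directory) makes it non-empty.
     */
    static boolean isEffectivelyEmpty(Path nodePath) throws IOException {
        if (Files.notExists(nodePath)) {
            return true;
        }
        if (!Files.isDirectory(nodePath)) {
            return false;
        }
        // Files.list is not recursive; it returns only direct children.
        try (Stream<Path> entries = Files.list(nodePath)) {
            return entries.allMatch(entry ->
                entry.getFileName().toString().equals(NODE_LOCK)
                    && Files.isRegularFile(entry));
        }
    }
}
```

With a check like this, the startup code could skip the legacy-format compatibility check when isEffectivelyEmpty returns true and proceed as for a fresh node. Whether an empty _state directory should also be tolerated is left open here, since the issue only mentions the node.lock case.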

elasticsearchmachine commented 5 months ago

Pinging @elastic/es-core-infra (Team:Core/Infra)