elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

`Coordinator` sends itself an incomprehensible join validation message #110368

Open DaveCTurner opened 3 days ago

DaveCTurner commented 3 days ago

Seen in a test log:

[2024-07-02T05:02:13,644][WARN ][o.e.c.c.Coordinator      ] [node_s7] failed to validate incoming join request from node [{node_s7}{d03a3GJWQJG8xa2vsheGQw}{qxWL0ODeR0uaqbo-OL7lFw}{node_s7}{127.0.0.1}{127.0.0.1:13698}{m}{8.15.0}{7000099-8512000}]
org.elasticsearch.transport.RemoteTransportException: [node_s7][127.0.0.1:13698][internal:cluster/coordination/join/validate]
Caused by: java.lang.ClassCastException: class org.elasticsearch.transport.BytesTransportRequest cannot be cast to class org.elasticsearch.cluster.coordination.ValidateJoinRequest (org.elasticsearch.transport.BytesTransportRequest and org.elasticsearch.cluster.coordination.ValidateJoinRequest are in unnamed module of loader 'app')
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75) ~[main/:?]
    at org.elasticsearch.transport.TransportService$6.doRun(TransportService.java:1084) ~[main/:?]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984) ~[main/:?]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[main/:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.lang.Thread.run(Thread.java:1570) ~[?:?]
[2024-07-02T05:02:13,645][INFO ][o.e.a.s.m.TransportMasterNodeActionIT] [testRoutingLoopProtection] [TransportMasterNodeActionIT#testRoutingLoopProtection]: cleaning up after test
[2024-07-02T05:02:13,648][INFO ][o.e.c.c.JoinHelper       ] [node_s7] failed to join {node_s7}{d03a3GJWQJG8xa2vsheGQw}{qxWL0ODeR0uaqbo-OL7lFw}{node_s7}{127.0.0.1}{127.0.0.1:13698}{m}{8.15.0}{7000099-8512000} with JoinRequest{sourceNode={node_s7}{d03a3GJWQJG8xa2vsheGQw}{qxWL0ODeR0uaqbo-OL7lFw}{node_s7}{127.0.0.1}{127.0.0.1:13698}{m}{8.15.0}{7000099-8512000}, compatibilityVersions=CompatibilityVersions[transportVersion=8701000, systemIndexMappingsVersion={.synonyms-2=MappingsVersion[version=1, hash=-888080772], .tasks=MappingsVersion[version=0, hash=-945584329]}], features=[mapper.track_ignored_source, mapper.vectors.int4_quantization, unified_highlighter_matched_fields, retrievers_supported, mapper.source.synthetic_source_fallback, mapper.index_sorting_on_nested, mapper.pass_through_priority, health.extended_repository_indicator, script.hamming, rest.capabilities_action, features_supported, file_settings, mapper.vectors.bit_vectors, search.vectors.k_param_supported, stats.include_disk_thresholds, mapper.range.null_values_off_by_one_fix, desired_node.version_deprecated, knn_retriever_supported, standard_retriever_supported], minimumTerm=1, optionalJoin=Optional[Join[votingNode={node_s7}{d03a3GJWQJG8xa2vsheGQw}{qxWL0ODeR0uaqbo-OL7lFw}{node_s7}{127.0.0.1}{127.0.0.1:13698}{m}{8.15.0}{7000099-8512000}, masterCandidateNode={node_s7}{d03a3GJWQJG8xa2vsheGQw}{qxWL0ODeR0uaqbo-OL7lFw}{node_s7}{127.0.0.1}{127.0.0.1:13698}{m}{8.15.0}{7000099-8512000}, term=2, lastAcceptedTerm=1, lastAcceptedVersion=14]]}
org.elasticsearch.transport.RemoteTransportException: [node_s7][127.0.0.1:13698][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a join validation request from [{node_s7}{d03a3GJWQJG8xa2vsheGQw}{qxWL0ODeR0uaqbo-OL7lFw}{node_s7}{127.0.0.1}{127.0.0.1:13698}{m}{8.15.0}{7000099-8512000}] to [{node_s7}{d03a3GJWQJG8xa2vsheGQw}{qxWL0ODeR0uaqbo-OL7lFw}{node_s7}{127.0.0.1}{127.0.0.1:13698}{m}{8.15.0}{7000099-8512000}]
    at org.elasticsearch.cluster.coordination.Coordinator.lambda$sendJoinValidate$14(Coordinator.java:745) ~[main/:?]
    at org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.acceptException(ActionListenerImplementations.java:186) ~[main/:?]
    at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62) ~[main/:?]
    at org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onFailure(ActionListenerImplementations.java:191) ~[main/:?]
    at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62) ~[main/:?]
    at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73) ~[main/:?]
    at org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:31) ~[main/:?]
    at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62) ~[main/:?]
    at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73) ~[main/:?]
    at org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:31) ~[main/:?]
    at org.elasticsearch.action.ActionListenerImplementations$RunAfterActionListener.onFailure(ActionListenerImplementations.java:278) ~[main/:?]
    at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62) ~[main/:?]
    at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73) ~[main/:?]
    at org.elasticsearch.action.ActionListener$3.onFailure(ActionListener.java:402) ~[main/:?]
    at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:53) ~[main/:?]
    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1490) ~[main/:?]
    at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1624) ~[main/:?]
    at org.elasticsearch.transport.TransportService$DirectResponseChannel$2.doRun(TransportService.java:1604) ~[main/:?]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984) ~[main/:?]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[main/:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.lang.Thread.run(Thread.java:1570) ~[?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node_s7][127.0.0.1:13698][internal:cluster/coordination/join/validate]
Caused by: java.lang.ClassCastException: class org.elasticsearch.transport.BytesTransportRequest cannot be cast to class org.elasticsearch.cluster.coordination.ValidateJoinRequest (org.elasticsearch.transport.BytesTransportRequest and org.elasticsearch.cluster.coordination.ValidateJoinRequest are in unnamed module of loader 'app')
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75) ~[main/:?]
    at org.elasticsearch.transport.TransportService$6.doRun(TransportService.java:1084) ~[main/:?]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984) ~[main/:?]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[main/:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.lang.Thread.run(Thread.java:1570) ~[?:?]

This is fairly benign since we join the cluster anyway, but there's no need to send ourselves a join validation request, and the log noise is somewhat concerning. Skipping validation when the joining node is the local node would avoid it:

diff --git a/server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java b/server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java
index 2f604f1b959..1abcc502768 100644
--- a/server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java
+++ b/server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java
@@ -722,7 +722,9 @@ public class Coordinator extends AbstractLifecycleComponent implements ClusterSt
                         stateForJoinValidation.getNodes().getMinNodeVersion()
                     );
                 }
-                sendJoinValidate(joinRequest.getSourceNode(), listeners.acquire());
+                if (joinRequest.getSourceNode().getId().equals(getLocalNode().getId()) == false) {
+                    sendJoinValidate(joinRequest.getSourceNode(), listeners.acquire());
+                }
                 return null;
             });
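
For readers unfamiliar with why a self-addressed request might fail this way, here is a minimal, self-contained sketch. It is not Elasticsearch code: `BytesRequest`, `ValidateRequest`, `Registration`, and `deliver` are hypothetical stand-ins, and the assumption that local delivery hands the sender's object to the handler without decoding it is a reading of the stack trace above (`TransportService$6` and `DirectResponseChannel`), not something confirmed here. Under that assumption, a pre-serialized request works over the wire but hits a `ClassCastException` when delivered locally, and the guard from the diff sidesteps the problem.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.function.Function;

public class SelfJoinValidationSketch {

    // Hypothetical stand-in for a pre-serialized request (the role BytesTransportRequest plays in the log).
    static final class BytesRequest {
        final byte[] bytes;
        BytesRequest(byte[] bytes) { this.bytes = bytes; }
    }

    // Hypothetical stand-in for the typed request the handler expects (the role of ValidateJoinRequest).
    static final class ValidateRequest {
        final String state;
        ValidateRequest(String state) { this.state = state; }
    }

    // Toy handler registration: a reader for wire bytes plus a typed handler.
    static final class Registration<T> {
        final Function<byte[], T> reader;
        final Consumer<T> handler;

        Registration(Function<byte[], T> reader, Consumer<T> handler) {
            this.reader = reader;
            this.handler = handler;
        }

        // Remote delivery: the bytes are decoded into T before the handler runs.
        void receiveFromWire(byte[] bytes) {
            handler.accept(reader.apply(bytes));
        }

        // Local delivery (assumed to skip decoding): the sender's object is passed through unchanged.
        @SuppressWarnings("unchecked")
        void receiveLocally(Object request) {
            handler.accept((T) request);
        }
    }

    // Sends one pre-serialized request and reports what the receiving side saw.
    static String deliver(boolean local) {
        StringBuilder seen = new StringBuilder();
        Registration<ValidateRequest> registration = new Registration<ValidateRequest>(
            bytes -> new ValidateRequest(new String(bytes, StandardCharsets.UTF_8)),
            (ValidateRequest request) -> seen.append("ok:").append(request.state)
        );
        BytesRequest outgoing = new BytesRequest("cluster-state".getBytes(StandardCharsets.UTF_8));
        try {
            if (local) {
                registration.receiveLocally(outgoing);
            } else {
                registration.receiveFromWire(outgoing.bytes);
            }
            return seen.toString();
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    // Mirrors the guard in the proposed diff: validate only nodes other than ourselves.
    static boolean shouldSendJoinValidate(String sourceNodeId, String localNodeId) {
        return sourceNodeId.equals(localNodeId) == false;
    }

    public static void main(String[] args) {
        System.out.println("remote: " + deliver(false));
        System.out.println("local:  " + deliver(true));
        System.out.println("validate self?  " + shouldSendJoinValidate("node_s7", "node_s7"));
        System.out.println("validate other? " + shouldSendJoinValidate("node_s8", "node_s7"));
    }
}
```

The sketch compares node IDs as plain strings, matching the `getId().equals(...)` check in the diff. Whether ID equality is the right test in every case (as opposed to comparing full node descriptors) is left open here.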

elasticsearchmachine commented 3 days ago

Pinging @elastic/es-distributed (Team:Distributed)