elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Remote clusters skip_unavailable and transport.ping_schedule are not compatible with each other #88080

Open lucabelluccini opened 2 years ago

lucabelluccini commented 2 years ago

Elasticsearch Version

7.16.1+

Installed Plugins

No response

Java Version

bundled

OS Version

N/A

Problem Description

Since 7.16.1 (https://github.com/elastic/elasticsearch/pull/80589), defining a remote cluster with both skip_unavailable and transport.ping_schedule makes searches against an unavailable remote cluster behave as if skip_unavailable were set to false.

The documentation doesn't warn about this: https://www.elastic.co/guide/en/elasticsearch/reference/master/remote-clusters-settings.html#remote-clusters-settings

This behavior should at least be documented.

Steps to Reproduce

1) Create a cluster A.
2) Create a cluster B.
3) Add the following remote cluster settings to cluster A:

PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote" : {
        "other" : {
          "mode" : "proxy",
          "skip_unavailable" : "true",
          "server_name" : "clusterB",
          "proxy_address" : "clusterB:9400",
          "transport" : {
            "ping_schedule" : "30s"
          }
        }
      }
    }
  }
}

4) Shut down or disconnect cluster B from cluster A.
5) Run the request GET other:*/_search. You'll get org.elasticsearch.transport.RemoteTransportException: [error while communicating with remote cluster [other]].
6) Unset transport.ping_schedule:

PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.other.transport.ping_schedule" : null
  }
}

7) Retry GET other:*/_search. The search will work without exceptions.
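
For anyone who wants to script the reproduction, the steps above could be driven roughly as follows. This is only a sketch under assumptions that are not part of the report: cluster A is assumed to expose its HTTP API at http://localhost:9200 without authentication, and the helper names (add_remote_body, unset_ping_schedule_body, call, reproduce) are made up for illustration. The settings use the flat key form, which is the same form the report uses in step 6.

```python
import json
import urllib.error
import urllib.request

# Assumption: HTTP endpoint of cluster A; the report does not state one.
CLUSTER_A = "http://localhost:9200"


def add_remote_body():
    """Settings body for step 3: remote 'other' with both settings defined."""
    return {
        "persistent": {
            "cluster.remote.other.mode": "proxy",
            "cluster.remote.other.skip_unavailable": "true",
            "cluster.remote.other.server_name": "clusterB",
            "cluster.remote.other.proxy_address": "clusterB:9400",
            "cluster.remote.other.transport.ping_schedule": "30s",
        }
    }


def unset_ping_schedule_body():
    """Settings body for step 6: a null value removes the setting."""
    return {
        "persistent": {
            "cluster.remote.other.transport.ping_schedule": None,
        }
    }


def call(method, path, body=None):
    """Send one JSON request to cluster A; return (HTTP status, parsed body)."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        CLUSTER_A + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # Error responses still carry a response body; parse it as well.
        return err.code, json.load(err)


def reproduce():
    """Walk through steps 3 to 7; cluster B must be stopped by hand."""
    print(call("PUT", "/_cluster/settings", add_remote_body()))
    input("Shut down or disconnect cluster B, then press Enter...")
    # Step 5: per the report, this search fails while ping_schedule is set.
    print(call("GET", "/other:*/_search"))
    print(call("PUT", "/_cluster/settings", unset_ping_schedule_body()))
    # Step 7: per the report, the same search now completes without exceptions.
    print(call("GET", "/other:*/_search"))
```

Calling reproduce() against a test setup prints the status and response body for each step, so the two search responses can be compared side by side.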

Logs (if relevant)

No response

elasticmachine commented 2 years ago

Pinging @elastic/es-docs (Team:Docs)

elasticmachine commented 2 years ago

Pinging @elastic/es-distributed (Team:Distributed)

tlrx commented 2 years ago

I (very quickly) looked at this and I think it should be fixed rather than documented.