elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Add support for remote indices in index alias (use CCS) #43312

Open ruflin opened 5 years ago

ruflin commented 5 years ago

An index alias can contain one or more indices, but today it is not possible to add an index or an index pattern from a remote cluster. It would be nice to support this, as it would make it possible to expose cross-cluster search capabilities through an alias. One use case: if logs exist on multiple clusters, all the remote clusters could be set up on a central cluster. Instead of having to list each index at query time, which requires knowing the names of the remote clusters, a simple alias logs could be created that is kept up to date with the remotes. The user querying the data would only query logs and get all the results.

Example

I set up a cluster foo and a cluster bar. On cluster bar I configured cluster foo as a remote cluster. Both have an index test. When I run the following query, I get results from both indices:

http://localhost:9201/foo:test,test/_search
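Until an alias can cover remote indices, the client has to assemble this index expression itself. The following is a minimal Python sketch of that client-side workaround; the helper names `ccs_index_expression` and `search_url` are hypothetical and not part of any Elasticsearch client, and the caller is assumed to already know the remote cluster names.

```python
def ccs_index_expression(index, remote_clusters, include_local=True):
    """Build a comma-separated cross-cluster search expression.

    Each remote cluster contributes "<cluster>:<index>"; the local index
    is appended unqualified when include_local is True.
    """
    parts = [f"{cluster}:{index}" for cluster in remote_clusters]
    if include_local:
        parts.append(index)
    return ",".join(parts)


def search_url(base_url, index, remote_clusters):
    """Return the _search URL for the local index plus all remote copies."""
    expression = ccs_index_expression(index, remote_clusters)
    return f"{base_url.rstrip('/')}/{expression}/_search"


# Reproduces the URL from the example above.
print(search_url("http://localhost:9201", "test", ["foo"]))
```

The drawback is the one described in this issue: every client has to carry the list of remote clusters, which an alias on the central cluster would hide.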

Now instead I would like to run a query on an alias, so I try to create an alias with both indices:

curl -X POST "localhost:9201/_aliases" -H 'Content-Type: application/json' -d'
{
    "actions" : [
        { "add" : { "indices" : ["foo:test", "test"], "alias" : "logs" } }
    ]
}
'

Running the above alias addition returns the following error:

{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [foo:test]","resource.type":"index_or_alias","resource.id":"foo:test","index_uuid":"_na_","index":"foo:test"}],"type":"index_not_found_exception","reason":"no such index [foo:test]","resource.type":"index_or_alias","resource.id":"foo:test","index_uuid":"_na_","index":"foo:test"},"status":404}

It would be great to have support for the above feature.

elasticmachine commented 5 years ago

Pinging @elastic/es-core-features

gwbrown commented 5 years ago

We discussed this in the core/features sync today.

We agree that this feature would be very useful. However, it is currently not possible to implement without a significant re-architecture of how aliases are implemented.

Currently, aliases are defined as part of the metadata for an index - there is no way for an alias to exist apart from a concrete index. Because we don't keep index metadata locally for remote indices, there is no index metadata to attach alias information to.
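To make the constraint concrete, here is a deliberately simplified Python model of the situation described above. The classes `IndexMetadata` and `ClusterState` and the exception `IndexNotFound` are illustrative assumptions, not Elasticsearch's actual internals; the only point is that an alias stored on index metadata has nowhere to live when the index is remote.

```python
from dataclasses import dataclass, field


class IndexNotFound(Exception):
    """Raised when an alias action names an index with no local metadata."""


@dataclass
class IndexMetadata:
    name: str
    # Alias names are stored on the index itself, not in a separate registry.
    aliases: set = field(default_factory=set)


@dataclass
class ClusterState:
    # Only local indices have metadata; remote indices are absent here.
    indices: dict = field(default_factory=dict)

    def add_alias(self, index_name, alias):
        metadata = self.indices.get(index_name)
        if metadata is None:
            # "foo:test" lives on a remote cluster, so nothing can hold the alias.
            raise IndexNotFound(f"no such index [{index_name}]")
        metadata.aliases.add(alias)

    def resolve_alias(self, alias):
        """Return the local indices whose metadata carries the alias."""
        return sorted(n for n, m in self.indices.items() if alias in m.aliases)


state = ClusterState(indices={"test": IndexMetadata("test")})
state.add_alias("test", "logs")          # works: local metadata exists
try:
    state.add_alias("foo:test", "logs")  # fails: no local metadata for a remote index
except IndexNotFound as err:
    print(err)
```

In this model the failure for `foo:test` mirrors the `index_not_found_exception` reported in the issue.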

Creating a separate representation of aliases would provide a number of benefits, including the ability to implement features like this easily, and it would simplify a number of things in ILM as well to be able to create an alias without necessarily creating the underlying concrete index at the same time. However, this would involve a large amount of reworking and refactoring existing code. While we would like to do this, even if we started today, completing it would take quite some time and impact a very wide portion of the code base. As such, we're going to leave this issue open, but tag it high hanging fruit.
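As a rough illustration of what a separate representation might look like, the Python sketch below keeps aliases in their own registry that merely references index expressions. This is an assumption-laden toy, not a proposed or actual Elasticsearch design; the class `AliasRegistry` and its methods are hypothetical, and it assumes a colon in an expression always separates a remote cluster name from an index name.

```python
from dataclasses import dataclass, field


@dataclass
class AliasRegistry:
    # alias name -> ordered list of index expressions, local or "cluster:index"
    aliases: dict = field(default_factory=dict)

    def add(self, alias, expressions):
        """Register targets for an alias; an empty list creates an empty alias."""
        targets = self.aliases.setdefault(alias, [])
        for expression in expressions:
            if expression not in targets:
                targets.append(expression)

    def resolve(self, alias):
        """Split an alias into local index names and (cluster, index) pairs."""
        local, remote = [], []
        for expression in self.aliases.get(alias, []):
            cluster, sep, index = expression.partition(":")
            if sep:
                remote.append((cluster, index))
            else:
                local.append(expression)
        return local, remote


registry = AliasRegistry()
registry.add("logs", ["foo:test", "test"])  # remote and local targets side by side
registry.add("future-logs", [])             # alias exists before any concrete index
print(registry.resolve("logs"))
```

Because the registry does not depend on index metadata, both cases mentioned above become expressible: an alias pointing at a remote index, and an alias that exists before its concrete index does.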

monfera commented 4 years ago

Out of curiosity, is the fact that aliases are part of the (metadata for an) index due to deeper technical reasons? We often hear about good practices, e.g. decoupling, or avoiding coupling in the first place. The analogous term would be denormalization, where an entity like an alias lives on its own, merely referencing the constituent indices. (By the way, I'm already lost on the notion of the alias living as part of the index metadata, as one alias can cover multiple indices.) So is it more of a historical consequence, or would a denormalized approach have been much more challenging to implement than the coupled one, i.e. was the coupled approach a good stepping stone anyway?

gwbrown commented 4 years ago

Largely historical, rather than any inherent difficulty. Aliases are one of the oldest ES features - they're in v0.9 at least (and probably before that) - and have significant historical baggage. There's some related conversation in https://github.com/elastic/elasticsearch/issues/37880 (and the older issue linked from there) that may provide some insight into why this would be a difficult change.