elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Elasticsearch - Provide a way of doing bulk change to alias by using script #27889

Open kunisen opened 6 years ago

kunisen commented 6 years ago

Describe the feature: We can use Painless to change the dest index in bulk on the fly during a reindex. https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#_reindex_daily_indices

POST _reindex
{
  "source": {
    "index": "metricbeat-*"
  },
  "dest": {
    "index": "metricbeat"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'"
  }
}

Is there a way to do the same thing for aliases, by getting the alias name from the ctx object and changing it on the fly? I cannot find a way to do that using the current _reindex or _aliases API.

geekpete commented 6 years ago

So an example like:

myindex with myalias

being changed after reindexing to:

myindex-1 with myalias

and

myindex with myalias-old

bleskes commented 6 years ago

@kunisen the script changes the index parameter of the IndexRequest that is then being used to perform normal indexing. That index parameter can be either an index name or an alias. Do you see it not working?

@geekpete I'm not sure I 100% follow you but I think what you refer to can be done by the Aliases API - it allows you to atomically add and remove aliases on multiple indices.
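As a minimal sketch of what such an atomic change could look like for the myindex / myalias example above, the following Python builds a request body for POST _aliases. The helper name swap_alias_actions and the "-old" suffix parameter are hypothetical, chosen only for illustration; sending the request is left to whatever client is in use.

```python
def swap_alias_actions(old_index, new_index, alias, old_alias_suffix="-old"):
    """Build a body for POST _aliases (hypothetical helper).

    All actions in one request are applied atomically, so the alias
    moves to new_index and old_index gets a renamed alias in one step.
    """
    return {
        "actions": [
            # Detach the alias from the original index.
            {"remove": {"index": old_index, "alias": alias}},
            # Point the alias at the reindexed copy.
            {"add": {"index": new_index, "alias": alias}},
            # Keep the original index reachable under a renamed alias.
            {"add": {"index": old_index, "alias": alias + old_alias_suffix}},
        ]
    }


body = swap_alias_actions("myindex", "myindex-1", "myalias")
```

The resulting dictionary can be serialized to JSON and sent as the body of a single _aliases call.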

henningandersen commented 5 years ago

@kunisen I verified that using aliases works fine using ctx._index = <alias-name-expression> during reindex.

@geekpete did you consider the response from @bleskes?

If the info given is not enough to close this issue, I think we need a more elaborate description of the problem before we can continue investigation.

geekpete commented 5 years ago

I think the ask is to be able to dynamically change the alias for indices within the reindex operation for each of the indices, either with an extra parameter (eg "copy_aliases" or "move_aliases") or with scripting.

It is possible in two steps: first a reindex, then an alias change via the Aliases API. But it probably can't be done dynamically across a set of indices without an external script or a tool like Curator.
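A rough sketch of that two-step approach as a client-side plan, in Python. It only builds the request bodies; the function names, the "-1" suffix, and the input shape (a dict from index name to its list of aliases) are assumptions made for this example. A real client would send each _reindex body first and send the matching _aliases body only once the reindex response reports no failures.

```python
def reindex_body(source_index, dest_index):
    """Body for POST _reindex copying one index into another."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def move_aliases_body(source_index, dest_index, aliases):
    """Body for POST _aliases moving every alias from source to dest."""
    actions = []
    for alias in aliases:
        actions.append({"remove": {"index": source_index, "alias": alias}})
        actions.append({"add": {"index": dest_index, "alias": alias}})
    return {"actions": actions}


def two_step_plan(alias_map, suffix="-1"):
    """Return a list of (reindex body, aliases body) pairs, one per index.

    alias_map is a hypothetical input: {index name: [alias, ...]}.
    """
    steps = []
    for index in sorted(alias_map):
        dest = index + suffix
        steps.append(
            (reindex_body(index, dest),
             move_aliases_body(index, dest, alias_map[index]))
        )
    return steps


steps = two_step_plan({"myindex": ["myalias"]})
```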

henningandersen commented 5 years ago

@geekpete thanks for the elaboration. As I understand it the use case is that you are reindexing from a source index (or multiple source indices) into a destination index. Upon completion of that operation you would like to update an alias from pointing to source to now point to destination? Are there more complex scenarios that need to be supported by scripting instead?

This would then be a post-operation step in a separate script (done inside the reindex operation after all documents have been successfully indexed into the new index).

geekpete commented 5 years ago

The original example started with being able to dynamically change the destination index at reindex time. It starts as a simple example but can get more complex. For example, if you had a single massive index with time-based data, you could use a reindex to split it into a set of time-based (for example daily) indices using only scripting, basing the destination index on some feature of the current document, such as its timestamp.

eg:

POST _reindex
{
  "source": {
    "index": "timebased-reindex"
  },
  "dest": {
    "index": "timebased-reindex-changeme"
  },
  "script": {
    "lang": "painless",
    "source": """
      DateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss"); 
      DateFormat ndf = new SimpleDateFormat("yyyy.MM.dd");
      ctx._index = ('timebased-reindex-' + ndf.format(df.parse(ctx._source['@timestamp'])));
    """
  }
}

Though scripting in this way hinges on the document meta fields and only affects each document on the fly. There's also no alias meta field in the documents to be able to use for alias scripting since aliases are at index level, so adjusting aliases would need to happen after/separately as you say.
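To make the per-document behaviour concrete, here is a small Python sketch of the same index-name computation the Painless script above performs. It is an illustration only, not something that runs inside Elasticsearch; the function name dest_index_for is hypothetical, and it assumes @timestamp starts with a "yyyy-MM-ddTHH:mm:ss" prefix.

```python
from datetime import datetime


def dest_index_for(timestamp, prefix="timebased-reindex-"):
    """Map a document timestamp to a daily destination index name."""
    # Keep only the first 19 characters ("yyyy-MM-ddTHH:mm:ss");
    # assumes any fractional seconds or zone suffix comes after them.
    parsed = datetime.strptime(timestamp[:19], "%Y-%m-%dT%H:%M:%S")
    return prefix + parsed.strftime("%Y.%m.%d")


print(dest_index_for("2019-01-15T10:20:30"))  # timebased-reindex-2019.01.15
```

The function only ever sees one document's fields, which mirrors why the script has nothing to work with when it comes to index-level aliases.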

I'm trying to think of a more complex scenario around aliases and reindex that would need anything more than adjusting the alias per index at the end. All I can think of is conditional logic for the alias update, based on index details or metadata, to allow dynamic adjustment of aliases depending on how each index looks.

henningandersen commented 5 years ago

Would it be an option to allow scripting on the aliases API? This would avoid the post action on reindex and allow flexible alias updating everywhere.

Adding a post action to reindex opens up a number of questions/worries:

  1. Other API calls that could also benefit from a post action? Maybe update by query?
  2. What if the reindex completes but the post action fails? Would complicate the response somewhat and introduce a need to just do the alias update anyway (to avoid running the full reindex again).
  3. Someone would use a reindex with a query with no results to just get the alias script update functionality.
  4. Other types of post actions on reindex (or other APIs) like closing/freezing original indices

ILM is not my strong suit (yet), but it sounds like there may be some overlap with ILM?

geekpete commented 5 years ago

Using Curator to achieve this isn't going to be workable, since its alias action only allows specific alias names, not dynamic alias names, to be matched and used. There should be a feature request for this though, and I'll link that once I find it.

The next best thing would probably be a client script, I'll see if I can rig one as an example.

A test example might be trying to dynamically rename an alias based on a pattern:

before:

index, alias
log1-2019.01, log-2019.01
log2-2019.01, log-2019.01
log1-2019.02, log-2019.02
log2-2019.02, log-2019.02
log1-2019.03, log-2019.03
log2-2019.03, log-2019.03

after:

log1-2019.01, log-2019.01-dummy
log2-2019.01, log-2019.01-dummy
log1-2019.02, log-2019.02-dummy
log2-2019.02, log-2019.02-dummy
log1-2019.03, log-2019.03-dummy
log2-2019.03, log-2019.03-dummy

Or, put more simply: change the alias log-<period> to log-<period>-dummy for all aliases that match.
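A minimal sketch of what such a client script might compute, in Python. It assumes the input has the shape returned by GET _alias (index name mapped to an "aliases" object) and builds a single POST _aliases body. The function name, the regular expression, and the "-dummy" suffix default are illustrative choices, not an existing API.

```python
import re


def rename_alias_actions(alias_response, pattern=r"^log-\d{4}\.\d{2}$",
                         suffix="-dummy"):
    """Build a POST _aliases body renaming every alias matching pattern.

    alias_response is assumed to look like the GET _alias response:
    {index: {"aliases": {alias_name: {...}}}}.
    """
    matcher = re.compile(pattern)
    actions = []
    for index in sorted(alias_response):
        for alias in sorted(alias_response[index].get("aliases", {})):
            if matcher.match(alias):
                # A rename is expressed as remove + add in the same request.
                actions.append({"remove": {"index": index, "alias": alias}})
                actions.append({"add": {"index": index, "alias": alias + suffix}})
    return {"actions": actions}


current = {
    "log1-2019.01": {"aliases": {"log-2019.01": {}}},
    "log2-2019.01": {"aliases": {"log-2019.01": {}}},
    "log1-2019.02": {"aliases": {"log-2019.02": {}}},
}
body = rename_alias_actions(current)
```

Because the pattern is anchored at the end, aliases that already carry the suffix are skipped, so running the script a second time produces no further actions.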