elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Automatically add created, last_changed and changed_by metadata for index templates, component templates and ingest pipelines #108754

Open flash1293 opened 5 months ago

flash1293 commented 5 months ago

Description

Related to https://github.com/elastic/elasticsearch/issues/108469

To troubleshoot ingestion issues, it helps to know what changed around the time problems started. A full version control system for ES objects is an aspirational goal, but as low-hanging fruit, metadata about when an ES object was created, when it was last changed, and by whom would deliver much of the value without a big investment.

Scope

For index templates, component templates and ingest pipelines, track created_at, modified_at and modified_by, and return them as part of the respective APIs for retrieving these objects:

GET _ingest/pipeline/my-pipeline

{
  "my-pipeline": {
    "processors": [
      ...
    ],
    "_meta": {
      ...
    },
    "created_at": "2025-05-05T00:00:00",
    "modified_at": "2025-05-05T00:00:00",
    "modified_by": "user_xyz"
  }
}

GET _component_template/my-template

{
  "component_templates": [
    {
      "name": "my-template",
      "component_template": {
        "template": {
          ... 
        },
        "version": 3,
        "_meta": {
          ...
        },
        "created_at": "2025-05-05T00:00:00",
        "modified_at": "2025-05-05T00:00:00",
        "modified_by": "user_xyz"
      }
    }
  ]
}

GET _index_template/my-template

{
  "index_templates": [
    {
      "name": "my-template",
      "index_template": {
        ...,
        "created_at": "2025-05-05T00:00:00",
        "modified_at": "2025-05-05T00:00:00",
        "modified_by": "user_xyz"
      }
    }
  ]
}

These properties are read-only: users cannot write them.
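
To make the intended semantics concrete, here is a minimal sketch in Python of how the three fields could be stamped on a write. It is not how Elasticsearch implements anything; `SYSTEM_FIELDS` and `stamp_metadata` are hypothetical names, and silently dropping user-supplied values (rather than rejecting the request) is an assumption.

```python
from datetime import datetime, timezone

# Hypothetical: the three system-managed fields proposed in this issue.
SYSTEM_FIELDS = ("created_at", "modified_at", "modified_by")


def stamp_metadata(existing, submitted, user, now=None):
    """Return the object to store, with system-managed fields set server-side.

    existing  -- the currently stored object, or None on first creation
    submitted -- the body sent by the caller
    user      -- name of the user performing the write
    """
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%S")

    # Assumption: values the caller supplies for read-only fields are dropped.
    stored = {k: v for k, v in submitted.items() if k not in SYSTEM_FIELDS}

    # created_at is set once and carried over on every later update.
    stored["created_at"] = existing["created_at"] if existing else timestamp
    stored["modified_at"] = timestamp
    stored["modified_by"] = user
    return stored
```

On first creation `created_at` and `modified_at` are equal, as in the example responses above; on later writes only `modified_at` and `modified_by` change.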

Considerations

Merge with _meta

The existing _meta object is user-controlled. An alternative approach would be to add these pieces of information to this object and override whatever the user set manually. While cleaner in the sense of not introducing more properties on these objects, this has some downsides:

Permissions

This approach might leak usernames to other users who normally wouldn't have access to them. This could be counteracted by only returning modified_by if the user has the required permissions to read user data in the first place.
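
The mitigation could look roughly like the following Python sketch. `redact_modified_by` and the `can_read_users` flag are hypothetical; how the permission check itself would be performed is not specified in this issue.

```python
def redact_modified_by(obj, can_read_users):
    """Return a copy of obj, omitting modified_by unless the caller may read user data."""
    if can_read_users:
        return dict(obj)
    # The timestamps stay visible; only the username is withheld.
    return {k: v for k, v in obj.items() if k != "modified_by"}
```

Omitting the field entirely, rather than returning a placeholder value, is one possible design choice here.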

ruflin commented 5 months ago

Eventually we should also find ways to query this data, e.g. "Give me the ingest pipelines that changed in the last 24h."
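
Until such a query exists, a client could filter the GET _ingest/pipeline response itself. A rough Python sketch, assuming the proposed modified_at field is present and is a timezone-less ISO 8601 string in UTC as in the examples above; `pipelines_changed_since` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone


def pipelines_changed_since(response, cutoff):
    """Return the sorted names of pipelines whose modified_at is at or after cutoff.

    response -- parsed JSON body of GET _ingest/pipeline (name -> pipeline body)
    cutoff   -- naive datetime, assumed to be in UTC like the timestamps
    """
    changed = []
    for name, body in response.items():
        modified_at = body.get("modified_at")
        if modified_at is None:
            continue  # field not present, so nothing can be said about this pipeline
        if datetime.fromisoformat(modified_at) >= cutoff:
            changed.append(name)
    return sorted(changed)


# Cutoff for "the last 24h", expressed as a naive UTC datetime.
last_24h = datetime.now(timezone.utc).replace(tzinfo=None) - timedelta(hours=24)
```

Calling `pipelines_changed_since(response, last_24h)` on the parsed response would then give the pipelines changed in the last day.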

flash1293 commented 5 months ago

@ruflin that's a good thought, seems like a follow-up issue to me.

elasticsearchmachine commented 5 months ago

Pinging @elastic/es-data-management (Team:Data Management)

dakrone commented 5 months ago

While I wish we could have a standardization here, we do have a field like this for ILM policies already: modified_date. If we want to add these, it would be good to standardize on exactly what names we want.

Is there a reason you prepended an underscore to the names? I don't think we would necessarily have to do that.

ruflin commented 5 months ago

++ on standardisation.

The underscore is to indicate this is something created by the system that can't be modified by the user. It likely also prevents conflicts with user-defined fields, but I agree we might not need it.

flash1293 commented 5 months ago

I don't feel strongly about the _, it was mostly just a starting point. I edited the issue to align with existing names.