elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Immediately upgrading a downgraded tsdb data stream fails #96163

Open martijnvg opened 1 year ago

martijnvg commented 1 year ago

Immediately upgrading a data stream to tsdb after it has been downgraded from a tsdb data stream fails to execute. This is because a tsdb backing index already exists and the rollover doesn't detect that, because the data stream is non-tsdb.

Note that after waiting ~4hrs the rollover should succeed.

Reproduction:

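# Create an index template that enables time_series mode for the test* pattern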
PUT _index_template/1
{
  "index_patterns": [
    "test*"
  ],
  "template": {
    "settings": {
      "index": {
        "mode": "time_series"
      }
    },
    "mappings": {
      "properties": {
          "my_field": {
              "time_series_dimension": true,
              "type": "keyword"
          }
      }
    }
  },
  "data_stream": {}
}

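# Index a document; this auto-creates the data stream test1 with its first (tsdb) backing index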
POST test1/_doc
{
  "@timestamp": "2023-05-16T11:49:50.599Z",
  "my_field": "value"
}

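# Downgrade: update the template so index.mode is no longer time_series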
PUT _index_template/1
{
    "index_patterns": [
        "test*"
    ],
    "template": {
        "settings": {
            "index": {
                "mode": null
            }
        },
        "mappings": {
            "properties": {
                "my_field": {
                    "time_series_dimension": true,
                    "type": "keyword"
                }
            }
        }
    },
    "data_stream": {}
}

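# Roll over so the new write index is a regular (non-tsdb) backing index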
POST test1/_rollover

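# Upgrade back to tsdb: restore time_series mode in the template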
PUT _index_template/1
{
    "index_patterns": [
        "test*"
    ],
    "template": {
        "settings": {
            "index": {
                "mode": "time_series"
            }
        },
        "mappings": {
            "properties": {
                "my_field": {
                    "time_series_dimension": true,
                    "type": "keyword"
                }
            }
        }
    },
    "data_stream": {}
}

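# This rollover fails when executed immediately; after waiting ~4 hours it should succeed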
POST test1/_rollover

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

mlunadia commented 1 year ago

@martijnvg what is the recommended cooling time between index mode changes to avoid rollover issues? Do you think this and the manual change mechanism should be documented by Kibana or ES?

martijnvg commented 1 year ago

what is the recommended cooling time between index mode changes to avoid rollover issues?

I think by default the recommended cooling time should be 4 hours. But this could be less if the downgrade happened some time after the last tsdb rollover.

But this also depends on whether a custom index.look_ahead_time has been set. This defaults to 2 hours. The first backing index will have a start time of now - look_ahead_time and an end time of now + look_ahead_time.
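
For reference, the time range assigned to the current tsdb backing index can be inspected via its index settings. A minimal sketch (the backing index name below is a placeholder; use the names returned by GET _data_stream/test1):

GET .ds-test1-2023.05.16-000001/_settings?filter_path=*.settings.index.time_series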

Do you think this and the manual change mechanism should be documented by Kibana or ES?

I don't think we have documented upgrading to and downgrading from tsdb. I think Elasticsearch should have docs around this, but I think Kibana should too (a minimised version of it).

felixbarny commented 1 year ago

Is there a way for ES to automatically adjust the end time to the max @timestamp when rolling over a data stream? That would eliminate the issue assuming that the actual timestamps in that index are lower than the current time. Alternatively, could ES create a new backing index that has a start time that's higher than the existing backing indices' end time?

martijnvg commented 1 year ago

Is there a way for ES to automatically adjust the end time to the max @timestamp when rolling over a data stream?

In the context of the rollover operation the information required to update index.time_series.end_time isn't available. Maybe on a downgraded data stream we could trim the index.time_series.end_time index setting based on the highest @timestamp in the backing index. But this would need to be done in a separate API call, which doesn't exist today.
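
As an illustration of the input such a trim operation would need, the highest @timestamp in a backing index can be found with a max aggregation (the backing index name is a placeholder):

POST .ds-test1-2023.05.16-000001/_search
{
  "size": 0,
  "aggs": {
    "max_timestamp": {
      "max": {
        "field": "@timestamp"
      }
    }
  }
}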

Alternatively, could ES create a new backing index that has a start time that's higher than the existing backing indices' end time?

Yes, but that index could be up to 4 hours in the future and will not end up getting used. Meanwhile current writes will go to the older tsdb backing index. And my concern is that if this downgrade and upgrade cycle happens again then we end up with another tsdb backing index but then up to 8 hours in the future.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-docs (Team:Docs)

martijnvg commented 1 year ago

This issue was discussed in yesterday's tsdb integration sync. The fact that the way downgrading from tsdb and upgrading to tsdb works causes this bug isn't ideal, but it isn't something that we will address. This is because immediately upgrading a downgraded data stream to tsdb isn't a use case we need to support. It is okay if there is some wait time before upgrading to tsdb again.

We do need to document this as part of upgrading to tsdb and downgrading from tsdb.

elasticsearchmachine commented 8 months ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)

nchaulet commented 1 month ago

It happens in Fleet that we have to roll back our integrations, and Fleet can trigger automatic upgrades. Having to wait n hours to be able to upgrade again because of that behaviour is not optimal, both for Fleet and for our users. Should/could this be automatically handled by Elasticsearch?

martijnvg commented 1 month ago

It happens in Fleet that we have to roll back our integrations, and Fleet can trigger automatic upgrades.

Our assumption was that rollbacks should occur rarely. Typically an integration's template / mapping has to be modified in order to be ready for tsdb. After testing, the chance of rolling back should be small, unless there is some unforeseen bug or the tradeoffs that come with tsdb don't work out well. In that case the second upgrade to tsdb could be days or weeks after the rollback.

Having to wait n hours to be able to upgrade again because of that behaviour is not optimal, both for Fleet and for our users.

On recent versions the wait time is lower. If the index.look_back_time setting is set to 1 minute, then migrating to tsdb after a rollback can occur as soon as 31 minutes after the rollback.
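
As a sketch, assuming a version that supports the index.look_back_time setting, the template from the reproduction above could set it when re-enabling time_series mode:

PUT _index_template/1
{
  "index_patterns": ["test*"],
  "template": {
    "settings": {
      "index": {
        "mode": "time_series",
        "look_back_time": "1m"
      }
    },
    "mappings": {
      "properties": {
        "my_field": {
          "time_series_dimension": true,
          "type": "keyword"
        }
      }
    }
  },
  "data_stream": {}
}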

Should/could this be automatically handled by Elasticsearch?

This is something we can address, but it has always had lower priority than other work. This was mainly based on the assumption that upgrading minutes to hours after a rollback isn't a common scenario.