If user reindexes sample data from 6.8.x in 7.16 and then tries to install them in a new space Kibana displays internal server error

bhavyarm commented 2 years ago

Kibana version: 7.16.0 snapshot Dec 1st

Elasticsearch version: 7.16.0 snapshot Dec 1st

Server OS version: darwin_x86_64

Browser version: chrome latest

Browser OS version: OS X

Original install method (e.g. download page, yum, from source, etc.): from snapshots

Describe the bug: If user has sample data installed in 6.8.x and upgrades to 7.16. and reindexes 6.8 indices and then creates a new space and tries to install sample data - Kibana displays internal server error.

Steps to reproduce:

Create sample data in 6.8.20 - upgrade to 7.16
Go to upgrade assistant - reindex sample data
Create a new space and click on add integrations -> sample data -> try to install sample data - Kibana fails to install them with an internal server error because of
```
error: invalid_index_name_exception: [invalid_index_name_exception] Reason: Invalid index name [kibana_sample_data_ecommerce], already exists as alias
```

Screenshots (if relevant):

Errors in browser console (if relevant):

Failed to load resource: the server responded with a status of 500 (Internal Server Error) 
:5601/s/space_x/api/sample_data/ecommerce:1

Provide logs and/or server output (if relevant):

Kibana logs:

error  [11:22:18.074]  Error: Internal Server Error
    at HapiResponseAdapter.toInternalError (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/src/core/server/http/router/response_adapter.js:61:19)
    at Router.handle (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/src/core/server/http/router/router.js:172:34)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at handler (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/src/core/server/http/router/router.js:124:50)
    at exports.Manager.execute (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/node_modules/@hapi/hapi/lib/toolkit.js:60:28)
    at Object.internals.handler (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/node_modules/@hapi/hapi/lib/handler.js:46:20)
    at exports.execute (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/node_modules/@hapi/hapi/lib/handler.js:31:20)
    at Request._lifecycle (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/node_modules/@hapi/hapi/lib/request.js:371:32)
    at Request._execute (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/node_modules/@hapi/hapi/lib/request.js:281:9)
  log   [11:40:28.319] [info][0][1][endpoint:metadata-check-transforms-task:0][plugins][securitySolution] no endpoint metadata transforms found
  log   [12:28:23.154] [warning][home][plugins][sampleData] Unable to create sample data index "kibana_sample_data_ecommerce", error: invalid_index_name_exception: [invalid_index_name_exception] Reason: Invalid index name [kibana_sample_data_ecommerce], already exists as alias
  log   [12:28:23.154] [error][http] Error: options.statusCode is expected to be set. given options: undefined
    at Object.customError (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/src/core/server/http/router/response.js:129:13)
    at /Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/src/plugins/home/server/services/sample_data/routes/install.js:117:20
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at Router.handle (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/src/core/server/http/router/router.js:163:30)
    at handler (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/src/core/server/http/router/router.js:124:50)
    at exports.Manager.execute (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/node_modules/@hapi/hapi/lib/toolkit.js:60:28)
    at Object.internals.handler (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/node_modules/@hapi/hapi/lib/handler.js:46:20)
    at exports.execute (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/node_modules/@hapi/hapi/lib/handler.js:31:20)
    at Request._lifecycle (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/node_modules/@hapi/hapi/lib/request.js:371:32)
    at Request._execute (/Users/bhavyarajumandya/Desktop/7.16_snapshot_upgrade/No_error_upgrade/kibana-7.16.0-SNAPSHOT-darwin-x86_64/node_modules/@hapi/hapi/lib/request.js:281:9)
 error  [12:28:23.140]  Error: Internal Server Error

elasticmachine commented 2 years ago

Pinging @elastic/kibana-core (Team:Core)

pgayvallet commented 2 years ago

Go to upgrade assistant - reindex sample data

@bhavyarm is this step required to see the sample data on the 7.x instance? What happens if going directly from step 1 to step 3?

@elastic/kibana-stack-management what exactly does this action do?

cjcenizal commented 2 years ago

@pgayvallet Reindexing indices from 6.x will remove these deprecated settings from them:

https://github.com/elastic/kibana/blob/ded75bdb7ba22c8bb3271594c80a5c88f2332a66/x-pack/plugins/upgrade_assistant/server/lib/reindexing/index_settings.ts#L155-L211

cjcenizal commented 2 years ago

Who owns the sample data? It seems like there's a bug in the logic because in @bhavyarm's screenshot, the "Add data" buttons are active:

If the sample data was installed in 6.8, then these buttons should be replaced with "Remove data" buttons. I think an engineer from the team that owns the sample data needs to take a look at this logic and identify the root cause of this unexpected behavior.

bhavyarm commented 2 years ago

@cjcenizal we are allowed to install sample data in every space. Please note this is a new space I created in 7.16.

@pgayvallet if user upgrades to 7.16 from 6.8.x and creates a new space and installs sample data without touching upgrade assistant - everything works fine. But if user goes to upgrade assistant reindexes sample data and then creates a new space and tries to install sample data things break

These log lines are the problem.

log   [12:28:23.154] [warning][home][plugins][sampleData] Unable to create sample data index "kibana_sample_data_ecommerce", error: invalid_index_name_exception: [invalid_index_name_exception] Reason: Invalid index name [kibana_sample_data_ecommerce], already exists as alias

cjcenizal commented 2 years ago

Oh I missed the part about creating a new space! Thanks for pointing that out.

pgayvallet commented 2 years ago

if user upgrades to 7.16 from 6.8.x and creates a new space and installs sample data without touching upgrade assistant - everything works fine. But if user goes to upgrade assistant reindexes sample data and then creates a new space and tries to install sample data things break

@bhavyarm In the scenario where the user did not use UA to reindex in 7.16, he still did install the sample data in the default space in 6.8, right? Just want to be sure, because given @cjcenizal's answer, if the UA reindex action just performs an update of the index's setting, the behavior should be the same with or without reindex from UA.

Given the error Reason: Invalid index name [kibana_sample_data_ecommerce], already exists as alias, it makes me wonder if we weren't using an alias for these sample data in 6.x. @cjcenizal, can you please just (double) confirm that the UA's reindex action does not do fancy things on moving/creating aliases?

Also may be related to https://github.com/elastic/kibana/issues/116677

yuliacech commented 2 years ago

@pgayvallet @cjcenizal The reindexing action creates an alias to point from the old index to the new index. Here is an example of the process for an index test_data:

Set index test_data to read only
Create new index reindexed-v6.8-test_data
Reindex documents from test_data to reindexed-v6.8-test_data
Create alias test_data that points to reindexed-v6.8-test_data
Delete index test_data

bhavyarm commented 2 years ago

@bhavyarm In the scenario where the user did not use UA to reindex in 7.16, he still did install the sample data in the default space in 6.8, right?

@pgayvallet yes.

pgayvallet commented 2 years ago

The reindexing action creates an alias to point from the old index to the new index. Here is an example of the process for an index test_data

So at the end of the reindex operation, the test_data index is deleted, and there is a new test_data alias pointing to reindexed-v6.8-test_data. This is some disruptive action, that any consumer of the index has to be aware of and take into account when performing operations.

This causes the uninstall/reinstall logic for the sample data sets to fail: the indices.delete operation fails (silently, sigh..) because the target is an alias, and then the indices.create operation fails because an alias already exists with the same name.

https://github.com/elastic/kibana/blob/61ab4a3c59e062ac440a8c7bfcaa089cb32d04d4/src/plugins/home/server/services/sample_data/routes/install.ts#L106-L122

We could eventually add additional logic to check if the target index is an alias first, and then delete both the alias and its target, but to be honest, that doesn't feel right. I don't think any upgrade-related operation should alter the state of indices in such a significant way. After a reindex, I would expect to have an index with the same name, not an alias pointing to another index. This is dangerous for any index-based operations (such as deletes).

cjcenizal commented 2 years ago

After a reindex, I would expect to have an index with the same name, not an alias pointing to another index

I tend to agree with you, though this runs up against the current Elasticsearch model. Renaming an index in place isn't possible today, AFAIK. From what I can tell the prescribed method of renaming is to use an alias, or to do a full reindex.

We should verify that a second full reindex wouldn't be feasible for UA, but assuming that it isn't, perhaps the optimal path is to add that additional logic to Sample Data. This is the first time I believe we've encountered this problem, which I do find a bit surprising but there it is.

bhavyarm commented 2 years ago

cc @LeeDr

pgayvallet commented 2 years ago

We should verify that a second full reindex wouldn't be feasible for UA, but assuming that it isn't, perhaps the optimal path is to add that additional logic to Sample Data

If this is too much work, or just isn't possible to do from UA, I agree that we'll 'just' have to do it from the home plugin. I just hate the idea to add additional bullet-proofing logic to protect against changes in the indices/aliases that aren't performed from the owning plugin. I mean, what proof do we have that these sample data indices are the only ones impacted by this problem? I doubt these are the only data indices that can be created/deleted 'internally' by Kibana?

This is the first time I believe we've encountered this problem, which I do find a bit surprising but there it is.

I agree, this is surprising 😅

cjcenizal commented 2 years ago

@sebelga Could you please investigate this? I see two main questions we want to answer.

Is a second full reindex feasible?

To conclude Upgrade Assistant's reindex process with an updated index that has the same name as the original, we'd need to reindex the update index into a new index. We wouldn't need to make any changes to the destination index other than giving it a new name. What's the performance impact, and how does this performance change as the size of the index grows? If the performance impact is O(1) or O(log N) then a second full reindex is feasible. Otherwise, I'd say it isn't.

Where else might this problem manifest?

If Elastic has more logic that's similar to the Sample Data, in which datasets are installed and later detected through the presence of indices, then we'll probably trip over this again. For example, do any Integrations install datasets? If Upgrade Assistant reindexes those datasets, will that break their logic for detecting installed Integrations, uninstalling Integrations, or migrating them during version upgrades?

sebelga commented 2 years ago

Is a second full reindex feasible?

Talking with @martijnvg it does seem that a bigger index will not linearly increase the time of reindex and is not an O(1) op. As per his comment:

"_in the hot threads a significant time is spent on looking up the id/version during reindexing. This is something that reindex does to ensure that no duplicates are indexed into the target index. But as the destination index grows so does the cost/time it takes to lookup this _id/version_".

Where else might this problem manifest?

It seems that the issue we are discovering could occur on any dataset, not only the ones from our products/integrations, but also the ones from our users.

cjcenizal commented 2 years ago

Seb and I chatted and we agree that the ideal solution would be to rename the index in place (https://github.com/elastic/elasticsearch/issues/37880). Unfortunately, the ES Data Management team tells me it's not possible to implement this for 7.17.

It's also not feasible to add a second reindexing phase as I originally proposed, because we're still stuck with the problem of having to create an index with a name that collides with an index alias. The ES Data Management team also says it's not feasible to provide an API option to force this.

As an outcome, we're left with three steps:

We'll update the Upgrade Assistant UI to clarify the changes being made to indices (https://github.com/elastic/kibana/issues/121220)
We'll patch the sample data logic to check for the original index or an alias that points to the original index.
We'll notify other Kibana contributors that deletion of indices isn't proxied by their aliases, so anywhere that our code deletes an index we should also check to see if it's aliased. This is the guidance provided by the ES Data Management team.

sebelga commented 2 years ago

It's also not feasible to add a second reindexing phase as I originally proposed, because we're still stuck with the problem of having to create an index with a name that collides with an index alias.

If we add a second reindexing phase we don't need to create an alias, right?

Adding a second reindexing phase and keeping an index (instead of using an alias) seems the least intrusive change for our users. It would initially be less performant than a "rename in place API" but we'll update and use that API once it is ready. Thoughts @cjcenizal ?

cjcenizal commented 2 years ago

We discussed this sync, so I'll record what we discussed here for visibility.

If we add a second reindexing phase we don't need to create an alias, right?

Right.

Adding a second reindexing phase and keeping an index (instead of using an alias) seems the least intrusive change for our users. It would initially be less performant than a "rename in place API" but we'll update and use that API once it is ready. Thoughts @cjcenizal ?

Agreed. The main concern is cost, since reindexing incurs data transfer costs.

elastic / kibana

If user reindexes sample data from 6.8.x in 7.16 and then tries to install them in a new space Kibana displays internal server error #120137

Is a second full reindex feasible?

Where else might this problem manifest?