elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Lens] commonMigrateIndexPatternDatasource migration some times fails during upgrade #159965

Closed stratoula closed 5 months ago

stratoula commented 1 year ago

Kibana version: 8.6+

Describe the bug: For some reason, in specific dashboards the commonMigrateIndexPatternDatasource migration, which migrates the indexpattern datasource to formBased, fails to run. As a result, the Lens panels fail to load.

We haven't found out why this is happening. The migration is very simple: it populates a new formBased field in the datasourceStates object. We haven't managed to reproduce it, but we have customers who have encountered it.
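For readers unfamiliar with the change, here is a minimal sketch of what such a rename could look like. It is not the actual Lens migration code: the `LensState` type and the function name are hypothetical, and it assumes the old `indexpattern` key is dropped once its value has been copied to `formBased`.

```typescript
// Hypothetical, simplified state shape; the real Lens state has more fields.
interface LensState {
  datasourceStates: Record<string, unknown>;
}

// Sketch of the rename: expose the old `indexpattern` state under `formBased`.
function migrateIndexPatternDatasourceSketch(state: LensState): LensState {
  const { indexpattern, ...rest } = state.datasourceStates;
  if (indexpattern === undefined) {
    // Nothing to migrate: already migrated, or a different datasource is used.
    return state;
  }
  return {
    ...state,
    datasourceStates: { ...rest, formBased: indexpattern },
  };
}
```

A state that has already been migrated passes through unchanged, so running the sketch twice is harmless.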

Errors in browser console (if relevant): The more usual error is

image

Any additional context: To fix it:

elasticmachine commented 1 year ago

Pinging @elastic/kibana-visualizations @elastic/kibana-visualizations-external (Team:Visualizations)

stratoula commented 1 year ago

I ran the migration for a dashboard that was reported as broken and it ran successfully, so I don't think there is a problem with the migration itself. Also, this problem doesn't always happen. Furthermore, I am not sure that it is this specific migration that doesn't run. Possibly the migration fails somewhere else but this is the most disrupting change so this is the only problem that is visible. It is very possible that other migrations don't run either.

@elastic/kibana-core do you have any idea why the migrations might not run in some cases?

pgayvallet commented 1 year ago

commonMigrateIndexPatternDatasource is an embeddable migration, right? I couldn't find it directly in the dashboard migrations.

Which version is it registered for?

Looking at the associated support issue, and the faulty dashboard SO:

{
  // ....
  "coreMigrationVersion": "8.8.0",
  "typeMigrationVersion": "8.7.0",
}

Possibly the migration fails somewhere else but this is the most disrupting change so this is the only problem that is visible

Any error during document migration would have caused the migration to fail. But also, it wouldn't have updated the typeMigrationVersion or coreMigrationVersion of the document.

Except, of course, if the registered type migration fails silently, e.g. with a try / catch / trap pattern.

do you have any idea why the migrations might not run in some cases?

As said, the first option coming to mind would be the registered migration failing silently.

~Given it's an embeddable migration, deferred migrations implemented in #153117 also comes to mind. But AFAIK this is not yet used in 8.8 so I doubt it could be it.~

EDIT: never mind, https://github.com/elastic/kibana/pull/153117 was merged in 8.9.0, so it can't be the culprit.

@rudolf any other idea?

stratoula commented 1 year ago

It was registered in 8.6.0. Here it is: https://github.com/elastic/kibana/blob/main/x-pack/plugins/lens/server/migrations/saved_object_migrations.ts#L592 (the migrateIndexPatternDatasource). But as I said, it is the most disruptive one (if it doesn't run, the Lens SOs are broken).

It runs both for embedded visualizations but also for Lens SOs. (as all our migrations)

This specific one doesn't have a try/catch and is a very simple one. And no, we are still using the server migrations that run on Kibana upgrade.

pgayvallet commented 1 year ago

It runs both for embedded visualizations but also for Lens SOs. (as all our migrations)

But the problem we observed is only about embedded vis, right? Or did they also report the problem on "root" visualizations?

stratoula commented 1 year ago

@pgayvallet correct, so far three customers have reported to us the same failure and all have reported dashboards with by value visualizations (no reference to Lens SO, everything is saved on the dashboard SO)

pgayvallet commented 1 year ago

so far three customers have reported to us the same failure

I was only aware of one of the support issues. Would you mind backlinking from the SDHs to this issue so we can track them?

The first thing I would check is: which versions did those customers upgrade from and to? If all of them were reported during upgrades to 8.8.0, we might indeed have an issue somewhere, either in the persistable state migration system or in the SO migration itself.

stratoula commented 1 year ago

@pgayvallet I pinged all the others, 4 in total. The latest is an upgrade to 8.8, but the other 3 are related to upgrades to 8.6.x. They upgraded from different 8.x versions.

pgayvallet commented 1 year ago

Great, thanks for the backlinks.

So the problem seems to occur for upgrades to any version including this commonMigrateIndexPatternDatasource migration (8.6.0+).

I took a brief look just to confirm, but we didn't perform any structural (or just any fwiw) changes on the document migrator during this, or prior minor, versions.

This, in addition to the fact that the faulty documents have the correct migrationVersion / typeMigrationVersion property set, and that no similar cases were reported for "real SO" (non-embedded) visualizations, makes me suspect it might be a problem in the way the persistable state migration works.

Could it be that errors in chained migrations are swallowed in some way? Could a failure in one of the "composite" migrations silently trap the error, while not running the following migrations of the composite? It could lead to a state where the document's migrationVersion is set, but the embeddable state is not fully migrated.
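To make that hypothesis concrete, here is a small sketch of the suspected failure mode. It is purely illustrative and not taken from Kibana: `runCompositeSwallowing`, `Doc` and `Step` are hypothetical names, and the real persistable state migration code may work quite differently.

```typescript
// Hypothetical document and migration step types, for illustration only.
type State = Record<string, unknown>;
type Doc = { state: State; typeMigrationVersion: string };
type Step = (state: State) => State;

// A composite runner that traps errors: once a step throws, the remaining
// steps are skipped, yet the version is still bumped as if all had run.
function runCompositeSwallowing(doc: Doc, steps: Step[], targetVersion: string): Doc {
  let state = doc.state;
  try {
    for (const step of steps) {
      state = step(state);
    }
  } catch {
    // Error swallowed: nothing is logged and nothing is rethrown.
  }
  return { state, typeMigrationVersion: targetVersion };
}

const steps: Step[] = [
  (s) => ({ ...s, earlierMigrationApplied: true }),
  () => {
    throw new Error("unexpected panel shape");
  },
  // The rename under discussion never runs because the previous step threw.
  (s) => {
    const { indexpattern, ...rest } = s;
    return { ...rest, formBased: indexpattern };
  },
];

const result = runCompositeSwallowing(
  { state: { indexpattern: {} }, typeMigrationVersion: "8.2.0" },
  steps,
  "8.7.0"
);
```

In this sketch `result` carries the new version while its state still holds the `indexpattern` key, which matches the symptom described above: a bumped version on a document whose embeddable state was only partially migrated.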

Who would be the best person(s) to answer this?

stratoula commented 1 year ago

I see your points Pierre. I am pinging here @ThomThomson as they own the by value migrations on the dashboard and he might have some insights on how these migrations work.

pgayvallet commented 1 year ago

Some observations we made with @stratoula this morning:

xPB12 commented 1 year ago

Hello,

We are one of Elastic's customers who ran into the issue and raised it with support. I can confirm that we had 2 dashboards using only Lens visualisations (no reference to Lens SO, everything is saved on the dashboard SO). One of the dashboards migrated fine when we upgraded from v7.17.10 to v8.8.0, but the other ran into problems.

Both the dashboards use similar lens visualisations (similar aggregations) but on different data sources. Let me know if you need any further details.

pgayvallet commented 1 year ago

@xPB12 thanks for the information, it would confirm that the migration outcome can be different per document on a given cluster.

Let me know if you need any further details.

It would be very useful to us if you could provide:

(by just exporting them using the management/export feature)

It would help us try to find differences that could explain the different outcomes during the migration.

You can use the following es query for that:

GET /.kibana_7.17.10_001/_doc/dashboard:{ID-OF-THE-DASHBOARD}

Do the dashboard definitions contain anything sensitive? Are you fine sharing them on this issue, or should we go through support or another mechanism to transfer them?

stratoula commented 1 year ago

I checked the migrations on the dashboard ndjson provided by the customer and I can say with certainty that the 8.1.0 and 8.3.0 migrations haven't run either. For the rest I can't confirm, because the dashboard doesn't have embeddables that satisfy their criteria. So we can say that the problem is not the 8.6 migrations; something else is going on. My assumption is that none of the migrations ran.

xPB12 commented 1 year ago

The issue was resolved for the non-working dashboard by changing a key in the saved object from indexpattern to formBased. Both the working and the culprit saved object JSON are available in the support case #01379924.

I've also attached both the dashboard version from GET /.kibana_7.17.10_001/_doc/dashboard:{ID-OF-THE-DASHBOARD} query as well to the case.

stratoula commented 1 year ago

Thanx @xPB12, I don't see a significant difference between these 2 dashboards cc @pgayvallet

We can create a fallback for this migration (so if it is skipped, at least the visualizations will work), but this is not the solution to this problem.

The problem is that the migrations are skipped on a dashboard for by-value panels for some reason that we can't identify atm. The problem is not this migration in particular but something in the migration system.
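One way to spot affected dashboards is to scan their by-value panels for the old key. The following is a rough sketch only: it assumes the dashboard's panels are stored as a JSON string of panel objects whose Lens state lives under `embeddableConfig.attributes.state.datasourceStates`, and the function name and `DashboardPanel` type are hypothetical.

```typescript
// Assumed (simplified) shape of one by-value panel inside the panels JSON.
interface DashboardPanel {
  type?: string;
  embeddableConfig?: {
    attributes?: { state?: { datasourceStates?: Record<string, unknown> } };
  };
}

// Returns the indexes of Lens panels that still use the pre-8.6 `indexpattern`
// key and have no `formBased` key, i.e. panels the migration appears to have skipped.
function findUnmigratedLensPanels(panelsJSON: string): number[] {
  const panels = JSON.parse(panelsJSON) as DashboardPanel[];
  const hits: number[] = [];
  panels.forEach((panel, index) => {
    const states = panel.embeddableConfig?.attributes?.state?.datasourceStates;
    if (
      panel.type === "lens" &&
      states !== undefined &&
      "indexpattern" in states &&
      !("formBased" in states)
    ) {
      hits.push(index);
    }
  });
  return hits;
}
```

Note that this only detects the one skipped migration that has a visible marker; other migrations that did not run would need their own checks.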

elasticmachine commented 1 year ago

Pinging @elastic/kibana-presentation (Team:Presentation)

stratoula commented 1 year ago

@xPB12 we created a fallback for the missing datasource issue that is going to be released in 8.9. So these panels are not going to break even if the migration does not run.

Can you do a test for us? If you import the 7.17 dashboard in 8.7 from the Saved objects management page, are the panels being depicted correctly?

dej611 commented 1 year ago

Today @delvedor showed me another possible instance of this bug, with a missing new metric migration within a dashboard.

JAndritsch commented 1 year ago

I'm running into this problem trying to upgrade from a fork of 8.2.2 to a fork of 8.8.2. I applied the same fixes in https://github.com/elastic/kibana/commit/a67d83821f5a81448246ba23f96a524509d4e932 to my fork of 8.8.2 and can confirm the visualizations now render properly.

However, attempting to edit one of the Lens visualizations results in a broken visualization form/control that displays the following stack trace:

TypeError: Cannot read properties of undefined (reading 'layers')
    at Object.uniqueLabels (https://localhost:5601/9007199254740991/bundles/plugin/lens/1.0.0/lens.chunk.4.js:19215:28)
    at LayerPanel (https://localhost:5601/9007199254740991/bundles/plugin/lens/1.0.0/lens.chunk.2.js:19215:347)
    at Gh (https://localhost:5601/9007199254740991/bundles/kbn-ui-shared-deps-npm/kbn-ui-shared-deps-npm.dll.js:225369:137)
    at rk (https://localhost:5601/9007199254740991/bundles/kbn-ui-shared-deps-npm/kbn-ui-shared-deps-npm.dll.js:225487:63)
    at qk (https://localhost:5601/9007199254740991/bundles/kbn-ui-shared-deps-npm/kbn-ui-shared-deps-npm.dll.js:225467:129)
    at pk (https://localhost:5601/9007199254740991/bundles/kbn-ui-shared-deps-npm/kbn-ui-shared-deps-npm.dll.js:225466:435)
    at ik (https://localhost:5601/9007199254740991/bundles/kbn-ui-shared-deps-npm/kbn-ui-shared-deps-npm.dll.js:225466:265)
    at Yj (https://localhost:5601/9007199254740991/bundles/kbn-ui-shared-deps-npm/kbn-ui-shared-deps-npm.dll.js:225459:163)
    at https://localhost:5601/9007199254740991/bundles/kbn-ui-shared-deps-npm/kbn-ui-shared-deps-npm.dll.js:225334:255
    at exports.unstable_runWithPriority (https://localhost:5601/9007199254740991/bundles/kbn-ui-shared-deps-npm/kbn-ui-shared-deps-npm.dll.js:225557:343)

image

Update:

I've confirmed that replacing occurrences of \"datasourceStates\":{\"indexpattern\" with \"datasourceStates\":{\"formBased\" corrects the dashboard with embedded Lens visualizations when both viewing and attempting to edit an embedded Lens SO. However, it seems the PR to mitigate the issue only accounts for rendering of those visualizations. Attempting to edit them results in the error I've shown.
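As an illustration, the textual replacement described above could be scripted roughly like this. It is only a sketch: `patchDatasourceKey` is a hypothetical helper, and it assumes it is given the serialized saved object (for example one line of an exported ndjson file), where the panel state is a JSON string nested inside JSON and its quotes are therefore backslash-escaped.

```typescript
// Escaped forms as they appear inside the serialized saved object.
const OLD_KEY = '\\"datasourceStates\\":{\\"indexpattern\\"';
const NEW_KEY = '\\"datasourceStates\\":{\\"formBased\\"';

// Replaces every occurrence of the old datasource key with the new one.
// split/join is used so the sketch does not depend on String.prototype.replaceAll.
function patchDatasourceKey(serializedSavedObject: string): string {
  return serializedSavedObject.split(OLD_KEY).join(NEW_KEY);
}
```

The patched text still has to be written back (for example through the Saved Objects import); this only covers the key rename and does not apply any other migration that may have been skipped.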

JAndritsch commented 1 year ago

To add to the above...

It seems after applying the manual fix to the embedded Lens SOs, several of my Metric visualizations stopped working and instead showed up blank. If I edit them and change their type from Metric (lnsMetric) to Legacy Metric (lnsLegacyMetric) then they work.

Also seems like any visualization of type lnsPie isn't working:

image

Are these examples of other migrations that did not run?

JAndritsch commented 1 year ago

@stratoula Not sure if this helps, but I found that the following seems to address all migration-related issues for me:

  1. Export all dashboards with embedded SOs before upgrading from 8.2 to 8.8
  2. Upgrade to 8.8
  3. Using the Saved Objects > Import feature to import and overwrite the exported SOs from step 1

It seems like the migration process that happens automatically on Kibana startup does not work, but the migration process SOs go through via Import works fine.

Note that this workaround is only effective if you've managed to export the SOs before they've gone through any migration process. Trying to re-import something that Kibana already migrated does not work, as the version numbers on the SOs are bumped even though the migration changed nothing else.

stratoula commented 12 months ago

@JAndritsch thanx a lot, yes we know that. Sometimes the internal Kibana migrations do not work for some dashboards, but if you import them from the SO management page the migrations will run. Thanx for looking into it.

JAndritsch commented 12 months ago

@stratoula Thanks for confirming.

I also found that I was able to take my badly-migrated dashboards (where only version numbers were updated) and run them through SO import after making the following changes:

Doing this effectively reverted the dashboards back to their pre-migrated state and allowed me to run them through a SO import to correctly migrate them forward to 8.8.2. This is a process that I was able to do even if I could not pre-export the unmigrated SOs before upgrade.

ThomThomson commented 11 months ago

@stratoula have we seen any other instances of this recently?

stratoula commented 10 months ago

@ThomThomson we didn't have any updates on this, but this doesn't mean anything. With this PR https://github.com/elastic/kibana/pull/160129 we make Lens visualizations work even if the disruptive migration fails, so users will never notice.

Unfortunately, none of the other migrations cause failures in the charts, so I am not sure how easy it is for users to understand that something hasn't run in order to report it.

stratoula commented 9 months ago

Adding the label blocked as we still haven't managed to reproduce it but the bug still persists. I am keeping it open for visibility and for gathering more cases.

nreese commented 5 months ago

The new embeddable system moves migrations from server to client. This has 2 benefits: 1) a failed migration will not block Kibana from upgrading; 2) errors will no longer be silently swallowed. Instead, they will be displayed for the individual panel that failed to migrate.