elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.63k stars 8.22k forks source link

Discuss: How to migrate a saved object that contains state from another plugin #56166

Closed stacey-gammon closed 3 years ago

stacey-gammon commented 4 years ago

Our current migration system assumes that the developer who created and registered a new saved object type is in charge of migrating any outdated state in that saved object. We haven't thought about the situation where a saved object contains state that was built by other, potentially unknown, developers. With many of our new features however, this will become an issue.

Consider:

URLs

The migration issue with URLs is heavily discussed in that above link, but I'll recap the situation here:

Embeddables & actions

I recently realized that once we have embeddables that are "by value" instead of by reference, we will hit the same situation. The developer of dashboard saved objects is in charge of migrating them, but this developer doesn't know how to migrate the panels inside it, because those are extensible and owned by other developers.

There is actually a third layer here as well... actions. Actions can be attached to certain embeddables and will also be stored together with the embeddable input inside the dashboard. How will the author of an action update the state referenced in embeddables, inside dashboards.

Canvas workpads as well - we want to be able to update embeddables, but how do we migrate state from inside canvas workpads.

For URLs, the thought was to have each implementation expose a migrate method that the author of the saved objects can call. For Embeddables, this is a tougher problem and there are more considerations:

Expressions

The expression string is stored in saved objects (Lens, Canvas workpads), but each expression function is extensible. How can a single developer of an expression function update the args or output shape, or expected input context without potentially breaking all those saved objects which reference it?

Even if we came up with a migration plan, we can still hit the issue where if the output or input shape changed, the previous, or following expression function may fail because it didn't update.

Expression system has a "cast types" system which we might be able to use.

Search strategies

What happens if we want to migrate the state a search strategy accepts? What if we need to migrate an expression function to account for that? For instance, hypothetical, SQL search strategy no longer supports some given syntax but this syntax is saved in an essql expression string? We have to potentially migrate canvas workpads, lens objects, dashboards (with in place embeddables). We also don't know for sure where else a plugin developer may be storing outdated state.

Any extension point / registry item

Essentially this boils down to having a generic migration system for anything stored in a registry. We need a way for third party developers to add migrations for state, whether that state is stored as a saved object, in a saved object, or just dynamically in code (consider the case of an embeddable hard coded into an application - how do we warn the user of the deprecation period before we end support completely and they need to migrate by).

Blank Diagram (5)

Also keep in mind, this can go the other way. Any registry item may want to migrate its state structure, but this can have cascading affects, both up the chain and down. What if a childEmbeddable wants to change it' explicitInput from { name } to { firstName; lastName }. Because childEmbeddable state is stored up the chain (in a container, and that container's state stored somewhere, like a dashboard saved object, or hard coded in the SIEM overview page), those objects need to update the state. On the other hand migrations might float downward, for instance if dashboard embeddable wants to change its input state to accept { filters } instead of { timeRange, filters, query }. All the children embeddable need to update to accept the new version or they will eventually be broken.

Blank Diagram - Copy of Page 1

Another example - think about how much state needs to be migrated if we want to change our Filters shape. Many potential saved objects and also a lot of code. How do we keep all of these in sync so we can support older saved objects that contain references to older versions?

Blank Diagram - filtersstate

I don't have a good solution to offer yet, though I think this might end up being an extremely generic state migration system that Saved Object migrations could be built on top of. It would be on the fly - only the saved object migration system would touch the Kibana database.

Blank Diagram - statemgirations

How important is this to tackle now vs later?

Well, we lived without saved object migrations for many years. We also already have lived with this problem with the expression language for awhile. It is going to be a problem when we want to have a stable API for third party developers, but in the short term we can work around this by keeping all teams in sync (e.g. we want to change the shape of Filters, we know all our saved objects and run time places that we need to migrate: expressions, actions, embeddables, etc).

I propose we focus on this in 8.x and I think the platform team should be involved, even if app arch ends up owning this, because of the overlap with saved object migrations.

I think we can continue to move forward with a direct access link service without being blocked on this, and I think we could probably continue to move forward with in place embeddables on a dashboard, even though knowing we may need to make some drastic migrations in order to support a new more robust system. This is by no means a small project though so I doubt there will be any 7.x resources for it.

cc @joshdover @rudolf @timroes @majagrubic

timroes commented 4 years ago

cc @peterschretlen This refers to the issue we talked about earlier with migrations in embedded visualizations

elasticmachine commented 4 years ago

Pinging @elastic/kibana-platform (Team:Platform)

rudolf commented 4 years ago

In NP we want to limit migrations to the "owner" of the type. If the embeddables plugin registers the embeddable type we don't want another plugin (elastic or third party) to be able to migrate it's data. I think it's important to maintain this ownership aspect. So if embeddables wants to allow other plugins to migrate the data it stores on their behalf, it should explicitly allow this and might want to add certain restrictions or safe guards.

The way I see this implemented is for the embeddables plugin to expose a migrations API from it's setup contract (the types are mostly for illustration purposes):

export class EmbeddablesPlugin {
  constructor() {}
  public setup(core: CoreSetup) {
    return {
      registerEmbeddableFactory(
        factoryId: string,
        factory: any,
        migration: (input: Embeddable) => Embeddable
      ) {
        /* do other embeddable stuff */
        const wrappedMigration = safelyWrapMigration(factoryId, migration);
        // This API isn't implemented yet...
        core.savedObjects.registerMigration('embeddables', wrappedMigration);
      },
    };
  }
}

safelyWrapMigration is how the embeddables plugin protects it's data. I don't know exactly what it's implementation will look like, but it might check to see if a specific saved object belongs to the factoryId and if not, it becomes a no-op and just lets the data through unaltered. Something like:

function safelyWrapMigration(factoryId, migrationFn) {
  return (inputObject: Embeddable) => {
    if (inputObject.factoryId === factoryId) {
      return migrationFn(inputObject);
    } else {
      return inputObject; // This migration wasn't allowed to alter this data, ignore and leave data unaltered.
    }
  }
}
rudolf commented 4 years ago

The suggestion above only solves for the case where the state migration is applied to a single type. If as in the example of saved search strategies the state lives in multiple saved objects owned by different plugins, each of those plugins will be responsible for registering a migration. Like other breaking API changes we will have to coordinate such changes with teams and external plugin authors.

tylersmalley commented 4 years ago

Discussed this with @kobelb, as we do this with security currently. They manage the "security" type, but it's usually on an object owned by another type.

Here are my thoughts: Since a dashboard would have multiple other types, like visualizations, we could extend that support to allow for an array of objects of another type.

{
  "_id: "dashboard:1:ABC123",
  "type": "dashboard",
  "dashboard": {
    "panels": [
      {
        ...,
        "panelRefName":"panel_0"
      },
      {
        ...,
        "panelRefName":"panel_1"
      }
    ]
  },
  "references": [
      {
        name: "panel_0",
        type: "visualization",
        id: "37cc8650-b882-11e8-a6d9-e546fe2bba5f"
      }, {
        name: "panel_0",
        type: "visualization",
        id: "self:12345"
      }
  ],
  "visualizations: [
    {
      id: "12345"
    } 
  ],
  "security": { ... }
}

Here I am just piggybacking off of the references array, but I don't think that would be necessary as we could skip references and use something other than panelRefName. This would allow for visualization migration to be written and applied to documents which it doesn't explicitly own.

{
  "_id: "dashboard:1:ABC123",
  "type": "dashboard",
  "dashboard": {
    "panels": [
      {
        ...,
        "panelRefName":"panel_0"
      },
      {
        ...,
        "incVisualizationName":"panel_1"
      }
    ]
  },
  "references": [
      {
        name: "panel_0",
        type: "visualization",
        id: "37cc8650-b882-11e8-a6d9-e546fe2bba5f"
      }
  ],
  "visualizations: [
    {
      name: "panel_1",
      id: "12345"
    } 
  ],
  "security": { ... }
}

Thoughts?

joshdover commented 4 years ago

I like this references approach. My initial reaction to this issue is this seems more like a data schema design problem rather than an issue with the migration mechanism.

I think if a plugin needs to have any persisted state, it must be responsible for that state OR the plugin it depends on needs to provide a suite of tools for the state it stores on other plugins' behalf. However, I think even that solution has problems.

The overarching design I have for plugin integrations is that all integration points between plugins should be via plugin contracts (setup and start), always in JavaScript. What I mean by that, is that plugins should not have any implicit APIs (eg. HTTP routes, shape of SavedObject state, global functions on window, etc.). Implicit integration points lead to a whole host of hard-to-detect problems: breakages when a plugin is disabled, breakages when a saved object mapping changes, breakages when an HTTP route is renamed, etc.

However, when we constrain all integrations points to be only via setup and start contracts, we get an opportunity to detect, prevent, and handle breaking changes. Examples:

If we were to design Embeddables' state loading using plugin APIs, we could allow plugins to manage their own state and decide whether or not they need/want to write migrations. What this might mean in practice for embeddable factories, is that they would register:

  1. a function for rendering
  2. a function to retrieve and return their state
  3. a function to persist their state

When a dashboard is loaded, it would look at its list of panels, and instruct the embeddables plugin to load the state for each panel. The embeddables plugin would invoke the registered state retrieval function (2) for each embeddable type. When the dashboard is saved, the embeddables plugin would invoke the registered state persistence function (3).

In the case of the URL service, this could mean that the URL service persists a reference to an object, which the consuming plugin would register a function for loading that object and generating a URL for it. That object could be anything (raw string, bag of options, array, etc.) and is totally up to the plugin that is being linked to.

stacey-gammon commented 4 years ago

@tylersmalley - using your approach how would you handle state from actions stored in embeddables, inside dashboards? Who is applying the migration in your references approach? Is is the dashboard applying a migration written by the embeddable authors or are embeddable authors registering migrations on a dashboard? If the latter, then the author of a plugin needs to know every saved object it might appear in and I don't think we can make that assumption.

I think what @joshdover suggests is what I am thinking would work - registry items which might end up in persisted storage would have to expose methods for serializing/deserializing state, and those methods should handle legacy state conversions. I just don't see how this ties into the reference approach.

When a dashboard is loaded, it would look at its list of panels, and instruct the embeddables plugin to load the state for each panel.

This is what happens now, via factory.create(state, parent). There is no formal versioning so every factory would have to do some logic checks to guess which version the state is from, or we store a version on it. And the persisted state is retrieved from embeddable.getInput().

I'm not sure I follow the URL comment: "the URL service persists a reference to an object". The URL service registers a URL saved object which the background search collection would point to?

The difference between Josh's approach and my thinking is that it's not obvious to a plugin author that they have to handle legacy state migrations so formalizing this system might help, since this will be an issue when any registry item ends up serialized in some other saved object.

I'll look to throw some time on the calendar because I'm curious to walk through your thoughts a little more.

joshdover commented 4 years ago

I think my explanation could benefit a lot from a concrete example of what I mean. I'll work on that before we meet.

That said, I think the biggest distinction between what I'm proposing and what you are proposing is that I don't think the data should be persisted in the Dashboard at all. I think only an ID referencing a SO should be saved in the Dashboard and the plugin that owns the Embeddable type should handle fetching and persisting that object.

With that separation, the plugin that owns the Embeddable type can then also handle migrating its state independently from the Embeddables plugin. If the Embeddables API needs to change, it should only be changing in JavaScript, which the plugin that owns the Embeddable type can then choose how to handle that API change.

stacey-gammon commented 4 years ago

I think only an ID referencing a SO should be saved in the Dashboard and the plugin that owns the Embeddable type

We talked about this briefly in the Area lead sync meeting the other day... continuing with the reference model has its own set of challenges which is why we were starting to go down the path of the nested "by value" model for dashboards. Not to say that we can't present an argument about why we should stop and change our minds, but we have to think about why we wanted to go with the nested/by value approach anyway.

One was permission/security related - a user has access to a dashboard and everything on it - no conflict with permissions from other saved object types.

The other reason is management and "library" items. A user adds an embeddable to a dashboard, this creates a saved object, but then the user never saves that dashboard - how is that embeddable deleted and cleaned up? If we tried not to create the referenced saved objects until the dashboard itself was saved, even then, what happens if the dashboard is deleted but not the references? What happens if the dashboard gets created but an error occurs with the referenced items? ES doesn't have rollback/transactional support. Maybe the plan would be to have the SO reference model handle all of that?

kobelb commented 4 years ago

One was permission/security related - a user has access to a dashboard and everything on it - no conflict with permissions from other saved object types

To briefly elaborate upon this, references mean something "special" in the context of security especially with the introduction of sharing saved-objects in multiple spaces. The use of references currently also has an impact on end-users with regard to security. If a user only has access to Dashboards, they're unable to create any new Visualizations/Maps, etc. which end up being embedded in the Dashboards. While it's possible for us to continue to use separate Elasticsearch documents to model an "embedded" saved-object, it'll require that we differentiate between a "reference" and an "embedded reference".

lukeelmers commented 4 years ago

I think only an ID referencing a SO should be saved in the Dashboard and the plugin that owns the Embeddable type should handle fetching and persisting that object.

To me this seems like the cleanest option we have discussed so far in terms of making the ownership of portions of state explicit, and making it possible to run state migrations without the containers needing to track the contents of children, for example. If there's a single source of truth for each of the various types of state that need to be persisted, a lot of the other complexities go away (but admittedly, not all of them)

continuing with the reference model has its own set of challenges which is why we were starting to go down the path of the nested "by value" model for dashboards.

I'll be interested in discussing these more when we look at Josh's example. I think some of these challenges are simply trade-offs... a choice between:

  1. the complexity of tracking references (& permissions) between various Saved Objects, or
  2. the complexity of having duplicated state that you cannot track & update in a centralized way

So far, I have a hard time thinking of a third scenario where we aren't dealing with at least one of these two problems, but I would be happy to be proven wrong 😄

stacey-gammon commented 4 years ago

the complexity of having duplicated state that you cannot track & update in a centralized way

I think I would say this is the complexity of having persisted state that you cannot track & update in a centralized way or maybe instead of persisted, "legacy".

Couple more comments:

stacey-gammon commented 4 years ago

I think going with an embedded reference model would also leave us in a situation with a lot of saved object types. For embeddables, this might feel more natural because the embeddables we have are all backed by saved objects, but what about expression fns? Right now they are stored in saved objects as a string, e.g.: kibanaContext | esaggs indexPattern="1234313"

I think what you are proposing is to no longer store just this string in a saved object but a reference to a saved object that represents each function. This would mean a new saved object type for every expression function type. That is a lot of new saved object types. This would also mean pretty awkward storage and parsing to/from expression reference model to expression string.

I just want to make it obvious that the problem is not just with embeddables, but any extensible registry where state used to re-create an instance of the registry item ends up stored in a saved object:

This hasn't been a problem so far because:

This will become more of a problem as:

It becomes a communication challenge. The maps team needs to know that state to create their embeddable is stored in an expression function in canvas workpads. They want to change interfaces in a major version, they need to be aware of this persisted state and they need to either migrate it, or to continue to support legacy state. How is this tested and who handles the state migration (should canvas or maps team own the maps embeddable expression function?).

Going with a saved object embedded reference model, I suspect, would only get us so far, and introduce a lot of complexity (e.g. a new saved object type for every registry implementation that might end up in a saved object).

stacey-gammon commented 4 years ago

Posting summary of today's meeting, from my perspective.

This boils down to two main problems:

  1. Migrating state "on the fly" - at runtime, in memory - in a formalized fashion (we don't want to go back to pre-saved object migration state where migrations were scattered all over the place, ad hoc).
    1. Migrating persisted state of "nested saved objects". These are objects, that are saved, but not necessarily stored in a separate document. There are ways we could solve this using nested references but it's a difficult problem because it means the owner of a saved object must have knowledge of all the contents it is storing. I don't believe we can make this assumption because of the pluggability of our system. For example:
      • An embeddable, that is part of a dashboard, might serialize and store state from an action instance. The dashboard knows it is storing embeddable state, but it doesn't know that internally, that embeddable state has yet state from another object, maintained/owned by a third developer.
      • Lens knows it's storing an expression string, and knows each expression string is made up of an extensible expression function. What if I wrote an expression function applyFilter that had argument data that was used to instantiate another pluggable object, like filterType='geoFilter' filterConfig="lat:123;long:432", and internally the function did filterRegistry.getFactory(args.filterType).create(args.filterConfig). This is now another multi-level nested state that the owner of Lens object does not know about because it doesn't know about specific implementations of the expression functions it houses.

We decided that 1. is more important to solve in the short term, while 2. can wait, potentially indefinitely.

We talked about how it's important to formalize something so that every developer adding a specific implementation to a registry item that might be serialized and persisted knows they should handle legacy state migrations. Something like a PersistedRegistryFactoryProvider was thrown out there.

We also mentioned how important it is to keep type history around for legacy types, as long as they are supported, or deprecated. For this reason, I think it might make sense for this system to be something like:

  PersistedRegistryFactory {
    migrate(args: { state: object; id: string }): { newState: object; newId: string });
    isDeprecated(id: string): boolean;

We could use the id as the versioning information, which I think would make typescript easier. Something like:

  PersistedStateMappings {
    [id: string]: Shape;
  }

^^ That way every time you have an id you have the shape that it maps to. I suspect this might be easier to incorporate into our registry system as well since every registry already uses an id and state - no need to introduce a third version variable. If we went with this method, you would never change the public contract of registry item, it would be considered immutable. If you needed to make a change, you would deprecate the one, and add a new one, using migrations to switch from the old one to the new one, on the fly.

If that didn't work out well though, we could probably come up with a system that was also closer to:

  PersistedRegistryFactory {
    migrate(args: { version: string; state: object; id: string }): { newState: object; newVersion: string });
    isDeprecated(id: string): boolean;

You'd just then need to store the shapes per id, per version.

Regardless, we agreed it's important to keep type information around.

lukeelmers commented 4 years ago

Overall this aligns with my understanding of the discussion as well.

What if I wrote an expression function applyFilter that had argument data that was used to instantiate another pluggable object, like filterType='geoFilter' filterConfig="lat:123;long:432", and internally the function did filterRegistry.getFactory(args.filterType).create(args.filterConfig).

I don't want to tangent too much here, but technically for expressions this problem could still be solved with on-the-fly migrations using the interpreter's built-in types registry, which has the concept of type casting. You write a custom expression function, you say the types that it accepts for each arg, and each of the types can have built-in logic to handle casting to/from other types for you. Currently there's a plan to add a type converter registry so that built-in types can be extended by others, which could then be used for migrations such as these.

Still doesn't solve the problem of an expression that's persisted in a SO with legacy data; I'm only pointing out that we can probably still solve this particular use case at runtime as a first step.

stacey-gammon commented 4 years ago

Not to further derail, but I'm pretty sure the current cast types system in expressions could not handle this.

For one, the type is just a string, and the shape depends on the value of filterType. It's extensible so the author of the applyFilter fn doesn't know the type/shape of the filterConfig based on the filterType.

Even if the author could type something like:

args: {
   filterConfig:  [GeoFilterConfig, TermsFilterConfig, RangeFilterConfig]
}

It can't list every option because it's extensible. The author of a custom filter may know nothing about expressions, so it might not know it should add a type casting for it.

Though I do think something like the generic type casting system canvas wrote is what we are after. I just don't think the current version is good enough.

stacey-gammon commented 4 years ago

Walked @timroes through the summary of the meeting this am and want to clarify a few confusing points here with some pseudocode that I think such a system could work like:

We could provide a helper function for these migrations, which would recursively loop through them all to ensure your instance was of the latest version:

const getPersistedRegistryItem(id: PersistedRegistryId, state): PersistedRegistryStateMap[id] {
  const factory = persistableRegistry.get(id);
  if (factory.isDeprecated()) {
    const { state, id } = factory.migrate(state);
    return getPersistedRegistryItem(id, state);
  } else {
    return factory.create(state);
  }
}

Typescript types could do something similar to what search strategies does and what core does, with mappings of ID to shape:


interface PersistedRegistryStateMap {
  [LENS_EMBEDDABLE_V1]:  SavedVisObjectLegacy1; 
  [LENS_EMBEDDABLE_V2]:  SavedVisObjectLegacy2; 
  [LENS_EMBEDDABLE_V3]:  SavedVisObject; 
}

The latest shape can always stay the same, when you write a migration, you rename the old state. This will avoid us having to rename types everywhere and make it easier to read. Plugins can extend this map:

declare module '../../../src/plugins/persistableRegistry/public' {
  export interface PersistedRegistryStateMap {
    [MY_REGISTRY_ID]: MyRegistryShape;
  }
}
lukeelmers commented 4 years ago

Not to further derail, but I'm pretty sure the current cast types system in expressions could not handle this.

Finally got around to putting some code together to demonstrate what I was talking about -- I have a feeling I am not understanding your use case correctly, but hopefully this helps explain my thought process.

if PersistableFactory.create(state) is called and isDeprecated() is true, it will throw an error.

The idea for how we handle this in TS all makes sense to me, but I think I'm not 100% following what create(state) is doing... I assume the intent is that this is returning an instance of whatever the "thing" is that requires the app state? A few more concrete examples might help me grok this.

stacey-gammon commented 4 years ago

Okay, so I think in your example code, it would work. but it would be built on top of a base migration system. Like you pointed out in your POC - it requires the type to not be string, so migrating anything that already is string that should have been something more defined will be difficult. And I don't think that covers merging two arguments into one.

I have code in https://github.com/elastic/kibana/pull/57496 which shows how I would build a generic system, though I just built it into the URL service. I will try to write a follow up PR for how I would make it work generically.

rudolf commented 3 years ago

The last bit of work to enable this is described in #84907 and likely to be merged soon so I think we can close this discussion.