elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.77k stars 8.17k forks source link

Reduce performance impact of deleting legacy url aliases #141136

Open TinaHeiligers opened 2 years ago

TinaHeiligers commented 2 years ago

With the addition of a bulk delete API, we run the risk of performance degradation when objects being deleted have many legacy URL aliases.

The API that handles legacy URL alias cleanup after a write operation uses an updateByQuery with a simple script to delete one alias at a time.

For the API to handle many objects (a pseudo-bulk operation), the script would need to be adapted to handle different namespaces and delete behavior for each saved object that no longer exists.

At the moment, there are two bulk operations that include cleaning up legacy URL aliases:

To avoid potentially serious performance-related issues, the saved objects repository needs a performant API to delete legacy URL aliases in bulk.

_Originally posted by @TinaHeiligers in https://github.com/elastic/kibana/pull/139680#discussion_r974712311_

elasticmachine commented 2 years ago

Pinging @elastic/kibana-core (Team:Core)

elasticmachine commented 2 years ago

Pinging @elastic/kibana-security (Team:Security)

TinaHeiligers commented 2 years ago

We should also consider making MAX_CONCURRENT_ALIAS_DELETIONS configurable at either the global level or per API that implements the limit.

rudolf commented 1 year ago

Because the majority of saved object types are of namespaceType: 'multiple-isolated' this performance bug could potentially impact most saved object types that existed before 8.0.

Some use cases that might attempt to do large bulk deletes where legacy url aliases might cause performance problems:

LeeDr commented 1 year ago

I had an idea I wanted to suggest for these saved objects. They don't contain very many fields (see example below). What if instead of creating a saved object for every one of them we had a single saved object containing a list of all the sourceId, targetId, etc from each.

This would reduce the number of SO docs Kibana has to read and write during migration.

There's obviously some risk to this idea. It's putting more eggs in one basket. If this one doc were accidentally deleted or corrupted it would be bad. Might even be necessary to keep a few documents as old versions as updates are made. It could be one SO which tells Kibana the ID of the list. When a new revision of the list is written successfully that pointer doc gets updated with the new ID.

But how large might this one doc get? And how much memory in Kibana would it potentially consume while cached? Maybe the new file service could chunk it if it was too large.

Example doc;

{
  "_index": ".kibana_8.6.0_001",
  "_id": "legacy-url-alias:automation:index-pattern:ff959d40-b880-11e8-a6d9-e546fe2bba5f",
  "_version": 2,
  "_score": 0,
  "_source": {
    "legacy-url-alias": {
      "sourceId": "ff959d40-b880-11e8-a6d9-e546fe2bba5f",
      "targetNamespace": "automation",
      "targetType": "index-pattern",
      "targetId": "261c3a4b-d344-572c-bd88-7f484776534d",
      "purpose": "savedObjectConversion"
    },
    "type": "legacy-url-alias",
    "references": [],
    "migrationVersion": {
      "legacy-url-alias": "8.2.0"
    },
    "coreMigrationVersion": "8.6.0"
  },
  "fields": {
    "migrationVersion.legacy-url-alias.keyword": [
      "8.2.0"
    ],
    "legacy-url-alias.purpose": [
      "savedObjectConversion"
    ],
    "legacy-url-alias.targetId": [
      "261c3a4b-d344-572c-bd88-7f484776534d"
    ],
    "coreMigrationVersion": [
      "8.6.0"
    ],
    "type": [
      "legacy-url-alias"
    ],
    "migrationVersion.legacy-url-alias": [
      "8.2.0"
    ],
    "legacy-url-alias.targetNamespace": [
      "automation"
    ],
    "legacy-url-alias.targetType": [
      "index-pattern"
    ],
    "legacy-url-alias.sourceId": [
      "ff959d40-b880-11e8-a6d9-e546fe2bba5f"
    ]
  }
}
TinaHeiligers commented 1 year ago

@elastic/kibana-security Would the suggestion in https://github.com/elastic/kibana/issues/141136#issuecomment-1283129052 be feasible?

TinaHeiligers commented 1 year ago

@rudolf Considering we've seen real instances of many legacy_url_aliases making the problem of too many saved objects even worse and causing upgrade migration timeouts, we may need to reconsider the priority on this issue. Had bulk_delete been used to clean up the issues we saw with a lot of 'stale' case-user-actions and case-comments saved objects, we probably would've seen a serious performance impact.

legrego commented 1 year ago

Some of the history escapes me at the moment, but one of the reasons we create an alias-document-per-object is to prevent multiple aliases from referencing the same object. Document IDs are the only uniqueness guarantee that we have in ES today, and so leveraging this ID for uniqueness prevents another class of unresolvable conflict errors. There's a note in the original implementation which says:

Aliases will be stored as separate saved objects. Benefits of this approach:

  • ...
  • Alias raw ID ensures that there will never be multiple aliases with the same target namespace+type+ID
  • ...

Based on what I've been able to reconstruct, I don't think it would be infeasible, but a change like this would require careful planning, design, and consideration before moving forward with implementation.

rudolf commented 1 year ago

Agree with @legrego that this would be a risky change to make.

If we put all the aliases in one document we could enforce the uniqueness constraint in Kibana itself. But I would be concerned about the potential size of this document. We could probably easily test this by seeing how big the JSON of a object with 2m legacy url aliases inside an array would get.

Another option could be disable legacy url aliases for some SO types. I don't know the details, but I could imagine that we maybe have URL's to a case but not to a case-user-action. This would not solve the problem completely but it can reduce the impact somewhat.

jeramysoucy commented 1 year ago

Yes to @legrego and @rudolf - Definitely something to approach carefully and spend some time to test design feasibility.

One other idea to mitigate having one large alias doc would be to create one alias doc per type and/or per space. We could also create a secondary lookup system if this still proves to be too large in some deployments (have a max doc size, multiple alias docs, and a lookup doc), but I think that may be more complicated than justified by the benefits. Not sure what our most dense deployments look like IRL.

cnasikas commented 1 year ago

Another option could be disable legacy url aliases for some SO types. I don't know the details, but I could imagine that we maybe have URL's to a case but not to a case-user-action. This would not solve the problem completely but it can reduce the impact somewhat.

Unfortunately, we have URLs that point to user actions and comments. Each comment and user action has a "Copy reference link" button (see screenshot) with which the user can get a URL pointing to this particular user action or comment (the UI will scroll to this position). The format of the URL is <kibana>/cases/<case_id>/<case_comment_id> or <user_action_id>

Screenshot 2022-11-01 at 11 01 39 AM