elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Search service] Search Collectors and Background Search Collections #53335

Closed stacey-gammon closed 4 years ago

stacey-gammon commented 4 years ago

Background

We want to improve our support for long-running queries and for users who don't want to sit around waiting for a dashboard to finish loading when it could take hours or days.

Goals

Support:

Additional features, not as high priority:

Proposal

Implementation details

There will be two varieties of search collectors: Basic and Advanced. The Advanced search collector will expose some methods like:

class AdvancedSearchCollector {
  /**
   * The cache of searches will need to be cleared at certain times. For instance, once
   * a collection has been sent to the background, we need to ensure those searches aren't
   * cancelled by a user navigating away, so we should clear the cache to create new search
   * ids. We will also want to clear the cache when we want to send out new searches and not
   * re-use any existing ones, for instance when the user is viewing the "last 15 minutes"
   * time range and hits "refresh". We want new searches to be sent out to get fresh data.
   */
  clearCache() {...}

  /**
   * Or abort. We need to expose a way to cancel all of the searches in this collection.
   */
  destroy() {...}

  /**
   * The search function that can be passed around. It wraps the original search function
   * from the search service and collects any searches going through it. If a request
   * exactly matches a request made earlier, it re-uses that result rather than making
   * a new query.
   */
  search: ISearchGeneric = (...) => {...}
}
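
For illustration, a consumer (e.g. a dashboard) might use it roughly like this. This is only a sketch; the constructor and the ES-style request shape are assumptions:

// A sketch only: the constructor and the request shape are assumptions.
const collector = new AdvancedSearchCollector();

// Panels route their requests through the collector so they can be de-duplicated,
// tracked for progress, and cancelled together.
const search$ = collector.search({ params: { index: 'logs-*', body: { query: { match_all: {} } } } });

// User is on a relative time range and hits "refresh": drop cached results so fresh searches go out.
collector.clearCache();

// User navigates away without sending to background: cancel everything in the collection.
collector.destroy();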

The search collector can also handle showing/closing a toast that exposes cancellation, overall progress, and the ability to send to background, though we may want to decouple that functionality and put it elsewhere (e.g. as part of the dashboard embeddable).

Regardless, wherever we expose the "send to background" functionality, the user of the service will have to specify the URL and the state that should be saved with the search collection as part of the BackgroundSearchCollection saved object. For the embeddable-specific implementation, dashboards will not use the dashboard app URL, but a special URL that will show all background searches coming from embeddables.

Dashboard app URL vs generic Embeddable viewer URL

While the generic viewer URL might be a simpler implementation in the short run, we've decided to try to make this work with application URLs, because we believe that makes for a better, more consistent user experience.

In order to support the dashboard app URL:

Cons of doing embeddable viewer URL:

Pros of Embeddable viewer URL:

Decision: We are going to move forward with app-specific URLs and skip the embeddable viewer URL, provided we can manage it without too much extra work, as it was decided this makes for a better and more consistent UX. We can always revisit the idea of adding an embeddable viewer URL if we run into too many issues.

Nested search collectors

If we expose the ability to create a child collector from a parent, we can have all of this functionality at the dashboard level as well as at the per-panel level. This means we can use the same mechanism to cancel, show progress, and send to background for each panel in a dashboard.
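
A rough sketch of what a parent/child collector API could look like; the createChild method and the wiring here are assumptions, not an existing API:

class NestedSearchCollector {
  private children: NestedSearchCollector[] = [];

  // A dashboard-level collector would hand each panel its own child collector, so
  // progress and cancellation work per panel while still rolling up to the dashboard.
  createChild(): NestedSearchCollector {
    const child = new NestedSearchCollector();
    this.children.push(child);
    return child;
  }

  destroy() {
    // Cancelling the parent cancels every child's searches too...
    this.children.forEach(child => child.destroy());
    // ...and then this collector's own searches would be aborted here.
  }
}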

Exposing to the interpreter

We need to expose this search function to expression functions anyway, so they can all use our new search API services. The expression function would look something like:

     fn: (context, args, handlers) => {
       return handlers.search({ sql: args.query }, { signal: handlers.signal }, SQL_SEARCH_STRATEGY);
     },

Search interceptor logic

Would be something like:

  search(request, options, type) {
    const hash = getHashIdForRequest(request);
    // Check to see if a request that matches this exactly has already been sent out.
    if (this.searches[hash]) {
      return this.searches[hash].search$;
    } else {
      // If not, create a new request and save it. This needs to call the underlying
      // search service's search function (the one this collector wraps), not this
      // method itself, otherwise it would recurse.
      const search$ = this.originalSearch(request, options, type);
      this.searches[hash] = {
        type: type || DEFAULT_SEARCH_STRATEGY,
        search$,
      };
      this.subscriptions.push(
        search$.subscribe(response => {
          if (response.id && response.progress < 100) {
            this.searches[hash].requestId = response.id;
          }
          this.calculateProgress();
        })
      );
      return search$;
    }
  }
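
For reference, a minimal sketch of the two helpers the interceptor relies on; the hashing and progress logic here are assumptions about how they might work:

// A stable string key for a request, so identical requests map to the same cache entry.
// A real implementation would likely hash the serialized form (e.g. sha256).
function getHashIdForRequest(request: object): string {
  return JSON.stringify(request);
}

// Average each tracked search's progress (0-100) into an overall percentage
// that the toast could display.
function calculateProgress(searches: Array<{ progress?: number }>): number {
  if (searches.length === 0) return 100;
  const total = searches.reduce((sum, s) => sum + (s.progress ?? 0), 0);
  return total / searches.length;
}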

Background search collection saved objects

Each background search collection object will have:

interface BackgroundSearchCollection {
  searches: Array<{
    requestHash: string;
    searchId: string;
    searchStrategyType: string;
  }>;

  url: string;

  // Storing the state as an object instead of in the URL will make migrations easier,
  // but I'm not sure of the logic for restoring this state to the URL. Can the view links
  // for each background search collection be generated from this plus the state-free URL
  // using the new state management utilities?
  state: Object;

  // I think we will need an app id so each app can handle migrating its own URLs that
  // are stored in here.
  appId: string;
}
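
For illustration, a concrete instance for a dashboard sent to the background might look like this; every value below is made up:

// Purely illustrative values; the ids, hashes, and URL are hypothetical.
const example: BackgroundSearchCollection = {
  searches: [
    { requestHash: 'a1b2c3', searchId: 'bg-search-id-1', searchStrategyType: 'es' },
    { requestHash: 'd4e5f6', searchId: 'bg-search-id-2', searchStrategyType: 'es' },
  ],
  url: '/app/kibana#/dashboard/my-dashboard-id?bgSearchId=xyz',
  state: {
    timeRange: { from: '2020-01-01T00:00:00Z', to: '2020-01-10T00:00:00Z' },
    filters: [],
  },
  appId: 'dashboard',
};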

Graceful fallback

If an embeddable doesn't use the search service, its searches will just be executed normally when the background search URL is opened. This is actually somewhat problematic, because the searches will be out of sync if we have relative time ranges: a cached "last 1 year" search run four days ago actually covers "1 year and 4 days ago to 4 days ago", while embeddables that aren't using our search collectors will send live queries.

I suspect this is something we can hack around, or just live with. Also, if we switch to absolute time ranges, which will probably be better and more accurate for the user anyway, this shouldn't really be a problem: the panels will all be showing the same information.

Corner cases and considerations

When to clear the search cache?

Consider the scenario:

Similar:

Easiest technical solution: the dashboard embeddable can detect any update to its input and clear the cache. However, this may prove slightly difficult because of async updates to dashboard input. For example, when the app first loads there are no filters; filters are later added asynchronously by the filter manager. If this is all part of the page-loading process, we want the searches to stay the background versions, but if the user interacts with the page and changes something, we want new searches to go out. We can probably hack around this in the dashboard app, but every solution that wants to use this system needs to consider it.
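
A minimal sketch of the "clear the cache on input change" idea; the input$ observable and the collector shape are assumptions about the embeddable and collector APIs:

import { Observable } from 'rxjs';

function wireCacheClearing(input$: Observable<object>, collector: { clearCache: () => void }) {
  let lastInputHash: string | undefined;
  return input$.subscribe(input => {
    const hash = JSON.stringify(input);
    if (lastInputHash !== undefined && lastInputHash !== hash) {
      // Per the caveat above, async updates during initial page load (e.g. the filter
      // manager adding filters) would also land here, which is the hard part.
      collector.clearCache();
    }
    lastInputHash = hash;
  });
}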

What happens after you "send to background"?

Ideally:

[Screenshot: Screen Shot 2020-01-10 at 12 12 12 PM]

Is your current view using the search ids that are saved in the background search collection saved object you just created, or is it using different ids?

If the current view is editable and using the same search ids, we have to be careful not to accidentally abort them, for instance if the user deletes a panel. What happens if a user adds a new panel and then wants to "send to background" again? You don't want two search collections that use the same search ids.

What if you were viewing a relative time range? Will it switch you to absolute?

What happens if you hard refresh the page at this point?

Two panels that share the same exact request, one gets deleted

For instance, if we have two panels that share the same request and one panel gets deleted from the dashboard, this will cancel the other panel's search. We may want to keep a reference count on each search so we don't cancel it as long as it has more than one user.

Easiest technical solution: accept this as a known issue. I don't expect it would be a huge issue in practice, and refreshing the page causes new searches to go out, so there is a workaround.
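
If we did decide to avoid cancelling shared searches, a per-entry reference count could work. A sketch, with all names being assumptions:

interface CachedSearch {
  search$: unknown;
  refCount: number;
}

class RefCountedCache {
  private searches: Record<string, CachedSearch> = {};

  // Re-use an existing search if one matches the hash, otherwise create and track it.
  acquire(hash: string, create: () => unknown): unknown {
    const existing = this.searches[hash];
    if (existing) {
      existing.refCount++;
      return existing.search$;
    }
    const search$ = create();
    this.searches[hash] = { search$, refCount: 1 };
    return search$;
  }

  // Only abort the underlying search once the last panel using it releases it.
  release(hash: string, abort: () => void) {
    const entry = this.searches[hash];
    if (!entry) return;
    if (--entry.refCount === 0) {
      abort();
      delete this.searches[hash];
    }
  }
}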

Serialized searches coming through the collector

Any expression that contains two search functions will hit this (though there aren't many). For instance, a user clicks "send to background" while the first search is running. The second search will not have a saved background id, and it will be re-executed every time the background search collection is loaded back up.

This happens today in:

Searches that have already completed

We also need to test the situation where some searches have already completed.

Expose filters/time range from the dashboard

Since the filter bar and time picker are not yet part of the dashboard embeddable, we'll need to make sure those are shown to the user in the embeddable viewer somehow, or bring that bar inside.

User leaves page before a search returns the first response with the id

We can't wait to show the "send to background" button until every search has an id, because searches can come in at any time and we don't want the button flickering. How do we handle this situation? It's similar to the situation with an expression containing serialized searches.

Relative vs absolute time ranges

We should convert relative time ranges to absolute ones, or it will be confusing for the user to see something like "last 5 years" when the data isn't actually relative to now; "now" is some time in the past.

This is something solution teams will have to handle manually.
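
A sketch of the conversion, assuming the @elastic/datemath parse API that Kibana already ships:

import dateMath from '@elastic/datemath';

// Convert a relative time range like { from: 'now-5y', to: 'now' } into absolute
// ISO timestamps at the moment the collection is sent to the background.
function toAbsoluteTimeRange(range: { from: string; to: string }) {
  const from = dateMath.parse(range.from);
  const to = dateMath.parse(range.to, { roundUp: true });
  if (!from || !to) {
    throw new Error(`Could not parse time range: ${range.from} - ${range.to}`);
  }
  return { from: from.toISOString(), to: to.toISOString() };
}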

Expiration of searches

The current plan is that searches will expire some amount of time after they complete. The problem with how that interacts with this search collector is that we could theoretically have one search in the collector that expires before another search even completes (unlikely, but theoretically possible).

Another thing we want to answer is how to determine if a search is expired and communicate that to the user. Do we auto delete these, or make the user delete the objects when they are expired?

Should an embeddable that is backed by a background search collection be editable?

Ideally yes.

Embeddables by reference vs by value

Situation is:

  1. User creates a new dashboard. Adds two panels.
  2. User saves the dashboard; the URL is now just the dashboard id.
  3. User clicks "run in background". URL stored with background search collection is just the dashboard id plus bgSearchId=xyz.
  4. User edits a visualization in that dashboard.
  5. What happens if the user goes back to "view" the background search?
    • The unchanged visualization will match a cached search. The other panel will have a cache miss and a new search will go out.

Similar situation, maybe more awkward:

  1. User creates a new dashboard. Adds two panels.
  2. User saves the dashboard; the URL is now just the dashboard id.
  3. User clicks "run in background". URL stored with background search collection is just the dashboard id plus bgSearchId=xyz.
  4. User edits the dashboard and deletes a panel.
  5. What happens if the user goes back to "view" the background search?
    • The user will see the newer saved dashboard that only has one panel, not two. It no longer looks like the dashboard that was actually "sent to background".

Note that the second situation will not happen with the embeddable viewer version, because the embeddable input is saved as a snapshot with the URL state. BUT the first situation could still be encountered, because child embeddable input would continue to be a reference to the saved visualization, not a visualization "by value".

We might be able to partially solve this if we put the dashboard input back in the URL so it overrides any changes made on the saved dashboard.

As for the first scenario, there is not much we can do without a major overhaul (although some of this work is already underway here: https://github.com/elastic/kibana/issues/51318).

URL state Migrations

We need to be careful that teams that register their own URLs understand they will have to migrate the state in the saved objects if any of the URLs plus state will no longer work. It's unclear whether saved object migrations even support multiple plugins adding migrations for the same saved object type; one workaround, while not ideal, could be to have teams write the migration in the background search plugin.

Easiest technical solution: move forward knowing this is a risk for bugs, and communicate to teams that they should add backwards-compatibility tests for URLs.

Best technical solution: probably something like https://github.com/elastic/kibana/issues/25247 - store URLs as a common saved object type and link to that. Then there is only one saved object type to migrate, and any team that uses the URL service knows they will have to migrate their saved objects.
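
Whichever route we take, the migration itself would just be a versioned transform over the stored document. A hypothetical sketch (the version number and URL rewrite are made up):

const backgroundSearchCollectionMigrations = {
  '7.8.0': (doc: { attributes: { url: string; appId: string } }) => {
    // e.g. an app changed its URL structure, so stored URLs must be rewritten.
    if (doc.attributes.appId === 'dashboard') {
      doc.attributes.url = doc.attributes.url.replace('/old/path', '/new/path');
    }
    return doc;
  },
};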

Roadmap

elasticmachine commented 4 years ago

Pinging @elastic/kibana-app-arch (Team:AppArch)

lukasolson commented 4 years ago

@ppisljar and I discussed another possible approach that relies more on the Elasticsearch cache and (possibly in a second phase) building services to execute the interpreter in a background job.

When a user navigates to a dashboard, we would execute everything as normal. If a user then selects to run in the background, we would gather the state for the entire dashboard (including each embeddable). I understand this is currently something that is not available since embeddables are currently passed their saved object ID instead of the actual inputs, so this is something we'd have to build. It would probably look something like this: Container embeddable gathers its own state, then calls getInput on each embeddable recursively. The input would have to be serializable, because we would store it, in addition to a URL, in a saved object. (Search IDs would not be stored in the saved object.)

When re-visiting the URL, the dashboard would pass down each embeddable's input, which would need to contain all of the state the embeddable needs to execute and render, instead of the saved object ID (or async search IDs). Each embeddable would then execute and render as usual; however, the requests to Elasticsearch would execute much faster (because of the Elasticsearch cache).
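
A sketch of gathering a fully serializable snapshot of a container and its children; the getInput, getChildIds, and getChild method names are assumptions about the embeddable API:

interface EmbeddableLike {
  getInput: () => object;
  getChildIds?: () => string[];
  getChild?: (id: string) => EmbeddableLike;
}

// Recursively collect every embeddable's input "by value" so the whole snapshot
// can be stored in the saved object alongside the URL.
function gatherInputSnapshot(embeddable: EmbeddableLike): object {
  const children: Record<string, object> = {};
  for (const id of embeddable.getChildIds?.() ?? []) {
    children[id] = gatherInputSnapshot(embeddable.getChild!(id));
  }
  return { input: embeddable.getInput(), children };
}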

We would still need to pass some sort of cache: true | false parameter in the interpreter execution, because if a user clicks "refresh", we assume they would actually want fresh data from Elasticsearch. The interpreter function could check this flag and forward it on as the request_cache parameter to gather fresh data.
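
In an interpreter function, that could look roughly like the fragment below. The forceRefresh flag on the handlers and the buildEsQuery helper are hypothetical; request_cache is the standard Elasticsearch search option being forwarded:

  fn: (context, args, handlers) => {
    return handlers.search(
      {
        params: {
          index: args.index,
          body: buildEsQuery(args), // hypothetical query builder
          request_cache: !handlers.forceRefresh, // hypothetical flag set when the user clicks "refresh"
        },
      },
      { signal: handlers.signal }
    );
  },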

This may work as a first-phase approach, but you still run into the edge case of expressions that contain multiple requests to Elasticsearch, with the latter based on the previous one. If we want to support these types of functions, we would need to build a service that truly executes the entire expression in the background. The output of such a service would not be stored anywhere; it would just need to ensure that execution continues, so that expressions with multiple requests to ES don't stop after the first request.

One downside to this approach is that it relies on Elasticsearch caching, which (in the current plan) would be cleared every 4 days. We have no way of knowing when the cached results "expire" (or are cleared). Also, what if one request executes quickly and another request takes 4 days to complete? This is an edge case we need to consider.

What are your thoughts about the feasibility of this approach @stacey-gammon?

stacey-gammon commented 4 years ago

If I understand this correctly:

The only thing Kibana has to do is decide when to send request_cache and when to not actively abort a search.

It sounds a lot simpler from the Kibana side, but it also seems like we would have a lot less control. I have a feeling this isn't going to work out well.

For example, how would we be able to calculate progress and offer the ability to cancel searches in this view:

[Screenshot: Screen Shot 2020-01-13 at 5 01 10 PM]

Storing all state by value instead of by reference

Agree this is the ideal long term solution. I suspect this will be a big feature in and of itself but @ppisljar thinks it might end up being easier to accomplish. If so, awesome.

Search collectors

We still need search collectors in order to get progress and cancellation.

So I think the only question here is: do we add logic in Kibana to hash the request and send the search id to ES instead of the full request?

Pros of not doing this mapping:

Possible shorter term wins

Synced with @ppisljar today to go over this and the above comment. We think the best plan forward is to start with the search collectors, which we'll need anyway. Once we have those incorporated, it would be worthwhile to see if we can get an easy win by exposing:

Then we don't need any new saved object types and no management screen. How do we get fast searches? Click "run past timeout" and check "Use cached searches". If cached searches are available in ES, they will be used automatically. This is way easier to implement than the background saved object collections. Nice idea @ppisljar!

ppisljar commented 4 years ago

The search service is not passed to the interpreter function; rather, the interpreter function is given a reference to the search service at the time of registration to the registry. This doesn't change much: instead of the search function being one of the handlers, we would pass in an instance of searchCollector (similar to what we do with inspectorAdapters).

stacey-gammon commented 4 years ago

The search service is not passed to the interpreter function; rather, the interpreter function is given a reference to the search service at the time of registration to the registry.

If we did that, though, we couldn't scope it, and it would be a global singleton search collector. I think we need to allow the embeddables to pass it in via extraHandlers. We could perhaps do both, so that if the embeddable opted not to pass in a specific instance of a search collector, functions could use the global one, though it might then be confusing which one functions should use. Maybe better would be for the interpreter to decide which search to pass down as part of handlers: a global one, or one passed in via extraHandlers.
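
A sketch of the extraHandlers idea; the execute signature and option name are assumptions about the interpreter API:

// Each dashboard (or panel) owns its own collector instance...
const collector = new AdvancedSearchCollector();

// ...and hands its search function to the interpreter at execution time, so
// expression functions receive a scoped search instead of a global singleton.
expressions.execute(expressionString, context, {
  extraHandlers: { search: collector.search },
});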

ppisljar commented 4 years ago

@stacey-gammon that's not the case (the part about not being able to scope it). We would be passing the instance of the search collector to the interpreter, which means we can pass a different instance in each time we execute an expression (if required), so the app is completely in control.

So at the time of registration we give esaggs a reference to the search service, but at the time of execution we pass an instance of searchCollector in. (How we create the search collector can stay out of this discussion, as it's irrelevant here.)

I would avoid using global search collectors for the reasons you described to me earlier. If somebody really wanted to do it, they would need to make sure they pass the same instance of the search collector to everything making the requests.

stacey-gammon commented 4 years ago

Okay, gotcha, I think we are actually on the same exact page then, I just read your comment incorrectly!

lukasolson commented 4 years ago

Closing in favor of https://github.com/elastic/kibana/issues/61738.