elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Search service] Search Collectors and Background Search Collections #53335

Closed stacey-gammon closed 4 years ago

stacey-gammon commented 4 years ago

Background

We want to improve our support for long-running queries and for users who don't want to sit around waiting for a dashboard to finish loading when it could take hours or days.

Goals

Support:

Additional features, not as high priority:

Proposal

Implementation details

There will be two varieties of search collectors: Basic and Advanced. The Advanced search collector will expose some methods like:

class AdvancedSearchCollector {
  /**
   * The cache of searches will need to be cleared at certain times. For instance, once
   * a collection has been sent to the background, we need to ensure those searches aren't
   * cancelled by a user navigating away, so we should clear the cache to create new search
   * ids. We will also want to clear the cache when we want to send out new searches and not
   * re-use any existing ones, for instance when the user is viewing the "last 15 minutes"
   * time range and hits "refresh". We want new searches to be sent out to get fresh data.
   */
  clearCache() {...}

  /**
   * Or abort. We need to expose a way to cancel all of the searches in this collection.
   */
  destroy() {...}

  /**
   * The search function that can be passed around. It wraps the original search function
   * from the search service and collects any searches going through it. If a request
   * exactly matches a request made earlier, it re-uses that result rather than making
   * a new query.
   */
  search: ISearchGeneric = (...) => {...}
}
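
For illustration, a consumer (e.g. a dashboard) might use it roughly like this. This is only a sketch; the constructor and the ES-style request shape are assumptions:

// A sketch only: the constructor and the request shape are assumptions.
const collector = new AdvancedSearchCollector();

// Panels route their requests through the collector so they can be de-duplicated,
// tracked for progress, and cancelled together.
const search$ = collector.search({ params: { index: 'logs-*', body: { query: { match_all: {} } } } });

// User is on a relative time range and hits "refresh": drop cached results so fresh searches go out.
collector.clearCache();

// User navigates away without sending to background: cancel everything in the collection.
collector.destroy();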

The search collector can also handle showing/closing a toast that exposes cancellation, overall progress, and the ability to send to background, though we may want to decouple that functionality and put it elsewhere (e.g. as part of the dashboard embeddable).

Regardless, wherever we expose the "send to background" functionality, the user of the service will have to specify the URL and the state that should be saved with the search collection as part of the BackgroundSearchCollection saved object. For the embeddable-specific implementation, dashboards will not use the dashboard app URL, but a special URL that will show all background searches coming from embeddables.

Dashboard app URL vs generic Embeddable viewer URL

While the generic viewer URL might be a simpler implementation in the short run, we've decided to try to make this work with application URLs, because we believe that makes for a better, more consistent user experience.

In order to support the dashboard app URL:

Cons of doing embeddable viewer URL:

Pros of Embeddable viewer URL:

Decision: We are going to move forward with app-specific URLs and skip the embeddable viewer URL, provided we can manage it without too much extra work, as it was decided this makes for a better and more consistent UX. We can always revisit the idea of adding an embeddable viewer URL if we run into too many issues.

Nested search collectors

If we expose the ability to create a child collector from a parent, we can have all of this functionality at the dashboard level as well as at the per-panel level. This means we can use the same mechanism to cancel, show progress, and send to background for each panel in a dashboard.
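
A rough sketch of what a parent/child collector API could look like; the createChild method and the wiring here are assumptions, not an existing API:

class NestedSearchCollector {
  private children: NestedSearchCollector[] = [];

  // A dashboard-level collector would hand each panel its own child collector, so
  // progress and cancellation work per panel while still rolling up to the dashboard.
  createChild(): NestedSearchCollector {
    const child = new NestedSearchCollector();
    this.children.push(child);
    return child;
  }

  destroy() {
    // Cancelling the parent cancels every child's searches too...
    this.children.forEach(child => child.destroy());
    // ...and then this collector's own searches would be aborted here.
  }
}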

Exposing to the interpreter

We need to expose this search function to expression functions anyway, so they can all use our new search API services. The expression function would look something like:

     fn: (context, args, handlers) => {
       return handlers.search({ sql: args.query }, { signal: handlers.signal }, SQL_SEARCH_STRATEGY);
     },

Search interceptor logic

Would be something like:

  search(request, options, type) {
    const hash = getHashIdForRequest(request);
    // Check to see if a request that matches this exactly has already been sent out.
    if (this.searches[hash]) {
      return this.searches[hash].search$;
    } else {
      // If not, create a new request and save it. This needs to call the underlying
      // search service's search function (the one this collector wraps), not this
      // method itself, otherwise it would recurse.
      const search$ = this.originalSearch(request, options, type);
      this.searches[hash] = {
        type: type || DEFAULT_SEARCH_STRATEGY,
        search$,
      };
      this.subscriptions.push(
        search$.subscribe(response => {
          if (response.id && response.progress < 100) {
            this.searches[hash].requestId = response.id;
          }
          this.calculateProgress();
        })
      );
      return search$;
    }
  }
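
For reference, a minimal sketch of the two helpers the interceptor relies on; the hashing and progress logic here are assumptions about how they might work:

// A stable string key for a request, so identical requests map to the same cache entry.
// A real implementation would likely hash the serialized form (e.g. sha256).
function getHashIdForRequest(request: object): string {
  return JSON.stringify(request);
}

// Average each tracked search's progress (0-100) into an overall percentage
// that the toast could display.
function calculateProgress(searches: Array<{ progress?: number }>): number {
  if (searches.length === 0) return 100;
  const total = searches.reduce((sum, s) => sum + (s.progress ?? 0), 0);
  return total / searches.length;
}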

Background search collection saved objects

Each background search collection object will have:

interface BackgroundSearchCollection {
  searches: Array<{
    requestHash: string;
    searchId: string;
    searchStrategyType: string;
  }>;

  url: string;

  // Storing the state as an object instead of in the URL will make migrations easier,
  // but I'm not sure of the logic for restoring this state to the URL. Can the view links
  // for each background search collection be generated from this plus the state-free URL
  // using the new state management utilities?
  state: Object;

  // I think we will need an app id so each app can handle migrating its own URLs that
  // are stored in here.
  appId: string;
}
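
For illustration, a concrete instance for a dashboard sent to the background might look like this; every value below is made up:

// Purely illustrative values; the ids, hashes, and URL are hypothetical.
const example: BackgroundSearchCollection = {
  searches: [
    { requestHash: 'a1b2c3', searchId: 'bg-search-id-1', searchStrategyType: 'es' },
    { requestHash: 'd4e5f6', searchId: 'bg-search-id-2', searchStrategyType: 'es' },
  ],
  url: '/app/kibana#/dashboard/my-dashboard-id?bgSearchId=xyz',
  state: {
    timeRange: { from: '2020-01-01T00:00:00Z', to: '2020-01-10T00:00:00Z' },
    filters: [],
  },
  appId: 'dashboard',
};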

Graceful fallback

If an embeddable doesn't use the search service, its searches will just be executed normally when the background search URL is opened. This is actually somewhat problematic, because the searches will be out of sync if we have relative time ranges: a cached "last 1 year" search run four days ago actually covers "1 year and 4 days ago to 4 days ago", while embeddables that aren't using our search collectors will send live queries.

I suspect this is something we can hack around, or just live with. Also, if we switch to absolute time ranges, which will probably be better and more accurate for the user anyway, this shouldn't really be a problem: the panels will all be showing the same information.

Corner cases and considerations

When to clear the search cache?

Consider the scenario:

Similar:

Easiest technical solution: the dashboard embeddable can detect any update to its input and clear the cache. However, this may prove slightly difficult because of async updates to dashboard input. For example, when the app first loads there are no filters; filters are later added asynchronously by the filter manager. If this is all part of the page-loading process, we want the searches to stay the background versions, but if the user interacts with the page and changes something, we want new searches to go out. We can probably hack around this in the dashboard app, but every solution that wants to use this system needs to consider it.
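
A minimal sketch of the "clear the cache on input change" idea; the input$ observable and the collector shape are assumptions about the embeddable and collector APIs:

import { Observable } from 'rxjs';

function wireCacheClearing(input$: Observable<object>, collector: { clearCache: () => void }) {
  let lastInputHash: string | undefined;
  return input$.subscribe(input => {
    const hash = JSON.stringify(input);
    if (lastInputHash !== undefined && lastInputHash !== hash) {
      // Per the caveat above, async updates during initial page load (e.g. the filter
      // manager adding filters) would also land here, which is the hard part.
      collector.clearCache();
    }
    lastInputHash = hash;
  });
}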

What happens after you "send to background"?

Ideally:

[Screenshot: Screen Shot 2020-01-10 at 12 12 12 PM]

Is your current view using the search ids that are saved in the background search collection saved object you just created, or is it using different ids?

If the current view is editable and using the same search ids, we have to be careful not to accidentally abort them, for instance if the user deletes a panel. What happens if a user adds a new panel and then wants to "send to background" again? You don't want two search collections that use the same search ids.

What if you were viewing a relative time range? Will it switch you to absolute?

What happens if you hard refresh the page at this point?

Two panels that share the same exact request, one gets deleted

For instance, if we have two panels that share the same request and one panel gets deleted from the dashboard, this will cancel the other panel's search. We may want to keep a reference count on each search so we don't cancel it as long as it has more than one user.

Easiest technical solution: accept this as a known issue. I don't expect it would be a huge issue in practice, and refreshing the page causes new searches to go out, so there is a workaround.
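
If we did decide to avoid cancelling shared searches, a per-entry reference count could work. A sketch, with all names being assumptions:

interface CachedSearch {
  search$: unknown;
  refCount: number;
}

class RefCountedCache {
  private searches: Record<string, CachedSearch> = {};

  // Re-use an existing search if one matches the hash, otherwise create and track it.
  acquire(hash: string, create: () => unknown): unknown {
    const existing = this.searches[hash];
    if (existing) {
      existing.refCount++;
      return existing.search$;
    }
    const search$ = create();
    this.searches[hash] = { search$, refCount: 1 };
    return search$;
  }

  // Only abort the underlying search once the last panel using it releases it.
  release(hash: string, abort: () => void) {
    const entry = this.searches[hash];
    if (!entry) return;
    if (--entry.refCount === 0) {
      abort();
      delete this.searches[hash];
    }
  }
}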

Serialized searches coming through the collector

Any expression that contains two search functions will hit this (though there aren't many). For instance, a user clicks "send to background" while the first search is running. The second search will not have a saved background id, and it will be re-executed every time the background search collection is loaded back up.

This happens today in:

Searches that have already completed

We also need to test the situation where some searches have already completed.

Expose filters/time range from the dashboard

Since the filter bar and time picker are not yet part of the dashboard embeddable, we'll need to make sure those are shown to the user in the embeddable viewer somehow, or bring that bar inside.

User leaves page before a search returns the first response with the id

We can't wait to show the "send to background" button until every search has an id, because searches can come in at any time and we don't want the button flickering. How do we handle this situation? It's similar to the situation with an expression containing serialized searches.

Relative vs absolute time ranges

We should convert relative time ranges to absolute ones, or it will be confusing for the user to see something like "last 5 years" when the data isn't actually relative to now; "now" is some time in the past.

This is something solution teams will have to handle manually.
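
A sketch of the conversion, assuming the @elastic/datemath parse API that Kibana already ships:

import dateMath from '@elastic/datemath';

// Convert a relative time range like { from: 'now-5y', to: 'now' } into absolute
// ISO timestamps at the moment the collection is sent to the background.
function toAbsoluteTimeRange(range: { from: string; to: string }) {
  const from = dateMath.parse(range.from);
  const to = dateMath.parse(range.to, { roundUp: true });
  if (!from || !to) {
    throw new Error(`Could not parse time range: ${range.from} - ${range.to}`);
  }
  return { from: from.toISOString(), to: to.toISOString() };
}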

Expiration of searches

The current plan is that searches will expire some amount of time after they complete. The problem with how that interacts with this search collector is that we could theoretically have one search in the collector that expires before another search even completes (unlikely, but theoretically possible).

Another thing we want to answer is how to determine if a search is expired and communicate that to the user. Do we auto delete these, or make the user delete the objects when they are expired?

Should an embeddable that is backed by a background search collection be editable?

Ideally yes.

Embeddables by reference vs by value

Situation is:

  1. User creates a new dashboard. Adds two panels.
  2. User saves the dashboard; the URL is now just the dashboard id.
  3. User clicks "run in background". URL stored with background search collection is just the dashboard id plus bgSearchId=xyz.
  4. User edits a visualization in that dashboard.
  5. What happens if the user goes back to "view" the background search?
    • The unchanged visualization will match a cached search. The other panel will have a cache miss and a new search will go out.

Similar situation, maybe more awkward:

  1. User creates a new dashboard. Adds two panels.
  2. User saves the dashboard; the URL is now just the dashboard id.
  3. User clicks "run in background". URL stored with background search collection is just the dashboard id plus bgSearchId=xyz.
  4. User edits the dashboard and deletes a panel.
  5. What happens if the user goes back to "view" the background search?
    • The user will see the newer saved dashboard that only has one panel, not two. It no longer looks like the dashboard that was actually "sent to background".

Note that the second situation will not happen with the embeddable viewer version, because the embeddable input is saved as a snapshot with the URL state. BUT the first situation could still be encountered, because child embeddable input would continue to be a reference to the saved visualization, not a visualization "by value".

We might be able to partially solve this if we put the dashboard input back in the URL so it overrides any changes made on the saved dashboard.

As for the first scenario, there is not much we can do without a major overhaul (although some of this work is already underway here: https://github.com/elastic/kibana/issues/51318).

URL state Migrations

We need to be careful that teams that register their own URLs understand they will have to migrate the state in the saved objects if any of the URLs plus state will no longer work. It's unclear whether saved object migrations even support multiple plugins adding migrations for the same saved object type; one workaround, while not ideal, could be to have teams write the migration in the background search plugin.

Easiest technical solution: move forward knowing this is a risk for bugs, and communicate to teams that they should add backwards-compatibility tests for URLs.

Best technical solution: probably something like https://github.com/elastic/kibana/issues/25247 - store URLs as a common saved object type and link to that. Then there is only one saved object type to migrate, and any team that uses the URL service knows they will have to migrate their saved objects.
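
Whichever route we take, the migration itself would just be a versioned transform over the stored document. A hypothetical sketch (the version number and URL rewrite are made up):

const backgroundSearchCollectionMigrations = {
  '7.8.0': (doc: { attributes: { url: string; appId: string } }) => {
    // e.g. an app changed its URL structure, so stored URLs must be rewritten.
    if (doc.attributes.appId === 'dashboard') {
      doc.attributes.url = doc.attributes.url.replace('/old/path', '/new/path');
    }
    return doc;
  },
};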

Roadmap

elasticmachine commented 4 years ago

Pinging @elastic/kibana-app-arch (Team:AppArch)

lukasolson commented 4 years ago

@ppisljar and I discussed another possible approach that relies more on the Elasticsearch cache and (possibly in a second phase) building services to execute the interpreter in a background job.

When a user navigates to a dashboard, we would execute everything as normal. If a user then selects to run in the background, we would gather the state for the entire dashboard (including each embeddable). I understand this is currently something that is not available since embeddables are currently passed their saved object ID instead of the actual inputs, so this is something we'd have to build. It would probably look something like this: Container embeddable gathers its own state, then calls getInput on each embeddable recursively. The input would have to be serializable, because we would store it, in addition to a URL, in a saved object. (Search IDs would not be stored in the saved object.)

When re-visiting the URL, the dashboard would pass down each embeddable's input, which would need to contain all of the state the embeddable needs to execute and render, instead of the saved object ID (or async search IDs). Each embeddable would then execute and render as usual; however, the requests to Elasticsearch would execute much faster (because of the Elasticsearch cache).
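
A sketch of gathering a fully serializable snapshot of a container and its children; the getInput, getChildIds, and getChild method names are assumptions about the embeddable API:

interface EmbeddableLike {
  getInput: () => object;
  getChildIds?: () => string[];
  getChild?: (id: string) => EmbeddableLike;
}

// Recursively collect every embeddable's input "by value" so the whole snapshot
// can be stored in the saved object alongside the URL.
function gatherInputSnapshot(embeddable: EmbeddableLike): object {
  const children: Record<string, object> = {};
  for (const id of embeddable.getChildIds?.() ?? []) {
    children[id] = gatherInputSnapshot(embeddable.getChild!(id));
  }
  return { input: embeddable.getInput(), children };
}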

We would still need to pass some sort of cache: true | false parameter in the interpreter execution, because if a user clicks "refresh", we assume they would actually want fresh data from Elasticsearch. The interpreter function could check this flag and forward it on as the request_cache parameter to gather fresh data.
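
In an interpreter function, that could look roughly like the fragment below. The forceRefresh flag on the handlers and the buildEsQuery helper are hypothetical; request_cache is the standard Elasticsearch search option being forwarded:

  fn: (context, args, handlers) => {
    return handlers.search(
      {
        params: {
          index: args.index,
          body: buildEsQuery(args), // hypothetical query builder
          request_cache: !handlers.forceRefresh, // hypothetical flag set when the user clicks "refresh"
        },
      },
      { signal: handlers.signal }
    );
  },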

This may work as a first-phase approach, but you still run into the edge case of expressions that contain multiple requests to Elasticsearch, with the latter based on the previous one. If we want to support these types of functions, we would need to build a service that truly executes the entire expression in the background. The output of such a service would not be stored anywhere; it would just need to ensure that execution continues, so that expressions with multiple requests to ES don't stop after the first request.

One downside to this approach is that it relies on Elasticsearch caching, which (in the current plan) would be cleared every 4 days. We have no way of knowing when the cached results "expire" (or are cleared). Also, what if one request executes quickly and another request takes 4 days to complete? This is an edge case we need to consider.

What are your thoughts about the feasibility of this approach @stacey-gammon?

stacey-gammon commented 4 years ago

If I understand this correctly:

The only thing Kibana has to do is decide when to send request_cache and when to not actively abort a search.

It sounds a lot simpler from the Kibana side, but it also seems like we would have a lot less control. I have a feeling this isn't going to work out well.

For example, how would we be able to calculate progress and offer the ability to cancel searches in this view:

[Screenshot: Screen Shot 2020-01-13 at 5 01 10 PM]

Storing all state by value instead of by reference

Agree this is the ideal long term solution. I suspect this will be a big feature in and of itself but @ppisljar thinks it might end up being easier to accomplish. If so, awesome.

Search collectors

We still need search collectors in order to get progress and cancellation.

So I think the only question here is: do we add logic in Kibana to hash the request and send the search id to ES instead of the full request?

Pros of not doing this mapping:

Possible shorter term wins

Synced with @ppisljar today to go over this and the above comment. We think the best plan forward is to start with the search collectors, which we'll need anyway. Once we have those incorporated, it would be worthwhile to see if we can get an easy win by exposing:

Then we don't need any new saved object types and no management screen. How do we get fast searches? Click "run past timeout" and check "Use cached searches". If cached searches are available in ES, they will be used automatically. This is way easier to implement than the background saved object collections. Nice idea @ppisljar!

ppisljar commented 4 years ago

The search service is not passed to the interpreter function; rather, the interpreter function is given a reference to the search service at the time of registration to the registry. This doesn't change much: instead of the search function being one of the handlers, we would pass in an instance of searchCollector (similar to what we do with inspectorAdapters).

stacey-gammon commented 4 years ago

The search service is not passed to the interpreter function; rather, the interpreter function is given a reference to the search service at the time of registration to the registry.

If we did that, though, we couldn't scope it, and it would be a global singleton search collector. I think we need to allow the embeddables to pass it in via extraHandlers. We could perhaps do both, so that if the embeddable opted not to pass in a specific instance of a search collector, functions could use the global one, though it might then be confusing which one functions should use. Maybe better would be for the interpreter to decide which search to pass down as part of handlers: a global one, or one passed in via extraHandlers.
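
A sketch of the extraHandlers idea; the execute signature and option name are assumptions about the interpreter API:

// Each dashboard (or panel) owns its own collector instance...
const collector = new AdvancedSearchCollector();

// ...and hands its search function to the interpreter at execution time, so
// expression functions receive a scoped search instead of a global singleton.
expressions.execute(expressionString, context, {
  extraHandlers: { search: collector.search },
});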

ppisljar commented 4 years ago

@stacey-gammon that's not the case (the part about not being able to scope it). We would be passing the instance of the search collector to the interpreter, which means we can pass a different instance in each time we execute an expression (if required), so the app is completely in control.

So at the time of registration we give esaggs a reference to the search service, but at the time of execution we pass an instance of searchCollector in. (How we create the search collector can stay out of this discussion, as it's irrelevant here.)

I would avoid using global search collectors for the reasons you described to me earlier. If somebody really wanted to do it, they would need to make sure they pass the same instance of the search collector to everything making the requests.

stacey-gammon commented 4 years ago

Okay, gotcha, I think we are actually on the same exact page then, I just read your comment incorrectly!

lukasolson commented 4 years ago

Closing in favor of https://github.com/elastic/kibana/issues/61738.