Pinging @elastic/kibana-telemetry (Team:KibanaTelemetry)
Hi @crob611, this is a very interesting use case. We could follow a similar approach to the kibana.json's requiredPlugins property.
However, at the moment, all the fetch methods run concurrently:
https://github.com/elastic/kibana/blob/d47460d08d15c71ffcb61f2be642d44ec807d581/src/plugins/usage_collection/server/collector/collector_set.ts#L179-L198
The reasoning for this is that we need to serve the usage as quickly as possible because the API requests /api/stats?extended=true and /api/telemetry/v2/clusters/_stats need that information. As the number of collectors grows with every release, we can't run them sequentially or the response time will grow too much and we'll hit timeouts.
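For illustration only (this is not the linked collector_set.ts code, and the SimpleCollector shape is an assumption), the difference between concurrent and sequential collection boils down to:

interface SimpleCollector {
  type: string;
  fetch: () => Promise<Record<string, unknown>>;
}

// Concurrent: the total time is roughly that of the slowest collector.
async function bulkFetchConcurrently(collectors: SimpleCollector[]) {
  return Promise.all(
    collectors.map(async (collector) => ({ type: collector.type, result: await collector.fetch() }))
  );
}

// Sequential: the total time is the sum of all collectors, which would hit timeouts as the list grows.
async function bulkFetchSequentially(collectors: SimpleCollector[]) {
  const results: Array<{ type: string; result: Record<string, unknown> }> = [];
  for (const collector of collectors) {
    results.push({ type: collector.type, result: await collector.fetch() });
  }
  return results;
}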
Maybe it's worth considering providing a lock & cache mechanism for this specific use case? I'm thinking that, maybe, that common "Shared Embeddable Object" can be emitted by an Observable, and each fetch method that relies on it can run const sharedEmbeddableObject = await sharedEmbeddableObject$.pipe(take(1)).toPromise(); or something along those lines. Do you think this approach could work without changing the Collector's APIs?
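A minimal sketch of that idea, purely as illustration (sharedEmbeddableObject$, its shape, and the recalculation trigger are assumptions, not existing Kibana APIs):

import { ReplaySubject } from 'rxjs';
import { take } from 'rxjs/operators';

// Hypothetical shape of the shared object.
interface SharedEmbeddableObject {
  visualizationsByValue: number;
}

// Replays the latest value to any fetch method that subscribes after it was emitted.
const sharedEmbeddableObject$ = new ReplaySubject<SharedEmbeddableObject>(1);

async function expensiveCalculation(): Promise<SharedEmbeddableObject> {
  // ...scan dashboards/visualizations here...
  return { visualizationsByValue: 0 };
}

// Something has to call this once per collection; how to trigger it is exactly the open question below.
export async function recalculateSharedEmbeddableObject() {
  sharedEmbeddableObject$.next(await expensiveCalculation());
}

// Inside a collector's fetch method:
export async function exampleFetch() {
  const sharedEmbeddableObject = await sharedEmbeddableObject$.pipe(take(1)).toPromise();
  return { byValue: sharedEmbeddableObject?.visualizationsByValue ?? 0 };
}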
@afharo Understood about them all needing to run concurrently.
That's the path I was trying to head down: basically a client (sharedEmbeddableObject, as you coined it) that could be shared amongst all of the collectors that required it. But where I keep getting stuck is how to force that object to re-run every time collection happens. Forgive me, I've not done much with Observables, so I'm not sure I completely understand the example, but wouldn't that always give only the first value that sharedEmbeddableObject emitted? How would sharedEmbeddableObject know it needs to calculate and emit new values?
As I mentioned in the original comment, I'm experimenting with adding the ability to add on to the fetchContext. So, doing something like:
usageCollection.registerAddonContext('embeddable', () => {
  const pendingPromise = sharedEmbeddableObject.run();
  return pendingPromise;
});
And then a collector could subscribe to this new context, similar to how they do with KibanaRequest:
usageCollection.makeUsageCollector({
  extendFetchContext: {
    embeddable: true,
  },
  async fetch({ embeddable }) {
    const results = await embeddable;
    return results;
  },
});
Thoughts on that? Or is there a way to accomplish this in the original way you were suggesting?
I think this use case can be solved with something like this (maybe there's a more elegant way to solve it via RxJS).
import { UsageCollectionSetup } from 'src/plugins/usage_collection/server';
let count = 0;
async function calculateSharedEmbeddableObject() {
  // Some logic here calculating the embeddable object
  console.log(`----> Executing for ${count}`);
  return { test: count++ };
}

let promise: Promise<{ test: number }> | undefined;

async function getSharedEmbeddableObject() {
  try {
    if (!promise) { // if there's no running promise, declare it
      promise = calculateSharedEmbeddableObject();
    }
    return await promise; // await the promise
  } finally {
    // once completed, empty the promise value for future executions.
    promise = undefined;
  }
}

export function registerTestCollector(usageCollection: UsageCollectionSetup) {
  usageCollection.registerCollector(
    usageCollection.makeUsageCollector({
      type: 'test-sharedEmbeddableObject-owner',
      isReady: () => true,
      schema: { test: { type: 'short' } },
      fetch: async () => {
        return await getSharedEmbeddableObject();
      },
    })
  );
  usageCollection.registerCollector(
    usageCollection.makeUsageCollector({
      type: 'test-sharedEmbeddableObject-consumer-x2',
      isReady: () => true,
      schema: { testx2: { type: 'short' } },
      fetch: async () => {
        const { test } = await getSharedEmbeddableObject();
        return { testx2: test * 2 };
      },
    })
  );
  usageCollection.registerCollector(
    usageCollection.makeUsageCollector({
      type: 'test-sharedEmbeddableObject-consumer-x3',
      isReady: () => true,
      schema: { testx3: { type: 'short' } },
      fetch: async () => {
        const { test } = await getSharedEmbeddableObject();
        return { testx3: test * 3 };
      },
    })
  );
}
While running this code, you'll see the console.log is only shown once per collection, but all three collectors get the value.
That said, I can see a lot of potential in your suggestion for future use cases. I think it's definitely worth exploring! The only downside I can see is that we'll be exposing this info to every other collector out there (and the TS definitions are harder to infer).
@TinaHeiligers & @Bamieh, what do you think about exploring the new API?
usageCollection.registerAddonContext('embeddable', () => {
  const pendingPromise = sharedEmbeddableObject.run();
  return pendingPromise;
});
Then we'll call all the registered AddonContexts via Promise.all and append their results to the fetch context.
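A rough sketch of what that could look like inside the collector set (hypothetical: registerAddonContext and this bulkFetch shape only illustrate the idea, they are not existing usage_collection APIs):

type AddonContextProvider = () => Promise<unknown>;

const addonContextProviders = new Map<string, AddonContextProvider>();

export function registerAddonContext(name: string, provider: AddonContextProvider) {
  addonContextProviders.set(name, provider);
}

// Called once per collection, before the collectors' fetch methods run.
async function buildAddonContext(): Promise<Record<string, unknown>> {
  const names = [...addonContextProviders.keys()];
  const values = await Promise.all(names.map((name) => addonContextProviders.get(name)!()));
  return Object.fromEntries(names.map((name, i) => [name, values[i]]));
}

// The addon context is then merged into the fetch context handed to each collector.
export async function bulkFetch(
  collectors: Array<{ type: string; fetch: (context: Record<string, unknown>) => Promise<unknown> }>
) {
  const addonContext = await buildAddonContext();
  return Promise.all(
    collectors.map(async (collector) => ({
      type: collector.type,
      result: await collector.fetch({ ...addonContext }),
    }))
  );
}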
@afharo That looks great. That was the original path I was exploring, but maybe you can calm a few concerns I had about going that route, concerns that for sure would not exist with the add-on context route.
1) The getSharedEmbeddableObject promise resolving before everyone who needs to use it has actually subscribed to it, resulting in a potentially expensive action happening multiple times.
2) (I may be confused about what can trigger a new "Collection" to happen.) Could there be multiple collections fired in close succession where the existing Promise has not resolved, so that the new collection request would be reusing the promise from the previous run? This would most likely not be a big deal, as there probably could not have been much activity between the two requests, but I was just trying to figure out the correct way to go about it to ensure that couldn't happen.
I think I'm going to proceed with the promise approach (because, like you said, the type inference for the add-ons is proving to be complex) unless you all think either of those concerns is enough not to.
1. The `getSharedEmbeddableObject` promise resolving before everyone who needs to use it is actually subscribed to it, resulting in a potentially expensive action happening multiple times.
That is a valid concern. If they are not happening at the same time, you might need to set a timeout to clear the promise after a certain amount of time (as a cool-down period if you like). To be fair, I don't think the results from the function will be much different a couple of seconds apart 🙂
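A minimal sketch of that cool-down, reusing calculateSharedEmbeddableObject from the snippet above (the 5-second window and the setTimeout-based clearing are assumptions):

let cachedPromise: Promise<{ test: number }> | undefined;

async function getSharedEmbeddableObjectWithCooldown() {
  if (!cachedPromise) {
    cachedPromise = calculateSharedEmbeddableObject();
    // Instead of clearing the promise as soon as it settles, keep it for a short
    // cool-down so late callers reuse the result instead of re-triggering the work.
    setTimeout(() => {
      cachedPromise = undefined;
    }, 5000);
  }
  return cachedPromise;
}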
2. Could there be multiple collections fired in close succession where the existing Promise has not resolved
If you hit the API very eagerly multiple times in a row, it may happen.
POST http://localhost:5601/api/telemetry/v2/clusters/_stats
{ unencrypted: true }
But I'd say it's actually a good thing to cache some of the requests if anyone is intentionally hammering our APIs a bit too eagerly (related issue: https://github.com/elastic/kibana/issues/75886)
Pinging @elastic/kibana-core (Team:Core)
I am all in for exploring adding a new API if needed, but I'd like to discuss the use case a little bit more before jumping into any implementation ideas.
To make sure I understood the scope of the feature, I'll restate it: the result of one expensive operation needs to be shared across multiple collector.fetch functions.
Assuming the above is correct:
We still have to perform at least one expensive call when the usage is requested. This is not ideal since, as Alejandro mentioned, we do try to keep the response time as short as possible.
a. Can we create a task to store the data into another saved object and then have all the collectors grab that saved object instead of calling that expensive API? (We have multiple collectors doing this already; see the sketch below.)
b. Can we explore a new collector that groups all embeddable-related data into one collector? All related data from this API call would be in one single collector.
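To illustrate option (a), here's a rough sketch of a scheduled task that precomputes the expensive embeddable data into a saved object, which collectors then read cheaply at fetch time (the saved-object type, attribute shape, and client interface here are hypothetical; the real Kibana task manager and saved-objects wiring involve more setup than shown):

// Hypothetical saved-object type and attributes used only for this sketch.
const EMBEDDABLE_USAGE_SO_TYPE = 'embeddable-usage-snapshot';

interface EmbeddableUsageAttributes {
  byValueVisualizations: number;
  byReferenceVisualizations: number;
  collectedAt: string;
}

// Minimal stand-in for the pieces of a saved-objects client this sketch needs.
interface MinimalSavedObjectsClient {
  get<T>(type: string, id: string): Promise<{ attributes: T }>;
  create<T>(type: string, attributes: T, options: { id: string; overwrite: boolean }): Promise<unknown>;
}

// Runs on a schedule (e.g. via a task), NOT during the usage collection request.
export async function refreshEmbeddableUsageSnapshot(soClient: MinimalSavedObjectsClient) {
  const attributes: EmbeddableUsageAttributes = {
    byValueVisualizations: 0, // ...expensive dashboard/visualization scan goes here...
    byReferenceVisualizations: 0,
    collectedAt: new Date().toISOString(),
  };
  await soClient.create(EMBEDDABLE_USAGE_SO_TYPE, attributes, { id: 'latest', overwrite: true });
}

// Any collector's fetch can then read the precomputed snapshot cheaply.
export async function fetchEmbeddableUsage(soClient: MinimalSavedObjectsClient) {
  const { attributes } = await soClient.get<EmbeddableUsageAttributes>(EMBEDDABLE_USAGE_SO_TYPE, 'latest');
  return attributes;
}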
@Bamieh Yeah, now that I've had some time to think it through, I think you are correct. This should be something that lives entirely within Embeddable (or wherever), happens on a schedule, and then whichever collector needs it can consume the result.
If my understanding is correct, we agreed the solution should be performed at another level (within embeddables), so I think we're safe to close this. Please reopen if necessary.
There's been a change in Dashboards as part of the "Time To Visualize" project where visualizations can now be embedded directly in a dashboard saved object instead of being created as a visualization saved object and then referenced inside of the dashboard saved object. We'll call these two scenarios By Value (the new way) and By Reference (the old way).
Currently, the visualization collector does a query by type: visualization, which finds all saved visualizations and uses them for all of its calculations. The problem is that this will not find visualizations which are in a dashboard by value.
There has been work done by the embeddables team to be able to create embeddables server-side, and this includes a method to collect telemetry for a given embeddable input. This would allow us to, for instance, run through all of the visualizations as before, and then fetch all of the dashboards and loop through their visualizations. But, while looping over the dashboards, we would also want to collect information that is going to be given to the dashboard collector as well.
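As a purely illustrative sketch of what collecting telemetry for a given embeddable input might look like (the types and function shapes here are assumptions, not the embeddables team's actual API):

// Hypothetical input/collector shapes for illustration only.
interface EmbeddableInput {
  type: string;
  [key: string]: unknown;
}

type EmbeddableTelemetryCollector = (
  input: EmbeddableInput,
  stats: Record<string, number>
) => Record<string, number>;

// Example: count panels per embeddable type while walking a dashboard's panels.
const collectPanelTelemetry: EmbeddableTelemetryCollector = (input, stats) => ({
  ...stats,
  [`${input.type}_count`]: (stats[`${input.type}_count`] ?? 0) + 1,
});

function collectDashboardTelemetry(panels: EmbeddableInput[]) {
  return panels.reduce<Record<string, number>>((stats, panel) => collectPanelTelemetry(panel, stats), {});
}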
So, ideally, this "Embeddable Telemetry Collector" could run before any of the collectors fetches start, and the results of the "Embeddable Collector" could be passed in to any of the collectors that would need that information. The only thing I can't figure out how to do now is how to scope this to run with each collection request.
Any thoughts on the best way to go about adding this kind of functionality? I'm thinking maybe the easiest way would be the ability to add add-on contexts to the collector set. And then when the bulk_fetch is called, we can initialize our custom context and it will get added onto the context that is given to the collectors fetch method? Thoughts on that approach?