[Research] Kibana Expression Language impact to Lens charts performance

thomasneirynck commented 7 months ago

With Canvas and the legacy visualization not part of Serverless, it is unlikely we will see further evolution of these features. They may even end up being removed altogether.

They significantly rely on the Kibana Expression Language (KEL).

There are two ways to interpret the role of KEL.

an inner abstraction for the representation of a visualization (its data + its configuration). Essentially, the AST of KEL. Used by LegacyViz, Lens, Canvas. Generation of this AST is on-the-fly, and is nowhere persisted. (ie. "expressions")
the actual piped language, that is only user-facing in Canvas.

With the introduction of ES|QL, the data-fetch part can now be fully captured in the data-layer as well.

Goal

better understand the overhead introduced by having Lens rely on expressions
Is there a way to isolate both Canvas and KEL from Lens, so a deprecation of LegacyViz and Canvas will not create dangling dependencies on KEL/expressions.

elasticmachine commented 7 months ago

Pinging @elastic/kibana-presentation (Team:Presentation)

elasticmachine commented 7 months ago

Pinging @elastic/kibana-visualizations (Team:Visualizations)

stratoula commented 7 months ago

@thomasneirynck I am a bit confused from this issue 🤔

What do you mean with this?

Is there a way to isolate both Canvas and KEL from Lens, so a deprecation of LegacyViz and Canvas will not create dangling dependencies on KEL/expressions.

How is canvas connected to Lens? (except from the fact that a user can create Lens visualizations)
Lens heavily relies on expressions. We can def deprecate Canvas but leaving Lens working with them (even if the data fetch part is done with ES|QL (via expressions). I can see some advantages in the expressions logic. They have a similar pipe logic which could potentially be used with ES|QL for configuring the charts.

better understand the overhead introduced by having Lens rely on expressions

Here you mean performance overhead?

dej611 commented 7 months ago

I do not think there are specific Canvas expressions used by Lens. That area would be the only overlap I can think of between Lens and Canvas, which is probably 0.

stratoula commented 7 months ago

I do not think there are specific Canvas expressions used by Lens.

There are not. We can safely deprecate aggregation based viz and canvas without causing any problem in Lens. This is why I am a bit confused from this issue. Is it about this? If yes, then no problem. Is it about deprecating expressions in general? Why?

markov00 commented 7 months ago

I believe Thomas is trying to understand if we are benefitting from having an expression language today, given the fact that Canvas and Legacy viz are going into maintenance mode, the language is not exposed to the final user (only in canvas), and moving computation from client to server side was a dream never fully implemented. @stratoula and @dej611 can you please enumerate the current benefits you are aware of about the use of expressions and their features (variables, general table manipulation functions, etc) in our viz/lens context?

stratoula commented 7 months ago

So it is about removing expressions language in general. Got it!

Expressions are used in Lens but also in ES|QL editor and fetching ES|QL data in alerts, discover etc. I am not an expressions expert so maybe @ppisljar should be involved in this discussion.

The most important thing that makes me sceptical on removing them is:

is a big architectural change for Lens. Which will be the benefit of refactoring? (Maybe there is one, I am just asking this question loudly)
I was hoping in the future to take advantage of expressions architecture and enhance the ES|QL language with our commands, something like:

from foo | stats maxB = max(bytes) by destination | bar x=maxB y=destination

dej611 commented 7 months ago

@stratoula and @dej611 can you please enumerate the current benefits you are aware of about the use of expressions and their features (variables, general table manipulation functions, etc) in our viz/lens context?

I think there are not specific benefits provided by the expression language, as it never took it off the idea of caching/memoizing expression functions. Due to the mandatory serialisation step of the expression callbacks/functions were never allowed to be passed, which made things a little bit more tricky, but also made the server able to re-use it (I think @drewdaemon has one project leveraging this for dashboard requests inspection).

The framework itself was useful to write re-usable functions, but in the specific case of Lens no-one else ever used it, apart from the rendering ones who are now shared across agg-based and Lens. Mind that even legacy viz will not be "supported" in serverless, they should still be visible as read-only, so there's still a dependency on the expression chain, in particular the rendering fns will still need to be there. In the long run, once Lens would be the only rendering editor, then we could think to touch it.

I would argue that data-fetching is only a partial, probably not substantial, set of expression functions that Lens uses, and for the remaning ones most of them could be rewritten as specific packages/modules.

I was hoping in the future to take advantage of expressions architecture and enhance the ES|QL language with our commands, something like:

from foo | stats maxB = max(bytes) by destination | bar x=maxB y=destination

This is an interesting take, which would gracefully fit with @ppisljar 's new tool/logic I guess.

stratoula commented 7 months ago

Yeah we can re-write things for sure and remove the expressions logic. It is a huge change though. Unless we think that this will improve the performance, in that case this is a strong argument.

ppisljar commented 7 months ago

I agree with @stratoula that this seems like a big refactoring. I am sure we would be able to provide all features expressions provide to lens in a different manner, but what exactly are the pain points we are looking to solve ?

I can think of two, but its not straight forward for a rewrite to do better at them and still do as good on other things (and not introduce tons of bugs):

forced serialization that @dej611 is mentioning, or in general the overhead for the developers that want to implement new fearures
performance aspect (could we make it faster? )

i didnt hear performance issue come up but i did hear complains about the overhead.

Definitely it would be a big change, which wont bring anything to our users directly (no immediate functionality change)

markov00 commented 7 months ago

I believe the idea of this issue is exactly that: do some research to understand the expression language overhead, understand its impact from the point of view of performance (execution time), and find its maintenance cost (tech debt here is not negligible).

Definitely it would be a big change, which won't bring anything to our users directly

I would argue that a performance improvement will bring to the user values even if we don't provide functionality change. I believe we should focus more on speed, cleanups, UX, and docs, rather than new features.

Is clear that is a big change, no one said the opposite, but we should also look forward and understand what we bring along with our code, what are their benefits and drawbacks. That's why I've asked if you folks could write down a list of benefits you know are valid when using the expression language. IMHO this list today is not very strong:

this is how it works today and is a big effort to remove it
can be piped to ESQL (but probably not using the same serialized form it has today)

stratoula commented 7 months ago

I think we all say the same here folks. If it brings a good performance boost it is a strong argument and we should consider, otherwise the size of the effort might not worth it.

The question is: Is it going to make the charts more performant? If yes, how much? Maybe we don't gain much and would be better to prioritize other tasks which would give us a bigger impact.

ppisljar commented 7 months ago

it might be hard to reliably measure the performance overhead, but we can try to think about it a bit ...

once the charts are rendered they do not rerun expressions unless we need to fetch new data, so i don't think this will make the charts themself more performant, at least not significantly
for the datafetching part we did some evaluation of the overheads in kibana in the past and expressions didnt come up at that point as having significant overhead, rather things inside server side search and kibana elasticsearch client were brought up. I would consider the expression overhead here very small compared to the overall time we need to fetch the data. If you think for example about esaggs function, the difference between calling an expression function and calling service directly is mostly about passing data pointers thru few additional layers of function calls. But that might not be the case for functions where we are forced to serialize/unserialize some data structures. @dej611 will know more about such functions.

drewdaemon commented 7 months ago

I think @drewdaemon has one project leveraging this for dashboard requests inspection

Yes—this PoC. But it only uses expressions because Lens does and I needed to replicate existing Lens behavior on the server.

dej611 commented 7 months ago

All the performance issues related to expression I've debugged in the past were not related to the expression language itself, rather all falls within these two cases:

actual function execution (think of mathColumn here: src/plugins/expressions/common/expression_functions/specs/math_column.ts as a super efficient version of mapColumn)
Recursion max callstack limit due to the Rx library usage.

In terms of "perceived" performance, Lens was built with the promise to have some sort sort of memoization happening at the expression level execution that never landed. This leads to some issues ( e.g. #175485 ) as Lens will trigger requests which have still to take flight before receiving a 304, and some function execution for any form configuration change. This is something that can improve but it doesn't necessarily depends on the language itself.

I'm all in for better performance, but before anything I would like to see some benchmark/comparison data sheet. Without that is just a witch hunting IMHO, but I understand that's what @markov00 proposes.

In terms of data fetching I do not think there's any particular overhead there, or at least while measuring it for some dashboard performance analysis reported that the tabify implementation was the only performance hit as it was more than linear on the input size. Specific Lens implementations have no particular hit on the performance as they are mapping specific rows or columns to operate it.

As @ppisljar already reported most of the issue here is about DX, for instance passing an object to an expression function as an argument requires some black magic so it is often either serialised as JSON or rebuilt at function execution time via some combination of ids/keyword/values. So far DX is the only pain point I would highlight about the Expression language, which has some important weight here, but I would argue not enough to push for a full rewrite.

On the serialization side, I quite like the idea of having a pipeline both executable on client and server side, even tho we've never leveraged that ( @drewdaemon is the first concrete attempt I quite like).

thomasneirynck commented 6 months ago

Thank you all for the feedback. I will respond to comments directly at the bottom, but it may be useful to add context for why we need to build a shared understanding of the impact of expressions.

Expressions were an abstraction to support multiple applications. Only Lens will remain.

In the future, Kibana will no longer have multiple visualization builders, and will have no longer have multiple dashboarding apps.

The Agg-based visualization builder will be removed in favor of Lens. Canvas similarly will see its key functionality being adopted by Dashboards (i.e. pixel perfect layouting) and by ES|QL (e.g. an alternative piped query language).

The removal of these apps will mean that Lens will be the only application which has a key dependency on the data+rendering pipeline modeled by expressions.

Expressions are opinionated wrt serialization (e.g. tabification).

Expressions build a one-way pipe on how data is manipulated. At each command, the data is re-marshalled into intermediate JSON-representations, to be eventually be piped into the rendering-component.

Such an architecture is very clean but is not ideal:

for delivering large datasets to the browser: Large datasets are generally shipped as binary blobs. To efficiently consume them, clients leverage the properties of the format (e.g. metadata headers) to selectively read data. Generally, intermediate processing on the client is avoided completely.
for streaming data, to be read from incrementally: Such incremental handling is not supported by Expressions currently.

There are a few examples of charts in Kibana which do not use Expressions (or a similar architecture). e.g. both Maps and Flamegraphs consume raw Elasticsearch output, in Maps this is streamed to the client from Kibana server.

In the future, we will likely see more requirements for new, more efficient forms of data delivery:

Elasticsearch ES|QL is looking to adding arrow output format. This format is stream-friendly, and optimized to be consumed directly by the client process.
Other column-based layouts may give similar benefits. It would be a shame to discount space savings when Expressions would require tabification before data can be consumed by the chart.
Scatter plots, a long standing feature, often require hundreds of thousands of data points to be visualized.

Questions to answer now.

This research should focus on following questions:

What is the current overhead of expressions when rendering a chart. In terms of time, and possibly memory allocation. If we cannot answer this question, what metrics do we need to introduce?
What is the current impact of expressions on development and maintenance. This is a qualitative assessment.
How would the expressions accomodate stream-friendly dataformats as inputs? Are there any limitations?
How tightly coupled are expression with Lens?
Do all Lens charts need expressions? e.g. consider adding a scatter plot to Lens which would not use Expressions.

Questions to answer another time.

what is the level of effort to remove expressions, or replace them with something else (tm)?

This is not relevant right now. There is no requirement to remove it. We need to build an understanding of expressions. We have a similar initiative for other low level technologies, such as bsearch.

Comments

believe Thomas is trying to understand if we are benefitting from having an expression language today, given the fact that Canvas and Legacy viz are going into maintenance mode, the language is not exposed to the final user (only in canvas), and moving computation from client to server side was a dream never fully implemented.

Agreed. See above for more added context.

The most important thing that makes me sceptical on removing them is:

This is not really the ask. See above for motivation.

In the long run, once Lens would be the only rendering editor, then we could think to touch it.

This is running ahead. In the meantime, we need to understand better pros&cons.

what exactly are the pain points we are looking to solve ?

See above. It is unclear if it we need an added abstraction for a data-pipeline when Lens is the only consumer, and whether it may block us from building data-heavy charts.

Is clear that is a big change, no one said the opposite, but we should also look forward and understand what we bring along with our code, what are their benefits and drawbacks.

If there are clear benefits to expressions (beyond it being an abstraction underlying 3 apps), we need to better understand them. Similar to drawbacks.

In terms of data fetching I do not think there's any particular overhead there, or at least while measuring it for some dashboard performance analysis reported that the tabify implementation was the only performance hit as it was more than linear on the input size.

Would be great to have some numbers.

So far DX is the only pain point I would highlight about the Expression language, which has some important weight here,

What is that overhead to DX? Can we characterize this more clearly?

the idea of having a pipeline both executable on client and server side

This is not really the ask right now. With ES|QL, there is a query engine now in the platform. There is imho little need to have another one in Kibana server.

thomasneirynck commented 3 months ago

thx @markov00 for adding research on this here https://github.com/elastic/kibana/discussions/182551

thomasneirynck commented 1 month ago

Closing for now. Initial research done (https://github.com/elastic/kibana/discussions/182551). Will need to follow up with how to make this more actionable.

elastic / kibana