elastic / kibana


[Expression][Perf] Take advantage of the new column JSON format #33290

Closed monfera closed 1 year ago

monfera commented 5 years ago

The column JSON format is provided by the solution for https://github.com/elastic/elasticsearch/issues/37702

Instead of the often used row format:

{ rows: [ {alpha: 3, beta: 2}, {alpha: 5, beta: 6}, {alpha: 6, beta: 4}, ...] }

or

{ rows: [ [3, 2], [5, 6], [6, 4], ...], columns: ['alpha', 'beta'] }

the column oriented format:

{ columns: { alpha: [3,5,6,7,4,2,....], beta: [2,6,4,3,2,1,...] } }

has these benefits:

  1. It yields a smaller payload, as the count of noise (delimiter) characters per record equals the column count (it's about 3x as much for row arrays, and a lot more for row objects, which repeat every key in every record)
  2. It should compress better, because like data (e.g. all integers) come sequentially, unbroken by type/domain changes and by the noise (row) delimiters (sorted, low-cardinality columns should compress even better)
  3. Many operators (here, in the pipeline expression language) can be implemented more efficiently with a columnar format
  4. Many 3rd party renderers (e.g. Plotly, Highcharts), and performance-oriented renderers in general, work with column data; transposing large arrays is expensive, blocks the main thread and generates garbage that adds to jank (frame drops)
  5. It leads to a nice internal format (the array of arrays needs a prop per column anyway, for the respective column names, so why not put the data vector there in the first place)
  6. (...future) This format lends itself well to advanced compression schemes common in the industry, such as delta encoding and run-length encoding, assuming eventual server-side support for these
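
To make benefit 1 concrete, here is a minimal TypeScript sketch that serializes the same three records from the examples above in all three shapes and compares the uncompressed JSON lengths. The helper name rowsToColumns and the variable names are made up for illustration; none of this is existing Kibana or Elasticsearch code, and it assumes every row has the same keys.

```typescript
// Row-object shape used in the first example of this issue.
type RowObject = { alpha: number; beta: number };

// Column-oriented shape: one array per column name.
interface ColumnTable {
  columns: Record<string, number[]>;
}

// Transpose row objects into the column format.
// Assumption: every row carries the same keys.
function rowsToColumns(rows: RowObject[]): ColumnTable {
  const columns: Record<string, number[]> = {};
  for (const row of rows) {
    for (const [name, value] of Object.entries(row)) {
      if (!(name in columns)) columns[name] = [];
      columns[name].push(value);
    }
  }
  return { columns };
}

const rowObjects: RowObject[] = [
  { alpha: 3, beta: 2 },
  { alpha: 5, beta: 6 },
  { alpha: 6, beta: 4 },
];

// The three payload shapes discussed above.
const asRowObjects = { rows: rowObjects };
const asRowArrays = {
  rows: rowObjects.map((r) => [r.alpha, r.beta]),
  columns: ['alpha', 'beta'],
};
const asColumns = rowsToColumns(rowObjects);

// Uncompressed JSON length of each shape, in characters.
const payloadSizes = {
  rowObjects: JSON.stringify(asRowObjects).length,
  rowArrays: JSON.stringify(asRowArrays).length,
  columns: JSON.stringify(asColumns).length,
};

console.log(payloadSizes);
```

Even with three records the column shape is the shortest and the row-object shape the longest; the gap should widen as the row count grows, since the per-record overhead is what differs. The sketch says nothing about compressed sizes (benefit 2), which would need measuring on real data.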

To take advantage of the column oriented format:

  1. The data source functions should be able to request the column format via an argument (e.g. format="column")
  2. An automatic type conversion between the column and row oriented formats should be done for functions which only take one of the formats
  3. At least the functions that benefit most from a column format should be changed so that they can take the columnar format directly, without transposing
  4. Apply a heuristic that, in the absence of a format specifier arg (row vs column) in the query, loops through the subsequent steps of the expression and decides which format to use based on the first processing node it encounters that can take only one of the formats. Alternatively, execute the pipeline such that it switches to the other format as infrequently as possible, while preferring the "natural" orientation for each function
  5. (...future) A proper pipeline optimizer should decide, based on the expressions of interest, which format to query in, and how to further represent data across the subsequent data processing steps. This could involve loop fusion and reorderings, e.g. pushing down selections, pulling up join-like operations etc., based on equational semantics as in relational algebra
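
The automatic conversion in Item 2 could look roughly like the TypeScript sketch below. The `format` tag on each table and the names toColumnTable, toRowTable and coerceTable are assumptions made for illustration, not existing expression types or functions; the row side uses the row-object shape from the first example.

```typescript
type Cell = number | string | boolean | null;

// Hypothetical tagged table types; the `format` field is an assumption.
interface RowTable {
  format: 'row';
  rows: Array<Record<string, Cell>>;
}

interface ColumnTable {
  format: 'column';
  columns: Record<string, Cell[]>;
}

type Table = RowTable | ColumnTable;

// Row -> column: one pass over the rows, pushing each cell onto its column.
// Column names are taken from the first row, so an empty table loses them.
function toColumnTable(table: RowTable): ColumnTable {
  const names = table.rows.length > 0 ? Object.keys(table.rows[0]) : [];
  const columns: Record<string, Cell[]> = {};
  for (const name of names) columns[name] = [];
  for (const row of table.rows) {
    for (const name of names) columns[name].push(row[name]);
  }
  return { format: 'column', columns };
}

// Column -> row: rebuild one object per index.
// Assumes all columns have the same length.
function toRowTable(table: ColumnTable): RowTable {
  const names = Object.keys(table.columns);
  const length = names.length > 0 ? table.columns[names[0]].length : 0;
  const rows: Array<Record<string, Cell>> = [];
  for (let i = 0; i < length; i++) {
    const row: Record<string, Cell> = {};
    for (const name of names) row[name] = table.columns[name][i];
    rows.push(row);
  }
  return { format: 'row', rows };
}

// Convert only when the consumer cannot take the table as it is.
function coerceTable(table: Table, wanted: Table['format']): Table {
  if (table.format === wanted) return table;
  return table.format === 'row' ? toColumnTable(table) : toRowTable(table);
}

const queried: ColumnTable = {
  format: 'column',
  columns: { alpha: [3, 5, 6], beta: [2, 6, 4] },
};

// A row-only function would receive this converted copy.
const forRowOnlyFunction = coerceTable(queried, 'row');
console.log(JSON.stringify(forRowOnlyFunction));
```

Returning the input untouched when the format already matches keeps the column-to-column path free of copies, which is the case Items 1 and 3 are meant to make common.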

While the list gets daunting toward the end, there are lots of benefits to be had from Item 1 alone: it lets us pipe column data directly from a query into a column-based renderer. Item 2 lets everything work without breaking anything, even if the user starts with a columnar query but some of the remaining node functions don't natively handle the column format. Item 3 is still straightforward and may speed up existing use cases.
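
The heuristic in Item 4 could be sketched in TypeScript as follows. This assumes each pipeline step can declare which formats it accepts; the PipelineStep shape, the step names, and the functions chooseQueryFormat and planFormats are hypothetical, not part of the existing expression interpreter.

```typescript
type Format = 'row' | 'column';

// Hypothetical step descriptor: `accepts` lists the formats the step can
// consume, in order of preference ("natural" orientation first).
// Assumption: `accepts` is never empty.
interface PipelineStep {
  name: string;
  accepts: Format[];
}

// First variant of Item 4: without an explicit format argument, use the
// format required by the first step that can take only one of them.
function chooseQueryFormat(
  steps: PipelineStep[],
  explicit?: Format,
  fallback: Format = 'row'
): Format {
  if (explicit) return explicit;
  for (const step of steps) {
    if (step.accepts.length === 1) return step.accepts[0];
  }
  return fallback;
}

// Second variant, done greedily: keep the current format for as long as
// each step accepts it, and switch only when a step cannot take it.
function planFormats(
  steps: PipelineStep[],
  start: Format
): { formats: Format[]; switches: number } {
  const formats: Format[] = [];
  let current = start;
  let switches = 0;
  for (const step of steps) {
    if (!step.accepts.includes(current)) {
      current = step.accepts[0];
      switches++;
    }
    formats.push(current);
  }
  return { formats, switches };
}

// Made-up steps, for illustration only.
const steps: PipelineStep[] = [
  { name: 'filterStep', accepts: ['row', 'column'] },
  { name: 'columnRenderer', accepts: ['column'] },
];

const queryFormat = chooseQueryFormat(steps);
const plan = planFormats(steps, queryFormat);
console.log(queryFormat, plan);
```

With these two steps the first variant picks the column format for the query, and the greedy plan then needs no conversion at all. A real optimizer (Item 5) would also weigh conversion cost and data size, which this sketch ignores.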

elasticmachine commented 5 years ago

Pinging @elastic/kibana-canvas

costin commented 5 years ago

Re the first 6 - ES and SQL already support CBOR and SMILE, which are cross-platform and which I would expect to provide excellent compression.

elasticmachine commented 1 year ago

Pinging @elastic/kibana-visualizations @elastic/kibana-visualizations-external (Team:Visualizations)

stratoula commented 1 year ago

This is a request for Canvas, which supports SQL, but there are no plans to add more features to Canvas, so I am closing this. We can always reopen it if this becomes relevant again.