Vega vs. Vega Lite Data Transformation Capabilities

avatorl / Deneb-Vega-Help

Do you need help with Deneb custom visual for Power BI and/or Vega visualization grammar? Create an issue here to get assistance from Deneb community expert Andrzej Leszkiewicz.

In the context of Deneb, do Vega-Lite and Vega differ in their capabilities for data transformation?

With a Vega-Lite Deneb visual, I worked myself into a corner where it would help to reference my original dataset twice, transform each copy independently, and then join the results back together. I know Power Query is the best place to do this sort of work, but other constraints have me working inside Deneb to accomplish this. Deneb's documentation seems to reference syntax for an array of data objects ("data":[{object}, {object}]), but this throws errors when I try it.

I know it can be done in Vega-Lite outside of Deneb. However, I'm struggling to do so within the constraints Deneb places on users. Maybe I have just missed something. Regardless - would Vega allow me to do this (define multiple data objects, perform different transforms, and then join them) in Deneb, or would it have the same limitations as Vega-Lite?

For context - the issue I'm trying to solve is this: I have two conflicting aggregation operations. One needs to create aggregate counts based on the whole dataset, and the other needs to create aggregate distinct counts based on only part of the dataset, at a different level of granularity. To calculate the latter accurately, I have to introduce a filter that removes data I need for the former. If I DON'T filter, the number of unique values is miscounted: even if I set all irrelevant values to null in a new column and reference that instead, "null" is still counted as a distinct value, which throws the calculations off. I know this is all very abstract - here is a sample in the Vega Editor that shows what I mean: https://tinyurl.com/2avytju5. If I should be posting this elsewhere, please let me know.

I don't know the exact differences between Vega-Lite and Vega regarding data transforms; I never really bothered learning Vega-Lite (that's just a simplified version of Vega). I'm not sure it's even possible to create multiple data tables in Vega-Lite (you'd need to ask someone else about that to be 100% sure).

But in Vega you can create new data tables that reference an existing one and apply transforms to the copy. So, for example, you get "dataset" from Power BI, then you can create an aggregated version of "dataset", then a filtered version of "dataset", then an aggregated version of the filtered one, and so on.
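Applied to the question above, that chain might look roughly like the sketch below. The field names ("group", "item", "include"), the table names "distinct-filtered" and "counts-joined", and the inline "values" rows are all made up for illustration; the inline rows are only there so the spec can be pasted into the Vega Editor as-is. In Deneb the first entry would presumably be just {"name": "dataset"}, with the rows coming from Power BI.

```json
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "data": [
    {
      "name": "dataset",
      "values": [
        {"group": "A", "item": "x", "include": true},
        {"group": "A", "item": "x", "include": true},
        {"group": "A", "item": "y", "include": false},
        {"group": "B", "item": "z", "include": true},
        {"group": "C", "item": "w", "include": false}
      ]
    },
    {
      "name": "distinct-filtered",
      "source": "dataset",
      "transform": [
        {"type": "filter", "expr": "datum.include"},
        {
          "type": "aggregate",
          "groupby": ["group"],
          "fields": ["item"],
          "ops": ["distinct"],
          "as": ["distinct_items"]
        }
      ]
    },
    {
      "name": "counts-joined",
      "source": "dataset",
      "transform": [
        {
          "type": "aggregate",
          "groupby": ["group"],
          "ops": ["count"],
          "as": ["rows"]
        },
        {
          "type": "lookup",
          "from": "distinct-filtered",
          "key": "group",
          "fields": ["group"],
          "values": ["distinct_items"],
          "as": ["distinct_items"],
          "default": 0
        }
      ]
    }
  ]
}
```

The filter only affects "distinct-filtered", so the row counts in "counts-joined" are still computed over the whole table, and the "lookup" transform joins the distinct counts back in by "group". With these placeholder rows, "counts-joined" should come out as A: rows 3, distinct_items 1; B: rows 1, distinct_items 1; C: rows 1, distinct_items 0 (group C has no rows left after the filter, so it gets the "default" value).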

See for example https://www.powerofbi.org/dataviz-vega/?VegaChart=column-line-dynamical/column-line-dynamical.json

That's Vega outside of Deneb. In Deneb you can have only one input table, and it must be named "dataset". But the example shows a chain of transformations: the original "dataset-raw" table is transformed to create the "dataset" table, which is then used as the source for the "dataset-line" and "dataset-rect" tables.

So you can create new tables by using the "source" property to reference an existing table and then applying various transforms.

```
{
  "name": "dataset-raw",
  "transform": [
    {"type": "sequence", "start": 1, "stop": 201, "step": 1, "as": "id"}
  ]
},
{
  "name": "dataset",
  "source": "dataset-raw",
  "transform": [
    {"type": "filter", "expr": "datum.id<=DataPoints"},
    {"type": "formula", "expr": "ceil(random()*100)", "as": "value"}
  ]
},
{
  "name": "dataset-rect",
  "source": "dataset",
  "transform": [
    {"type": "filter", "expr": "DataPoints<=50"},
    {"type": "collect", "sort": {"field": "id", "order": "ascending"}}
  ]
},
{
  "name": "dataset-line",
  "source": "dataset",
  "transform": [
    {"type": "filter", "expr": "DataPoints>50"},
    {"type": "collect", "sort": {"field": "id", "order": "ascending"}}
  ]
}
```

Vega vs. Vega Lite Data Transformation Capabilities #14