Data Package Data Views Specification (Visualizations)

rufuspollock commented 10 years ago

UPDATE August 2017

Draft spec at: http://specs.frictionlessdata.io/views/

UPDATE March 2017

v1.0 DRAFT SPEC AVAILABLE HERE: https://hackmd.io/s/SyTcmXPwl

OLD

What: a JSON spec for describing views of data like graphs, tables or maps. Probably focus this just on graphs to start with.

This spec is motivated by a desire to include views support in Data Packages. However, we aim to make this generic and reusable outside of Data Packages, and we will avoid or flag anything Data Package specific.

This was motivated by experience with ReclineJS where we needed to serialize "view" configurations (e.g. graphs and maps) into JSON. I have been using this informally in data packages for a while - see e.g. https://github.com/datasets/house-prices-us/blob/master/datapackage.json#L157

Proposal

The datapackage.json MAY contain an attribute named views. The value of views MUST be an array where each entry is a data "view" descriptor. Each data view descriptor MUST be a JSON object and MUST have a type key which uniquely defines the type of view (and what other attributes should be present).

Here's an example:

{
  "id": "...",
  "title": "...",
  "type": "Graph",
  # this is specific to Data Package
  # optional - resource id or name to pull data from. defaults to first resource if not specified
  "resource": 1,
  "state": {
    # this will be specific to each type of graph
    graph-state ...
  }
}
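
The MUST/MAY rules above can be checked mechanically. A minimal sketch, assuming a hypothetical helper named validateViews (not part of any Data Package library) and assuming that type should be a non-empty string, which the proposal text does not state explicitly:

```javascript
// Hypothetical helper: checks only the rules stated in the proposal text.
// Returns a list of error messages; an empty list means the rules hold.
function validateViews(descriptor) {
  const errors = [];
  // "views" is optional (MAY), so its absence is not an error.
  if (!("views" in descriptor)) return errors;
  if (!Array.isArray(descriptor.views)) {
    return ["views MUST be an array"];
  }
  descriptor.views.forEach((view, i) => {
    const isObject =
      view !== null && typeof view === "object" && !Array.isArray(view);
    if (!isObject) {
      errors.push(`views[${i}] MUST be a JSON object`);
    } else if (typeof view.type !== "string" || view.type === "") {
      // Assumption: "type" is a non-empty string.
      errors.push(`views[${i}] MUST have a type key`);
    }
  });
  return errors;
}

console.log(validateViews({ views: [{ type: "Graph" }, { title: "no type" }] }));
```

Validation of the type-specific state object is deliberately left out, since that depends on each view type.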

Planned

Whilst new tools can specify their own "view" information, we define the following specific types:

type: graph
type: map
type: vega - follow the vega spec
- important question

type: graph

Propose: we use Flot or Vega spec - http://trifacta.github.io/vega/

type: map

Propose: we run this off leaflet (??)

Research

Vega: https://github.com/trifacta/vega/wiki
- Vega has a lot of relevance. However, it is very oriented to being a JSON version of the "grammar of graphics". As such it is somewhat painful to use for very simple things out of the box like a line graph or simple map. However, there is no reason that Vega could not be a subtype e.g. type: vega as a graph type in Data Views.
Plotly appear to have a spec, according to their Open Source announcement in Nov 2015: https://plot.ly/javascript/open-source-announcement/ - the actual schema is here http://help.plot.ly/json-chart-schema/
- Quote from announcement re relation to vega-lite: "The plotly.js JSON schema and API is more like MATLAB or Python’s matplotlib than most JavaScript charting libraries available. It focuses on the chart’s physical attributes and attempts to leave the chart data separate (a workflow that scientists and engineers are accustomed to). For chart types that require binning (contour plots, histograms) or min-max decimation (line plots with >100k lines), some precomputation in JavaScript has been unavoidable."
Flot "spec": https://github.com/flot/flot/blob/master/API.md
- TODO: more analysis and documentation here.
http://tenxer.github.io/xcharts/docs/#data
VizJSON from IBM - http://www.ibm.com/developerworks/opensource/library/bd-vizjson/index.html (grammar of graphics oriented)

rufuspollock commented 10 years ago

/cc @vitorbaptista @kindly interested in your thoughts here

besquared commented 10 years ago

+1 for promoting vega in this context. I think it makes a lot of sense.

besquared commented 10 years ago

Are views always going to be local to the package? Should they be treated like resources? Should they have either path, url or content fields to specify what they are? I feel like views make a lot more sense if they're treated kind of like resources in this sense.

rufuspollock commented 10 years ago

@besquared at present it is assumed that views are always local to the data package. Why treat them as resources? "resources" is really short for "data resources" (I know the term resources is very generic ...). What would the path/url/content fields point to?

rufuspollock commented 10 years ago

Added some more detail and an example of the JSON.

domoritz commented 9 years ago

In the interactive data lab (Jeffrey Heer's group, who created vega), we are working on an abstraction on top of vega which we will release as open source.

The specification will look something like

{
  "marktype": "point",
  "enc": {
    "x": {"name": "yield","type": "Q","bin": true},
    "y": {"name": "variety","type": "O"},
    "size": {"name": "*","type": "Q","aggr": "count"},
    "color": {"name": "year","type": "O"}
  },
  "cfg": {"dataUrl": "data/barley.json"}
}

This specification will be translated to a vega specification similar to http://trifacta.github.io/vega/editor/?spec=barley. I think this could be the right level of abstraction for views specifications.

As I said, this is work in progress and the main reason it's not open source yet is that we need to finish a few features, clean up the code and write documentation.

domoritz commented 9 years ago

We just open sourced https://github.com/uwdata/vegalite. It's a higher level grammar for visualizations and we have a json schema specification. However, although we just made it open source, we are still working on getting the documentation out.

rufuspollock commented 9 years ago

@domoritz fantastic - will start looking in more detail.

rufuspollock commented 8 years ago

@domoritz any further updates here - this is something I'm increasingly interested in. Would really like to get this into Data Packages as an official spec in some way.

I also wonder if vega-lite could leverage JSON Table Schema and/or Tabular Data Package for its data field stuff?

rufuspollock commented 8 years ago

@kiliakis can I suggest you take a look at vega-lite. I think we may want to start experimenting with this in our views ;-)

domoritz commented 8 years ago

@rgrp Vega-Lite is getting more and more traction and there is active support for vega-lite. We are planning to release vl 1.0 early next year but the current master is almost the final version.

Concerning your question about JSON Table Schema / Tabular Data Package: right now vl requires that you specify the data type (quantitative, ordinal, nominal, temporal and later geo), but we plan to remove that requirement in some cases (https://github.com/vega/vega-lite/issues/512). So having some schema information is definitely helpful. However, a few weeks ago we removed the need for explicit statistics (min, max, cardinality, ...) and now compute these in vega if needed.

I'd say that vl is ready to be used to package visualizations. If you come across any problems, let me know.

rufuspollock commented 8 years ago

@domoritz really useful.

On the vega-lite data source re-using some of Data Package / JSON Table Schema I've just posted the following in https://github.com/vega/vega-lite/issues/512

@domoritz would it be possible to use JSON Table Schema here -- what's missing from JSON Table Schema? I guess my bigger question is whether you could replace your data source object with a Data Package resource object?

I ask because this is naturally what I would want to do in: https://github.com/dataprotocols/dataprotocols/issues/77 -- basically I want data specified by Data Package then connected with a visualization specified by vega-lite. At the moment you obviously end up defining your own data source mini-language, and I wonder if we could converge there: vega-lite's data source would be replaced with a subpart of the Data Package spec, and Data Package would use vega-lite for view specs.

rufuspollock commented 8 years ago

Excerpting key stuff from that vega-lite thread here:

Feb 4

https://github.com/vega/vega-lite/issues/512#issuecomment-179735332

@domoritz let me try and set out the logic again a bit more clearly:

vega / vega-lite is a system for specifying visualizations
to create visualizations you normally have (at least) the following 3 components
- general-metadata - e.g. title of graph, credits …
- data
- graph description / specification
data itself often means 2 different things in most implementations:
- "raw-data" spec: a spec / description of data exactly in the form needed by the visualization system. This is often a very well defined spec e.g. an array of series …
- data-import + transform: this is very common. For the convenience of users one supports importing - and transforming - data from external or other sources e.g. a CSV, or even a JSON array (which is not yet in the "raw-data" form)
As is common, in vega / vega-lite you've ended up writing your own specs for all 3 items, including your own mini-data spec (and import/transform spec) -- even though vega / vega-lite is really about the visualization side - the "graph description / specification"
This means you take up time working out your data work and fixing your import and transform system - as evidenced by this very issue or the discussion about units in #817 or several other issues.
What I am suggesting:
- focus vega-lite purely on the "graph description / specification"
- define a simple raw-data spec sufficient for vega-lite needs - this is the crucial "interface" between vega-lite and any external data sources. If you can get your data into this "raw-data" form then vega-lite can use it
- "reuse" JSON Table Schema and the resource definition in Data Package for your more general data spec (the import / transform part).
- And we together focus on getting data from JSON Table Schema / Resource into the "raw-data" form that vega-lite needs

In summary, I'm suggesting a "separation of concerns". Rather than vega-lite inventing its own new data definition system including referencing external sources, having units, setting data types etc etc you reuse the existing JSON Table Schema and related specifications.

In addition, this makes vega-lite much more usable by third parties, for example Data Packages and their tooling. I really want to use vega-lite or something like it for Data Packages. And I want to drop it straight into the datapackage.json views objects e.g. as per https://github.com/dataprotocols/dataprotocols/issues/77

{
  "name": "my-data-package",
  "resources": [
    {
      "name": "my-csv.csv",
      "schema": {
        "fields": [ { "name": "column1", "type": "number" }, .... ]
      }
    }
  ],
  "views": [
    {
      "title": "...",
      "data": { "resource": ..., "query": .... },
      # VEGA-LITE stuff goes here specifying the view using the resource data ...
    }
  ]
}

@domoritz follow up Feb 4

...

{
  "name": "my-data-package",
  "resources": [
    {
      "name": "my-csv.csv",
      "schema": {
        "fields": [ { "name": "column1", "type": "number" }, .... ]
      }
    }
  ],
  "views": [
    {
      "title": "...",
      "data": { "resource": ..., "query": .... },
      "spec": {
        "mark": "point",
        "encoding": {
          "x": {"field": "Horsepower", "type": "quantitative"}
        }
      }
    }
  ]
}

Note that I left out the data from the Vega-Lite spec. The library for data packages could then run the following (pseudo) code.

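// "data-package-loader" is a hypothetical module; vg.parse.spec is the Vega 2 API
const loader = require("data-package-loader");
const vl = require("vega-lite");
const vg = require("vega");

// dataPackage is the parsed datapackage.json
dataPackage.views.forEach((view) => {
  const data = loader.downloadAndParse(view.data);
  // compile the view's Vega-Lite spec into a full Vega spec
  const vgSpec = vl.compile(view.spec).spec;
  vg.parse.spec(vgSpec, function(chart) {
    chart({el: "#vis", data: data}).update();
  });
});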

Misc other

Vega expects data objects like:

[{foo: 1, bar: 3}, {foo: 4, bar: 2}, {foo: 4, bar: 6}, {foo: 2, bar: 7}]
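
Tabular data read from a resource such as a CSV file often arrives as a header row plus value rows, so it needs converting into this shape first. A minimal sketch, assuming a hypothetical helper named toRowObjects and assuming the first row holds the field names:

```javascript
// Hypothetical helper: turns [header, ...rows] into an array of row objects,
// the shape shown above.
function toRowObjects(table) {
  const [header, ...rows] = table;
  return rows.map((row) => {
    const obj = {};
    // Pair each header name with the value in the same column.
    header.forEach((name, i) => {
      obj[name] = row[i];
    });
    return obj;
  });
}

const table = [["foo", "bar"], [1, 3], [4, 2]];
console.log(toRowObjects(table));
```

Type casting (e.g. strings to numbers) is left out here; in a Data Package setting that would be driven by the resource's schema.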

rufuspollock commented 8 years ago

Update: an in progress test implementation using vega and vega-lite is going on here https://github.com/okfn/datapackage-render-js

Overview and discussion at: https://discuss.okfn.org/t/data-packages-views-graphs-maps-tables-etc/2667

domoritz commented 8 years ago

Thanks for the update. Also, check out https://github.com/vega/vega-embed. We use it whenever we want to render vega/ vega-lite charts on the web.

rufuspollock commented 8 years ago

@domoritz thanks and already planning to use that. I've also posted an overview of what we're trying to do here which you may find interesting: https://discuss.okfn.org/t/data-packages-views-graphs-maps-tables-etc/2667

domoritz commented 7 years ago

Just a few updates. We decided to drop vega-embed with Vega 3 and Vega-Lite 2. Vega 3 will be released very soon. Vega-Lite 2 will be released early next year.

rufuspollock commented 7 years ago

4 Dec 2016: @domoritz and I have had a couple of big meetings here and have reached a pretty clear understanding of how we can integrate. More notes in the working group meetings doc

rufuspollock commented 7 years ago

UPDATE: I've spent long hours on this over the last few months and now have a draft spec here https://hackmd.io/s/SyTcmXPwl

domoritz commented 7 years ago

Awesome! I'll read it over the weekend. Should I write my feedback here in this issue?

rgbkrk commented 7 years ago

Thanks for keeping this updated, interested to see where things go so we can make use of them.

rufuspollock commented 7 years ago

@domoritz feedback here - or you can "fork" the doc and add comments.

domoritz commented 7 years ago

One concern is that you used Vega 2 and Vega-Lite 1, and there is a major change in Vega 3: you can have nested data sources. That means data does not have to be a top-level property. I don't think it changes anything about your proposal, but it is something to keep in mind. Vega 3 and Vega-Lite 2 will be released in April.

Pie pie – we are considering excluding pie charts as they are not widely used, often poor information design

I agree.

{
 "type": "line",
 "group": "x",
 "series": [ "y", "z" ]
}

What does group mean here? The x axis?

What we don’t like: having to tell it explicitly what the types are - can we infer that?

Yes, you can often infer it from the data types (integer, string, ...) but it is not possible in all cases.
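
As a rough illustration of that kind of inference, here is a minimal sketch that guesses a Vega-Lite encoding type from a Table Schema field type. The mapping table and the name inferVegaLiteType are assumptions made for this example, not something defined by either spec. It returns null where no safe guess exists; "ordinal" in particular cannot be derived from the data type alone.

```javascript
// Assumed mapping from Table Schema field types to Vega-Lite encoding types.
const TYPE_MAP = {
  number: "quantitative",
  integer: "quantitative",
  date: "temporal",
  datetime: "temporal",
  string: "nominal",
  boolean: "nominal",
};

// Hypothetical helper: returns the guessed Vega-Lite type for a schema field,
// or null when the type must still be given explicitly.
function inferVegaLiteType(field) {
  return Object.prototype.hasOwnProperty.call(TYPE_MAP, field.type)
    ? TYPE_MAP[field.type]
    : null;
}

console.log(inferVegaLiteType({ name: "yield", type: "number" }));
```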

data is an object not an array – only one data source allowed

Vega-Lite 2 supports named data sources without values.

What's your plan for supporting transformations? What will a spec look like? I suppose you can always do some of the transformations in Vega, right?

rufuspollock commented 7 years ago

@domoritz really useful.

Vega-Lite 2 supports named data sources without values.

Great

What's your plan for supporting transformations? What will a spec look like? I suppose you can always do some of the transformations in Vega, right?

We would like to reuse transforms from somewhere else, e.g. vega 😄 - I've just opened a question about this on vega-dataflow - https://github.com/vega/vega-dataflow/issues/17

Our analysis so far to understand vega transforms is here: https://hackmd.io/JwEwjGDMCGBMsFoBsAWAHJBLhmggRgAzCbDDRkgCsAZgKaw35A==?both#appendix-data-transform-research

rufuspollock commented 7 years ago

First draft of a views spec is out:

http://specs.frictionlessdata.io/views/

Especially recommend reading: http://specs.frictionlessdata.io/views/#concepts-and-background

Current text is unpolished, but the underlying model is quite robust: it has been actively used and tested for the last 9-12 months and is in production use at https://datahub.io/

Stephen-Gates commented 6 years ago

Given https://frictionlessdata.io/specs/views/ and https://github.com/frictionlessdata/specs/issues/255 can this be closed?

domoritz commented 6 years ago

I think you guys will like https://github.com/vega/vega-lite/pull/3417. It makes injecting data into a Vega-Lite spec quite easy.

rufuspollock commented 6 years ago

@domoritz that's great! Have you read the draft spec at our end: http://specs.frictionlessdata.io/views/

Do you have any thoughts?

domoritz commented 6 years ago

Looks good to me. How are datasets matched to resources in Vega, though? In Vega, a dataset can be at any level in the specification and doesn't necessarily have to be at the top. As a solution, I suggest that you require named datasets in the Vega and Vega-Lite specs, where the names correspond to the names of the resources.
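
A minimal sketch of what name-based matching could look like for a Vega spec, assuming a hypothetical helper named bindResources and assuming the resource rows have already been loaded into an object keyed by resource name. It only handles Vega-style data arrays (Vega-Lite's data object would need separate handling) and it modifies the spec in place:

```javascript
// Hypothetical helper: walks a Vega spec and, for every named dataset whose
// name matches a resource, attaches that resource's rows as "values".
function bindResources(node, rowsByResourceName) {
  if (Array.isArray(node)) {
    node.forEach((child) => bindResources(child, rowsByResourceName));
  } else if (node !== null && typeof node === "object") {
    if (Array.isArray(node.data)) {
      node.data.forEach((dataset) => {
        const name = dataset && dataset.name;
        if (Object.prototype.hasOwnProperty.call(rowsByResourceName, name)) {
          dataset.values = rowsByResourceName[name];
        }
      });
    }
    // Datasets can sit at any level, so recurse into every other property.
    for (const [key, child] of Object.entries(node)) {
      if (key !== "data") bindResources(child, rowsByResourceName);
    }
  }
  return node;
}

const exampleSpec = {
  data: [{ name: "prices" }],
  marks: [{ type: "group", data: [{ name: "rates" }] }],
};
bindResources(exampleSpec, { prices: [{ year: 2000, price: 1 }] });
console.log(JSON.stringify(exampleSpec.data));
```

Datasets with no matching resource are left untouched, so a spec can still carry its own inline or derived data.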

domoritz commented 6 years ago

Here are my proposed updates: #600

rufuspollock commented 6 years ago

@domoritz awesome - and we have a bunch of vega-based demos on DataHub here:

http://datahub.io/examples

(We'd love to add some vega-lite ones - we keep them here https://github.com/datapackage-examples)

domoritz commented 6 years ago

Is there anything left in this issue? I see that data transformations are not in the spec yet for example. Anything else? Would it make sense to move this to a separate issue?

rufuspollock commented 5 years ago

@domoritz I think we can indeed close and move things like transformations to a new, separate issue if needed.

rufuspollock commented 5 years ago

FIXED. See https://frictionlessdata.io/specs/views/
