Closed rufuspollock closed 5 years ago
/cc @vitorbaptista @kindly interested in your thoughts here
+1 for promoting vega in this context. I think it makes a lot of sense.
Are views always going to be local to the package? Should they be treated like resources? Should they have either path, url or content fields to specify what they are? I feel like views make a lot more sense if they're treated kind of like resources in this sense.
@besquared at present it was assumed that views were always local to the data package. Why treat them as resources? "resources" is really short for "data resources" (i know the term resources is very generic ...). What would the path/url/content fields point to?
Add some more detail and an example of the JSON
In the interactive data lab (Jeffrey Heer's group, who created vega), we are working on an abstraction on top of vega which we will release as open source.
The specification will look something like
{
"marktype": "point",
"enc": {
"x": {"name": "yield","type": "Q","bin": true},
"y": {"name": "variety","type": "O"},
"size": {"name": "*","type": "Q","aggr": "count"},
"color": {"name": "year","type": "O"}
},
"cfg": {"dataUrl": "data/barley.json"}
}
This specification will be translated to a vega specification similar to http://trifacta.github.io/vega/editor/?spec=barley. I think this could be the right level of abstraction for views specifications.
As I said, this is work in progress and the main reason it's not open source yet is that we need to finish a few features, clean up the code and write documentation.
We just open sourced https://github.com/uwdata/vegalite. It's a higher level grammar for visualizations and we have a json schema specification. However, although we just made it open source, we are still working on getting the documentation out.
@domoritz fantastic - will start looking in more detail.
@domoritz any further updates here - this is something I'm increasingly interested in. Would really like to get this into Data Packages as an official spec in some way.
I also wonder if vega-lite could leverage JSON Table Schema and/or Tabular Data Package for its data field stuff?
@kiliakis can I suggest you take a look at vega-lite. I think we may want to start experimenting with this in our views ;-)
@rgrp Vega-Lite is getting more and more traction and there is active support for vega-lite. We are planning to release vl 1.0 early next year but the current master is almost the final version.
Concerning your question about JSON table schema/ Tabular data package. Right now vl requires that you specify the data type (quantitative, ordinal, nominal, temporal and later geo) but we plan to remove that requirement in some cases (https://github.com/vega/vega-lite/issues/512). So having some schema information is definitely helpful. However, a few weeks ago we removed the need for explicit statistics (min, max, cardinality, ...) and compute these in vega if needed.
I'd say that vl is ready to be used to package visualizations. If you come across any problems, let me know.
@domoritz really useful.
On the vega-lite data source re-using some of Data Package / JSON Table Schema I've just posted the following in https://github.com/vega/vega-lite/issues/512
@domoritz would it be possible to be using JSON Table Schema stuff here -- what's missing from JSON Table Schema? I guess my bigger question is whether you could replace your data source object with a Data Package resource object?
I ask because this is naturally what I would want to do in: https://github.com/dataprotocols/dataprotocols/issues/77 -- basically I want data specified by Data Package then connected with visualization specified by vega-lite. At the moment, obviously you end up defining your own data source mini-language and I wonder if we could converge there replacing vega-lite's data source with a subpart of Data Package spec and Data Package using vega-lite for view specs.
Excerpting key stuff from that vega-lite thread here:
https://github.com/vega/vega-lite/issues/512#issuecomment-179735332
@domoritz let me try and set out the logic again a bit more clearly:
In summary, I'm suggesting a "separation of concerns". Rather than vega-lite inventing its own new data definition system including referencing external sources, having units, setting data types etc etc you reuse the existing JSON Table Schema and related specifications.
In addition, this makes vega-lite much more usable by third-parties, for example Data Packages and their tooling. I really want to use vega-lite or something like it for Data Packages. And I want to drop it straight into the datapackage.json views objects, e.g. as per https://github.com/dataprotocols/dataprotocols/issues/77
{
"name": "my-data-package",
"resources": [
{
"name": "my-csv.csv",
"schema": {
"fields": [ { "name": "column1", "type": "number" }, .... ]
}
}
],
"views": [
{
"title": "...",
"data": { "resource": ..., "query": .... },
# VEGA-LITE stuff goes here specifying the view using the resource data ...
}
]
}
...
{
"name": "my-data-package",
"resources": [
{
"name": "my-csv.csv",
"schema": {
"fields": [ { "name": "column1", "type": "number" }, .... ]
}
}
],
"views": [
{
"title": "...",
"data": { "resource": ..., "query": .... },
"spec": {
"mark": "point",
"encoding": {
"x": {"field": "Horsepower", "type": "quantitative"}
}
}
}
]
}
Note that I left out the data from the Vega-Lite spec. The library for data packages could then run the following (pseudo) code.
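// Pseudocode: render each view of a data package with Vega-Lite and Vega.
loader = require("data-package-loader");
vl = require("vega-lite");
vg = require("vega");
package.views.forEach((view) => {
// Fetch and parse the resource that the view's "data" property points to.
data = loader.downloadAndParse(view.data);
// Compile the view's Vega-Lite spec into a full Vega spec.
vgSpec = vl.compile(view.spec).spec;
vg.parse.spec(vgSpec, function(chart) {
chart({el:"#vis", data: data}).update();
});
});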
Vega expects data objects like:
[{foo: 1, bar: 3}, {foo: 4, bar: 2}, {foo: 4, bar: 6}, {foo: 2, bar: 7}]
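A minimal sketch of how a loader might produce that shape from parsed tabular data. The helper name `rowsToObjects` and its inputs (a list of field names, for example taken from a resource's schema, plus an array of row arrays) are assumptions made for illustration; they are not part of vega, vega-lite, or any loader mentioned in this thread.

```javascript
// Hypothetical helper: turn a list of field names plus row arrays into
// the array of row objects shown above.
function rowsToObjects(fieldNames, rows) {
  return rows.map((row) => {
    const record = {};
    fieldNames.forEach((name, i) => {
      record[name] = row[i];
    });
    return record;
  });
}

const exampleRows = rowsToObjects(["foo", "bar"], [[1, 3], [4, 2]]);
console.log(JSON.stringify(exampleRows));
// prints [{"foo":1,"bar":3},{"foo":4,"bar":2}]
```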
Update: an in progress test implementation using vega and vega-lite is going on here https://github.com/okfn/datapackage-render-js
Overview and discussion at: https://discuss.okfn.org/t/data-packages-views-graphs-maps-tables-etc/2667
Thanks for the update. Also, check out https://github.com/vega/vega-embed. We use it whenever we want to render vega/ vega-lite charts on the web.
@domoritz thanks and already planning to use that. I've also posted an overview of what we're trying to do here which you may find interesting: https://discuss.okfn.org/t/data-packages-views-graphs-maps-tables-etc/2667
Just a few updates. We decided to drop vega-embed with Vega 3 and Vega-Lite 2. Vega 3 will be released very soon. Vega-Lite 2 will be released early next year.
4 Dec 2016: @domoritz and I have had a couple of big meetings here and have reached a pretty clear understanding of how we can integrate. More notes in the working group meetings doc
UPDATE: I've spent long hours on this over the last few months and now have a draft spec here https://hackmd.io/s/SyTcmXPwl
Awesome! I'll read it over the weekend. Should I write my feedback here in this issue?
Thanks for keeping this updated, interested to see where things go so we can make use of them.
@domoritz feedback here - or you can "fork" the doc and add comments.
One concern is that you used Vega 2 and Vega-Lite 1, and there is a major change in Vega 3: you can have nested data sources. That means data does not have to be a top-level property. I don't think it changes anything about your proposal, but it is something to keep in mind. Vega 3 and Vega-Lite 2 will be released in April.
Pie pie – we are considering excluding pie charts as they are not widely used, often poor information design
I agree.
{ "type": "line", "group": "x", "series": [ "y", "z" ] }
What does group mean here? The x axis?
What we don’t like: having to tell it explicitly what the types are - can we infer that?
Yes, you can often infer it from the data types (integer, string, ...) but it is not possible in all cases.
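A rough sketch of what such inference could look like, assuming the field types come from a JSON Table Schema descriptor. The mapping table and the function name `guessVegaLiteType` are hypothetical; the point is only that some guesses are safe while others (a string may be nominal or ordinal, an integer may be a quantity or an ordered category) need an explicit override.

```javascript
// Assumed mapping from JSON Table Schema field types to Vega-Lite types.
// "ambiguous" marks guesses that a view author may need to override.
const TYPE_GUESSES = new Map([
  ["number", { type: "quantitative", ambiguous: false }],
  ["integer", { type: "quantitative", ambiguous: true }],
  ["string", { type: "nominal", ambiguous: true }],
  ["boolean", { type: "nominal", ambiguous: false }],
  ["date", { type: "temporal", ambiguous: false }],
  ["datetime", { type: "temporal", ambiguous: false }],
]);

// Hypothetical helper: an explicit type always wins; otherwise fall back
// to the guess table, and report unknown field types as ambiguous.
function guessVegaLiteType(field, explicitType) {
  if (explicitType) {
    return { type: explicitType, ambiguous: false };
  }
  const guess = TYPE_GUESSES.get(field.type);
  if (!guess) {
    return { type: undefined, ambiguous: true };
  }
  return { ...guess };
}

console.log(guessVegaLiteType({ name: "yield", type: "number" }));
console.log(guessVegaLiteType({ name: "year", type: "integer" }, "ordinal"));
```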
data is an object not an array – only one data source allowed
Vega-Lite 2 supports named data sources without values.
What's your plan for supporting transformations? What will a spec look like? I suppose you can always do some of the transformations in Vega, right?
@domoritz really useful.
Vega-Lite 2 supports named data sources without values.
Great
What's your plan for supporting transformations? What will a spec look like? I suppose you can always do some of the transformations in Vega, right?
We would like to reuse transforms from somewhere else e.g. vega 😄 - i've just opened a question about this on vega-dataflow - https://github.com/vega/vega-dataflow/issues/17
Our analysis so far to understand vega transforms is here: https://hackmd.io/JwEwjGDMCGBMsFoBsAWAHJBLhmggRgAzCbDDRkgCsAZgKaw35A==?both#appendix-data-transform-research
First draft of a views spec is out:
http://specs.frictionlessdata.io/views/
Especially recommend reading: http://specs.frictionlessdata.io/views/#concepts-and-background
The current text is unpolished, but the underlying model is quite robust: it has been actively used and tested for the last 9-12 months and is in production use at https://datahub.io/
Given https://frictionlessdata.io/specs/views/ and https://github.com/frictionlessdata/specs/issues/255 can this be closed?
I think you guys will like https://github.com/vega/vega-lite/pull/3417. It makes injecting data into a Vega-Lite spec quite easy.
@domoritz that's great! Have you read the draft spec at our end: http://specs.frictionlessdata.io/views/
Do you have any thoughts?
Looks good to me. How are datasets matched to resources in Vega, though? In Vega, a dataset can be at any level in the specification and it doesn't necessarily have to be at the top. As a solution, I suggest that you require named datasets in the Vega and Vega-Lite specs where the names correspond to the names of the resources.
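A sketch of how tooling could apply that matching rule, assuming named datasets appear under a data key as either a single object (Vega-Lite style) or an array of definitions (Vega style). Both function names are hypothetical and the walk is deliberately naive; it is not how Vega itself resolves data.

```javascript
// Recursively collect the names of all datasets defined under a "data"
// key, at any nesting level of a Vega or Vega-Lite style spec.
function collectDatasetNames(node, names = new Set()) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectDatasetNames(child, names));
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "data") {
        // Assumption: an array of definitions (Vega) or one object (Vega-Lite).
        const defs = Array.isArray(value) ? value : [value];
        defs.forEach((def) => {
          if (def && typeof def.name === "string") {
            names.add(def.name);
          }
        });
      }
      collectDatasetNames(value, names);
    }
  }
  return names;
}

// Hypothetical check: which named datasets have no resource of the same name?
function unmatchedDatasets(descriptor, spec) {
  const resourceNames = new Set(
    (descriptor.resources || []).map((resource) => resource.name)
  );
  return [...collectDatasetNames(spec)].filter(
    (name) => !resourceNames.has(name)
  );
}
```

An empty result from `unmatchedDatasets` would mean every named dataset in the spec can be filled from a resource in the package.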
Here are my proposed updates: #600
@domoritz awesome - and we have a bunch of vega-based demos on DataHub here:
(We'd love to add some vega-lite ones - we keep them here https://github.com/datapackage-examples)
Is there anything left in this issue? I see that data transformations are not in the spec yet for example. Anything else? Would it make sense to move this to a separate issue?
@domoritz i think we can indeed close and move things like transformations to a new, separate issue if needed.
FIXED. See https://frictionlessdata.io/specs/views/
UPDATE August 2017
Draft spec at: http://specs.frictionlessdata.io/views/
UPDATE March 2017
v1.0 DRAFT SPEC AVAILABLE HERE: https://hackmd.io/s/SyTcmXPwl
OLD
What: a JSON spec for describing views of data like graphs, tables or maps. Probably focus this just on graphs to start with.
This spec is motivated by the desire to include views support in Data Packages. However, we aim to make this generic and reusable outside of Data Packages and we will avoid or flag anything Data Package specific.
This was motivated by experience with ReclineJS where we needed to serialize "view" configurations (e.g. graphs and maps) into JSON. I have been using this informally in data packages for a while - see e.g. https://github.com/datasets/house-prices-us/blob/master/datapackage.json#L157
Proposal
The datapackage.json MAY contain an attribute named views. The value of views MUST be an array where each entry is a data "view" descriptor. Each data view descriptor MUST be a JSON object and MUST have a type key which uniquely defines the type of view (and what other attributes should be present).
Here's an example:
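The following is an illustrative sketch rather than the original example: a small check of the MAY/MUST rules above against a made-up descriptor. The function name `validateViews`, the sample descriptor, and the assumption that type is a non-empty string are all hypothetical.

```javascript
// Returns a list of problems; an empty list means the "views" property
// satisfies the rules stated in the proposal.
function validateViews(descriptor) {
  const errors = [];
  if (!("views" in descriptor)) {
    return errors; // "views" is optional (MAY)
  }
  const views = descriptor.views;
  if (!Array.isArray(views)) {
    return ['"views" MUST be an array'];
  }
  views.forEach((view, i) => {
    const isObject =
      view !== null && typeof view === "object" && !Array.isArray(view);
    if (!isObject) {
      errors.push(`views[${i}] MUST be a JSON object`);
    } else if (typeof view.type !== "string" || view.type === "") {
      // Assumption: "type" is expected to be a non-empty string.
      errors.push(`views[${i}] MUST have a "type" key`);
    }
  });
  return errors;
}

// Made-up descriptor using the "graph" type named in this proposal.
const sampleDescriptor = {
  name: "my-data-package",
  views: [{ type: "graph" }],
};
console.log(validateViews(sampleDescriptor).length); // prints 0
```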
Planned
Whilst new tools can specify their own "view" information, we define 2 specific types:
type: graph
Propose: we use Flot or Vega spec - http://trifacta.github.io/vega/
type: map
Propose: we run this off leaflet (??)
Research type: vega as a graph type in Data Views.