OpenTransport / Stations

A knowledge center for transport data
http://stations.io
MIT License

File format : CSV, json, other? #3

Closed Tristramg closed 11 years ago

Tristramg commented 11 years ago

I like the idea of having a plain text format that is stored on github.

CSV would be the first choice: it is easy to open as a spreadsheet, and it is the format used by GTFS, NaPTAN…

However, at some point we will have lots of unstructured data : aliases, translations, key-value properties that might not fit into the CSV format. OSM works so well because they are very open about all the key-value properties.

Hence, maybe a json format would be more appropriate. What do you think?

rufuspollock commented 11 years ago

I'm a non-expert here, but I'm a big fan of CSV and would suggest making this into a Simple Data Format Data Package (CSV plus a descriptor / schema as some simple JSON).
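
A minimal sketch of what "CSV + a JSON descriptor" could look like, in Python. The column names (`id`, `name`, `latitude`, `longitude`), the package name and the file name `stations.csv` are hypothetical, since the thread has not settled on an attribute set, and the descriptor only follows the general shape of a Data Package descriptor rather than any exact version of the spec.

```python
import csv
import io
import json

# Hypothetical columns; the actual attribute set is not decided in this thread.
FIELDS = [
    {"name": "id", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "latitude", "type": "number"},
    {"name": "longitude", "type": "number"},
]


def build_descriptor(csv_path):
    """Return a JSON-serialisable descriptor for a single CSV resource."""
    return {
        "name": "stations",  # assumed package name
        "resources": [{"path": csv_path, "schema": {"fields": FIELDS}}],
    }


def write_csv(rows):
    """Serialise a list of dicts to CSV text whose header matches FIELDS."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[field["name"] for field in FIELDS],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


descriptor = build_descriptor("stations.csv")
csv_text = write_csv(
    [{"id": "s1", "name": "Example Central", "latitude": 50.0, "longitude": 4.0}]
)

print(json.dumps(descriptor, indent=2))
print(csv_text)
```

The CSV stays editable in a spreadsheet, while the descriptor carries the column types that plain CSV cannot express.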

pietercolpaert commented 11 years ago

I'm all in favor of doing it in the rdf/turtle format.

After all, rdf is more web native (;)), and since we already have a vocabulary, we can easily create an ontology. We can then provide the right conversion tools for people who would rather edit it as CSV, or who would rather edit it as GeoJSON, to do so :)

Tristramg commented 11 years ago

I have no opinion here. I tried (for a few seconds, not a serious effort) to look at http://www.w3.org/TeamSubmission/turtle/, and I’m not sure what the data would look like.

Something I wish was more common in Open Data is that the data can be read by anyone, not only developers.

But then again, with conversion tools, everything is possible, and we need a pivot format ;)

rufuspollock commented 11 years ago

Could someone put together an outline of the likely data, as a set of columns or as something JSON-like? (I just want an idea of the main set of attributes.)

pietercolpaert commented 11 years ago

Data for public transport will most likely be delivered as CSV, since GTFS and GTFS-like formats are pretty common for transport stops.

For other stop_points such as parking lots, bike rental spots and the like, GeoJSON and XML will be pretty common.

I would like to withdraw my comment about using rdf/turtle; I think CSV, using the SDF and SDP as @rgrp indicated, is great. From the CSVs (which should have column names that conform to the opentransport/vocabulary) we can then generate GeoJSON and rdf/turtle.
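
A rough sketch of the CSV-to-GeoJSON step in Python, to make the idea concrete. The column names (`id`, `name`, `latitude`, `longitude`) are assumptions for illustration and are not taken from the opentransport/vocabulary; every non-coordinate column is simply copied into the feature's properties.

```python
import csv
import io
import json


def csv_to_geojson(csv_text):
    """Convert CSV text with latitude/longitude columns to a FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Hypothetical column names; GeoJSON orders coordinates as [lon, lat].
        lon = float(row.pop("longitude"))
        lat = float(row.pop("latitude"))
        features.append(
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": row,  # remaining columns, kept as strings
            }
        )
    return {"type": "FeatureCollection", "features": features}


sample = "id,name,latitude,longitude\ns1,Example Central,50.0,4.0\n"
collection = csv_to_geojson(sample)
print(json.dumps(collection, indent=2))
```

Generating rdf/turtle would follow the same pattern: read each row, then map column names to vocabulary terms.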

What do you think?

pietercolpaert commented 11 years ago

@rgrp - does SDF already work with linked CSV?

rufuspollock commented 11 years ago

@pietercolpaert not really sure - I imagine they are orthogonal. Right now, just getting a sense of the data structure is probably the best place to start.

Also you guys should call this as you're likely to drive - my call for csv was just a 2c thing ;-)

rufuspollock commented 11 years ago

Sounds great here - re CSV and then generating stuff.

Tristramg commented 11 years ago

Let’s go for CSV, especially now that github has a visualisation tool !