geojson / geojson-ld

GeoJSON in JSON-LD contexts

Can we make an example showing GeoJSON-LD adding geo detail to existing JSON-LD data? #28

Closed danbri closed 9 years ago

danbri commented 9 years ago

I love the mission of "It offers a smooth path for upgrading existing GeoJSON data.", but what about flipping that around and also showing how this work could add richer geo detail to existing JSON-LD data?

(Said with a schema.org hat on, since we're encouraging JSON-LD usage, and http://schema.org/Place http://schema.org/GeoShape etc lack some detail. See https://github.com/rvguha/schemaorg/ for details).

sgillies commented 9 years ago

@danbri Sure! I haven't thought about that aspect of it very much, and don't quite know where to start. As I better understand JSON-LD's processing model, I realize that its inability to deal with lists of lists makes getting from RDF representations of spatial things to GeoJSON via the JSON-LD API and contexts very difficult. It could be the same for schema.org objects?

cc @ManoMarks

akuckartz commented 9 years ago

@danbri Have a look at the comments in https://github.com/geojson/geojson-ld/issues/12

lanthaler commented 9 years ago

AFAICT, there are two approaches to work around the lack of support for lists of lists, but unfortunately neither of them is compatible with traditional GeoJSON:

  1. use string values instead of lists (similar to WKT)
  2. convert the inner lists to resources/objects

Taking the following snippet as example:

"coordinates": [
  [102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]
]

Approach 1 would look as follows:

"coordinates": [
  "102.0 0.0", "103.0 1.0", "104.0 0.0", "105.0 1.0"
]

or even

"coordinates": "102.0 0.0, 103.0 1.0, 104.0 0.0, 105.0 1.0"

Approach 2:

"coordinates": [
  { "x": 102.0, "y": 0.0 }, { "x": 103.0, "y": 1.0 }, { "x": 104.0, "y": 0.0 }, { "x": 105.0, "y": 1.0 }
]
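
The two workarounds above can be sketched in JavaScript as follows. This is only an illustration, not code from the thread: the function names `toPointStrings`, `toPackedString` and `toPointObjects` are hypothetical, and the `x`/`y` keys simply follow the snippet above.

```javascript
// Nested GeoJSON-style coordinates from the example above.
const coordinates = [[102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]];

// Approach 1, first variant: one "x y" string per position.
// Note that JavaScript prints 102.0 as "102", so trailing zeros are not kept.
function toPointStrings(coords) {
  return coords.map(([x, y]) => `${x} ${y}`);
}

// Approach 1, second variant: a single WKT-like string for the whole list.
function toPackedString(coords) {
  return toPointStrings(coords).join(', ');
}

// Approach 2: one object per position.
function toPointObjects(coords) {
  return coords.map(([x, y]) => ({ x, y }));
}
```

Either output is a flat structure that maps to triples, but neither is what a traditional GeoJSON consumer expects to find under "coordinates".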

akuckartz commented 9 years ago

I do not see a good reason to use a different stringification(?) than WKT. Open Source code to convert between WKT and GeoJSON is available. BTW: Both OpenGovLD and OParl use WKT.

danbri commented 9 years ago

FWIW there is currently a rather stringy model in schema.org, e.g. http://schema.org/polygon

"A polygon is the area enclosed by a point-to-point path for which the starting and ending points are the same. A polygon is expressed as a series of four or more space delimited points where the first and final points are identical."

I believe the heritage is GeoRSS -> rNews -> schema.org, with some lossy changes along the route.

Related Geowankers thread: https://www.mail-archive.com/geowanking@geowanking.org/msg02000.html

For schema.org we prefer there to be a reasonable 'triples' view of things, which makes lists of lists awkward. So hiding things within microsyntax inside a string might be more sensible than I originally thought.

akuckartz commented 9 years ago

A number of SPARQL engines are supporting WKT, among them the Open Source engines Apache Jena, OpenLink Virtuoso and Parliament.

danbri commented 9 years ago

OK, having read around here a bit, talked with Mano Marks, and looked into our constraints at schema.org, I'm inclined to agree with akuckartz's comment in https://github.com/geojson/geojson-ld/issues/12 that "using WKT currently is the best way to be compatible with the different worlds".

Having said that, looking at https://en.wikipedia.org/wiki/Well-known_text I get that heart-sinking feeling at the prospect of chasing off looking for some giant offline ISO spec. It would need to be described in a self-contained way, ideally with simple code fragments for .js/.py/.java.

Another argument by analogy for a "packed in a string" notation that developers get along OK with: SVG paths. http://www.w3.org/TR/SVG/paths.html https://www.dashingd3js.com/svg-paths-and-d3js
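
As a rough idea of what a self-contained .js fragment could look like, here is a minimal sketch that converts between WKT and GeoJSON for 2D POINT, LINESTRING and POLYGON only. It is an assumption-laden illustration, not the full WKT grammar: the function names `geojsonToWkt` and `wktToGeojson` are hypothetical, and Z/M values, EMPTY geometries and collections are deliberately left out.

```javascript
// Minimal 2D WKT <-> GeoJSON sketch (POINT, LINESTRING, POLYGON only).
const formatPosition = (p) => p.join(' ');
const formatRing = (ring) => `(${ring.map(formatPosition).join(', ')})`;

function geojsonToWkt(geom) {
  switch (geom.type) {
    case 'Point':
      return `POINT (${formatPosition(geom.coordinates)})`;
    case 'LineString':
      return `LINESTRING ${formatRing(geom.coordinates)}`;
    case 'Polygon':
      return `POLYGON (${geom.coordinates.map(formatRing).join(', ')})`;
    default:
      throw new Error(`Unsupported geometry type: ${geom.type}`);
  }
}

const parsePosition = (text) => text.trim().split(/\s+/).map(Number);
const parseRing = (text) => text.split(',').map(parsePosition);

function wktToGeojson(wkt) {
  // Capture the type keyword and everything inside the outermost parentheses.
  const match = /^\s*([A-Za-z]+)\s*\((.*)\)\s*$/s.exec(wkt);
  if (!match) throw new Error(`Cannot parse WKT: ${wkt}`);
  const type = match[1].toUpperCase();
  const body = match[2];
  switch (type) {
    case 'POINT':
      return { type: 'Point', coordinates: parsePosition(body) };
    case 'LINESTRING':
      return { type: 'LineString', coordinates: parseRing(body) };
    case 'POLYGON': {
      // Each ring is its own parenthesised group: ((...), (...)).
      const rings = body.match(/\(([^()]*)\)/g) || [];
      return {
        type: 'Polygon',
        coordinates: rings.map((r) => parseRing(r.slice(1, -1))),
      };
    }
    default:
      throw new Error(`Unsupported WKT type: ${type}`);
  }
}
```

For example, `wktToGeojson('LINESTRING(0 0, 1 1)')` yields a GeoJSON LineString, and passing that object to `geojsonToWkt` gives the string back with normalised spacing. Real data would likely be better served by an existing, tested library.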

jyutzler commented 9 years ago

The spec for WKT can be downloaded here: http://www.opengeospatial.org/standards/sfa It is presented in BNF in chapter 7.

ManoMarks commented 9 years ago

@akuckartz Can you point to some good libraries for converting GeoJSON to WKT and back? I think that would be really useful for presenting working code.

peterisb commented 9 years ago

@ManoMarks gdal/ogr here: http://pcjericks.github.io/py-gdalogr-cookbook/ are Python API examples. Using the ogr2ogr command-line tool to get WKT from JSON:

ogr2ogr -f CSV my_geojson_data.csv my_geojson_data.json -lco "GEOMETRY=AS_WKT"

ManoMarks commented 9 years ago

@peterisb Cool. Anything in JavaScript?

danbri commented 9 years ago

I don't know how good/bulky/etc http://terraformer.io/wkt-parser/ is, but being Bower-packaged is at least Web Components / Polymer friendly (for those who care).

sgillies commented 9 years ago

@ManoMarks @peterisb https://github.com/geomet/geomet is IMO the way to convert between WKB/WKT and GeoJSON for Python.

Everybody, is there anything actionable for GeoJSON-LD here? Changes for the contexts, vocab, or examples?

peterisb commented 9 years ago

@sgillies thanks, looks interesting.

@ManoMarks in JavaScript you can use the OpenLayers lib. Here: http://openlayers.org/dev/examples/vector-formats.html . There is also the new implementation ol3js.

danbri commented 9 years ago

re actionability, ... is there rough consensus towards a WKT-based approach? if so, then I guess yes!

akuckartz commented 9 years ago

And another WKT/GeoJSON JavaScript library (GPLv3): http://arthur-e.github.io/Wicket/

That page contains an interactive example.

akuckartz commented 9 years ago

And for completeness this WKT/GeoJSON JavaScript library: https://github.com/mapbox/wellknown

sgillies commented 9 years ago

WKT in JSON looks and feels wrong to me.

Have any of you used jsonld.js to expand GeoJSON? Here's a link to the JSON-LD playground showing the expansion of the coordinates of a triangular GeoJSON ring: http://goo.gl/92XA2f. Jsonld.js flattens the coordinates to an array of value objects which is more or less equivalent to an array of numbers (check out the "compacted" form). This kind of flattened form could be used instead of WKT, no? With the advantage that JSON linters are still useful on arrays of numbers. In other words, something like

"shape": { "type": "LineString", "xy": [0, 0, 1, 1]}

looks and feels better to me than

"shape": "LINESTRING(0 0, 1 1)"

Of course, there would be some details to work out like dimensionality and how to represent polygons. Following the lead of WKB might be a good start.
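
A minimal sketch of the flattened form for the simplest case, a 2D LineString, might look like this. The helper names `flattenLineString` and `unflattenLineString` are hypothetical, and the sketch assumes exactly two values per position; polygons and higher dimensions are the open details mentioned above.

```javascript
// Flatten a 2D GeoJSON LineString into the { type, xy } shape sketched above.
function flattenLineString(geometry) {
  if (geometry.type !== 'LineString') {
    throw new Error(`Expected a LineString, got ${geometry.type}`);
  }
  if (geometry.coordinates.some((p) => p.length !== 2)) {
    throw new Error('Only 2D positions are handled in this sketch');
  }
  return { type: geometry.type, xy: geometry.coordinates.flat() };
}

// Rebuild the nested GeoJSON coordinates from a flat "xy" array.
function unflattenLineString(shape) {
  if (shape.xy.length % 2 !== 0) {
    throw new Error('"xy" must hold an even number of values');
  }
  const coordinates = [];
  for (let i = 0; i < shape.xy.length; i += 2) {
    coordinates.push([shape.xy[i], shape.xy[i + 1]]);
  }
  return { type: shape.type, coordinates };
}
```

The flat array stays ordinary JSON, so linters and schema validators keep working on it.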

ManoMarks commented 9 years ago

The advantage of WKT in JSON to me is that it matches what people would be doing in PostGIS or any of the other databases to create spatial queries, etc. However, the flattened more JSONic approach is useful too. I think something more like this might be useful:

"shape": { "type": "LineString", "xy": [0 0, 1 1]}

Or we specify a limited vocabulary of properties such that we have either an "xy" property or an "xyz" property so you have:

"shape": { "type": "LineString", "xy": [0,0, 1,1]}

or

"shape": { "type": "LineString", "xyz": [0,0,0,1,1,0]}

danbri commented 9 years ago

Trying to find the simplest toy example of this style, to test w.r.t. schema.org (ignore property name and types for now):

{
  "@context": "http://schema.org/",
  "name": "The Empire State Building",
  "description": "The Empire State Building is a 102-story landmark in New York City.",
  "image": "http://www.civil.usherbrooke.ca/cours/gci215a/empire-state-building.jpg",
  "geo": { "@list": [0, 0, 1, 1, 1234, 4323] }
}

This parses out at http://json-ld.org/playground/ into the following graph/triples:

:b0 http://schema.org/description "The Empire State Building is a 102-story landmark in New York City." .
:b0 http://schema.org/geo :b1 .
:b0 http://schema.org/image http://www.civil.usherbrooke.ca/cours/gci215a/empire-state-building.jpg .
:b0 http://schema.org/name "The Empire State Building" .
:b1 http://www.w3.org/1999/02/22-rdf-syntax-ns#first "0"^^http://www.w3.org/2001/XMLSchema#integer .
:b1 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest :b2 .
:b2 http://www.w3.org/1999/02/22-rdf-syntax-ns#first "0"^^http://www.w3.org/2001/XMLSchema#integer .
:b2 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest :b3 .
:b3 http://www.w3.org/1999/02/22-rdf-syntax-ns#first "1"^^http://www.w3.org/2001/XMLSchema#integer .
:b3 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest :b4 .
:b4 http://www.w3.org/1999/02/22-rdf-syntax-ns#first "1"^^http://www.w3.org/2001/XMLSchema#integer .
:b4 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest :b5 .
:b5 http://www.w3.org/1999/02/22-rdf-syntax-ns#first "1234"^^http://www.w3.org/2001/XMLSchema#integer .
:b5 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest :b6 .
:b6 http://www.w3.org/1999/02/22-rdf-syntax-ns#first "4323"^^http://www.w3.org/2001/XMLSchema#integer .
:b6 http://www.w3.org/1999/02/22-rdf-syntax-ns#rest http://www.w3.org/1999/02/22-rdf-syntax-ns#nil .

... not pretty but at least the list is recoverable.
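
To illustrate "recoverable": the list can be rebuilt by following the rdf:first / rdf:rest chain until rdf:nil. The sketch below assumes the triples have already been parsed into plain `{ s, p, o }` objects with numeric literals converted to numbers; that representation, the `_:` blank node labels and the function name `readRdfList` are choices made for this example only.

```javascript
const RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

// The list-related triples from the output above, as { s, p, o } objects.
const triples = [
  { s: '_:b0', p: 'http://schema.org/geo', o: '_:b1' },
  { s: '_:b1', p: RDF + 'first', o: 0 },
  { s: '_:b1', p: RDF + 'rest', o: '_:b2' },
  { s: '_:b2', p: RDF + 'first', o: 0 },
  { s: '_:b2', p: RDF + 'rest', o: '_:b3' },
  { s: '_:b3', p: RDF + 'first', o: 1 },
  { s: '_:b3', p: RDF + 'rest', o: '_:b4' },
  { s: '_:b4', p: RDF + 'first', o: 1 },
  { s: '_:b4', p: RDF + 'rest', o: '_:b5' },
  { s: '_:b5', p: RDF + 'first', o: 1234 },
  { s: '_:b5', p: RDF + 'rest', o: '_:b6' },
  { s: '_:b6', p: RDF + 'first', o: 4323 },
  { s: '_:b6', p: RDF + 'rest', o: RDF + 'nil' },
];

// Follow rdf:first / rdf:rest from the head node until rdf:nil is reached.
function readRdfList(allTriples, head) {
  const find = (s, p) => {
    const hit = allTriples.find((t) => t.s === s && t.p === p);
    if (!hit) throw new Error(`Missing ${p} for ${s}`);
    return hit.o;
  };
  const items = [];
  const seen = new Set();
  let node = head;
  while (node !== RDF + 'nil') {
    if (seen.has(node)) throw new Error('Cycle in RDF list');
    seen.add(node);
    items.push(find(node, RDF + 'first'));
    node = find(node, RDF + 'rest');
  }
  return items;
}

const geo = readRdfList(triples, '_:b1');
```

Because the order lives in the rdf:rest links rather than in the order of the triples, shuffling the `triples` array does not change the result.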

lanthaler commented 9 years ago

Yep, it is. But isn’t it ambiguous what this list means without additional information? Is it xy xy xy or xyz xyz?
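
The ambiguity is easy to demonstrate: the six values in the example split cleanly both ways. A small sketch, with `chunk` as a hypothetical helper:

```javascript
// Split a flat list of numbers into positions of a given size.
function chunk(values, size) {
  if (values.length % size !== 0) {
    throw new Error(`${values.length} values cannot be split into groups of ${size}`);
  }
  const out = [];
  for (let i = 0; i < values.length; i += size) {
    out.push(values.slice(i, i + size));
  }
  return out;
}

const flat = [0, 0, 1, 1, 1234, 4323];
const asXY = chunk(flat, 2);  // three 2D positions
const asXYZ = chunk(flat, 3); // two 3D positions
```

Both calls succeed, so the data alone cannot say which reading is intended; the dimension has to be carried somewhere else, for example in the property name or the type.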

ManoMarks commented 9 years ago

On the other hand, this:

{
  "@context": "http://schema.org/",
  "name": "The Empire State Building",
  "description": "The Empire State Building is a 102-story landmark in New York City.",
  "image": "http://www.civil.usherbrooke.ca/cours/gci215a/empire-state-building.jpg",
  "geo": { "type": "LineString", "xy": ["0 1","1 2"]}
}

parses to:

:c14n0 http://schema.org/description "The Empire State Building is a 102-story landmark in New York City." .
:c14n0 http://schema.org/geo :c14n1 .
:c14n0 http://schema.org/image http://www.civil.usherbrooke.ca/cours/gci215a/empire-state-building.jpg .
:c14n0 http://schema.org/name "The Empire State Building" .
:c14n1 http://schema.org/type "LineString" .
:c14n1 http://schema.org/xy "0 1" .
:c14n1 http://schema.org/xy "1 2" .

But it does retain the ugly space notation

danbri commented 9 years ago

Also the triples are unordered, so we can't tell:

A)

:c14n1 http://schema.org/xy "0 1" .
:c14n1 http://schema.org/xy "1 2" .

from

B)

:c14n1 http://schema.org/xy "1 2" .
:c14n1 http://schema.org/xy "0 1" .
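
Put differently, A and B are the same graph. A minimal sketch of that comparison, treating a graph as a set of `{ s, p, o }` objects (a representation assumed for this example; `sameGraph` is a hypothetical helper and ignores blank node relabelling):

```javascript
// Compare two graphs as sets of triples, ignoring the order they are listed in.
function sameGraph(a, b) {
  const key = (t) => JSON.stringify([t.s, t.p, t.o]);
  const setA = new Set(a.map(key));
  const setB = new Set(b.map(key));
  return setA.size === setB.size && [...setA].every((k) => setB.has(k));
}

const graphA = [
  { s: '_:c14n1', p: 'http://schema.org/xy', o: '0 1' },
  { s: '_:c14n1', p: 'http://schema.org/xy', o: '1 2' },
];
const graphB = [graphA[1], graphA[0]];

const indistinguishable = sameGraph(graphA, graphB); // true
```

Any consumer working at the triple level therefore cannot know which point comes first in the line.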

ManoMarks commented 9 years ago

Ugh. That doesn't work at all then. Is there an alternative to this and WKT?

danbri commented 9 years ago

Going back to the earlier ...

"shape": { "type": "LineString", "xy": [0, 0, 1, 1]}

looks and feels better to me than

"shape": "LINESTRING(0 0, 1 1)"

How about

"shape": { "type": "LineString", "xy": "0 0, 1 1" }

This is a little more JSON-friendly. It preserves order. It is immediately expressible in Microdata and RDFa too.
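
For a sense of the parsing cost on the JavaScript side, here is a minimal sketch that reads and writes this "x y, x y" microsyntax. The names `parseXY` and `formatXY` are hypothetical, and the sketch assumes exactly two numbers per position.

```javascript
// Parse "0 0, 1 1" into [[0, 0], [1, 1]].
function parseXY(text) {
  return text.split(',').map((pair) => {
    const nums = pair.trim().split(/\s+/).map(Number);
    if (nums.length !== 2 || nums.some(Number.isNaN)) {
      throw new Error(`Bad position: "${pair.trim()}"`);
    }
    return nums;
  });
}

// Format [[0, 0], [1, 1]] back into "0 0, 1 1".
function formatXY(positions) {
  return positions.map(([x, y]) => `${x} ${y}`).join(', ');
}
```

Since the whole value is one string literal, the order of the points survives the trip through triples.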

Here's a schema.org -based example that sketches the extended use of http://schema.org/serviceArea for this stuff, to give an idea:

Microdata:

<div itemscope itemtype="http://schema.org/Locksmith">
  <div itemprop="serviceArea" itemscope itemtype="http://schema.org/LineString">
    <span itemprop="xy" content="0 0, 1 1"/>
  </div>
</div>

RDFa 1.1:

<div vocab="http://schema.org/" typeof="Locksmith">
  <div property="serviceArea" typeof="LineString">
    <span property="xy" content="0 0, 1 1"/>
  </div>
</div>

Since these flavours of schema.org are on 6M+ sites I lean towards an option that is bearable in both markup and JSON, even while acknowledging it's not as intuitive for Javascript people as an array based structure. For that we already have classic geojson. While there's a cost to deviating from it, there also needs to be some benefit for "going json-ld", and many of those benefits are around interop with other datasets, tools, vocabularies that work over the (sometimes awkward) triple abstraction. If we can find decent code snippets to convert, then it feels bearable. Not a trivial cost but still bearable...

akuckartz commented 9 years ago

I prefer "LINESTRING(0 0, 1 1)" to something which somehow looks like JSON but still requires an additional parser for an additional format (for 0 0, 1 1 and similar strings).

danbri commented 9 years ago

@akuckartz ... even though you can compose it from wrapping "LINESTRING(" and " )" around it?

What proportion of readers are going to see this and go "ah, that's WKT notation!"?

Another version of the question: if a single complex string is to be the property of something in the JSON-LD model, that entity will need a type. What types should be used there? Geometrically oriented, or real-world entity oriented? e.g. is it a Locksmith that has an xy property, or a LineString?

akuckartz commented 9 years ago

@danbri 0 0, 1 1 is not usable as such. It is neither WKT nor JSON, that is the problem I see.

What about WKT strings like this: GEOMETRYCOLLECTION(POINT(4 6),LINESTRING(4 6,7 10))? Can they be expressed? Should the use of such complex geo-data be avoided or enabled?

I am not sure that I understand the last questions, but

"shape": "LINESTRING(0 0, 1 1)"

can be a geometrically oriented property of a Locksmith.

danbri commented 9 years ago

Ah, yes you're right - if we want to have complex geometry then raw WKT would probably be preferable. I'm not familiar enough with how complex they get...

Re "Locksmith", the issue is whether we want to allow different named relationships between the geometry and the real world entity. The linestring could outline the shape of their office or their service zone; just using a "shape" relation leaves that unclear. The shape of a https://schema.org/Museum or http://schema.org/Zoo or http://schema.org/Beach might be the place itself, whereas the various types of http://schema.org/EducationalOrganization e.g. http://schema.org/School could want to represent their "catchment area" (alongside the shape of the school grounds, perhaps).

For another (slightly weak) example perhaps a http://schema.org/Taxi has picking up vs dropping off zones, or https://schema.org/RealEstateAgent has different rentalZone vs salesZone.

akuckartz commented 9 years ago

I agree that different named relationships between geometries and real world entities very often make sense (as illustrated by those examples).

GeoJSON seems to miss the 3D object types provided by WKT. Otherwise the complexity of WKT is the same as that of GeoJSON and GeoJSON-LD: http://en.wikipedia.org/wiki/Well-known_text#Geometric_objects

If no 3D object types are used, it is possible to transform WKT to GeoJSON(-LD) and vice versa. The question regarding complexity therefore is a general one. For example: Do we want GeometryCollections? http://geojson.org/geojson-spec.html#geometrycollection

(They can be recursive...)

danbri commented 9 years ago

I'm not aware of immediate 3d use cases at least around schema.org and JSON-LD. I'm sure it'll come eventually, but maybe premature optimization to worry too much at this stage?

akuckartz commented 9 years ago

Complete support for WKT would automatically include support for 3D. But I do not have a strong position regarding 3D here. In any case it does not add much to the complexity already involved in 2D WKT or GeoJSON.

The main question at the moment seems to be: Is the complexity of (complete) 2D WKT or GeoJSON welcome to schema.org? The answer to that question probably influences whether WKT or GeoJSON(-LD) is more suitable.

danbri commented 9 years ago

If there's 3D WKT data out there, having it accessible in a JSON-LD and schema.org -friendly form seems fine and useful. There are lots of places within schema.org where in theory someone could stuff incredibly detailed, complex data, simply by using existing vocabulary more erm wholeheartedly. For schema.org that pretty much comes with the territory since it's essentially a dictionary, where both publishers and consumers get to pick and choose the bits that are useful to them (hopefully with some overlaps). So I'm left leaning towards WKT...

msporny commented 9 years ago

Why not just dump the entire two-dimensional array in a JSON string? I know that's not pure, but the assumption here is you already have access to a JSON processor (every major programming language does).

So, do this:

"coordinates": "[[102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]]"

give the "coordinates" a type of "jsonld:JsonData", and that solves the problem, right? This is what the RDF becomes:

<> geo:coordinates "[[102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]]"^^jsonld:JsonData .

It doesn't map to pure RDF, but who cares... neither does base64 and people use base64 encodings in RDF for binary data. Recovering the coordinates is as easy as doing this:

var obj = getGeoJsonObject();
obj.coordinates = JSON.parse(obj.coordinates);

As an added bonus, we could tie it into JSON-LD's type coercion system so that it's all encapsulated in the JSON-LD Context. So, you'd do this in the JSON-LD Context:

"@context": {
  "coordinates": {
    "@id": "http://www.w3.org/2003/01/geo/wgs84_pos#coordinates",
    "@type": "jsonld:JsonData"
  }  
}

We could even consider adding this feature to all JSON processors, it'd only take a few lines of code in each processor and I'm pretty sure we could get all major implementations implementing it in a few weeks' time. That would mean that in JSON-LD compact form, GeoJSON looks exactly like you'd expect it to:

"coordinates": [[102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]]

and you can round-trip it to/from RDF losslessly.
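
A rough sketch of what such processor-side handling could look like is below. It is an illustration only, not how any JSON-LD processor behaves: "jsonld:JsonData" is the datatype name proposed in this comment, the helper names `jsonDataTerms`, `packJsonData` and `unpackJsonData` are hypothetical, and the sketch handles only top-level terms whose native values are not themselves strings.

```javascript
// Datatype name proposed in the thread; treated here as a plain marker string.
const JSON_DATA = 'jsonld:JsonData';

// Terms whose context definition coerces them to the proposed datatype.
function jsonDataTerms(context) {
  return Object.keys(context).filter(
    (term) => context[term] && context[term]['@type'] === JSON_DATA
  );
}

// Going towards RDF: replace the native value with its JSON string.
function packJsonData(node, context) {
  const out = { ...node };
  for (const term of jsonDataTerms(context)) {
    if (term in out && typeof out[term] !== 'string') {
      out[term] = JSON.stringify(out[term]);
    }
  }
  return out;
}

// Coming back from RDF: parse the string into the native value again.
function unpackJsonData(node, context) {
  const out = { ...node };
  for (const term of jsonDataTerms(context)) {
    if (typeof out[term] === 'string') {
      out[term] = JSON.parse(out[term]);
    }
  }
  return out;
}

const context = {
  coordinates: {
    '@id': 'http://www.w3.org/2003/01/geo/wgs84_pos#coordinates',
    '@type': JSON_DATA,
  },
};
const geometry = {
  type: 'LineString',
  coordinates: [[102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]],
};
const packed = packJsonData(geometry, context);
const restored = unpackJsonData(packed, context);
```

Here `packed.coordinates` is a single string that can be carried as one literal, and `restored` has the same nested arrays as `geometry`.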

Thoughts @dlongley @lanthaler @gkellogg @danbri @jasnell?

One concern is digitally signing information like this, as you'd need a canonical form for JSON, but maybe that's just "remove all whitespace, preserve the order of properties in the string data".
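
In JavaScript, a first approximation of that rule is a parse followed by a re-serialisation. This is a sketch under assumptions, not a real canonicalisation scheme: `canonicalJsonString` is a hypothetical helper, property order is only preserved for ordinary (non-integer-like) keys, and numbers are re-serialised, so a signer and a verifier would have to agree on that too.

```javascript
// Re-serialise a JSON string without insignificant whitespace.
function canonicalJsonString(text) {
  return JSON.stringify(JSON.parse(text));
}

const coords = canonicalJsonString('[[102.0, 0.0],\n  [103.0, 1.0]]');
// Whitespace is gone, and 102.0 has been rewritten as 102.

const props = canonicalJsonString('{ "b": 1,  "a": 2 }');
// The properties stay in their original order: b, then a.
```

The number rewriting shows why "remove all whitespace" alone may not be enough: two strings that differ only in how a number is written end up with the same canonical form.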

jasnell commented 9 years ago

Using an alternative such as GeoSparql's Well-Known Text Literal is, by far, a better choice.

sgillies commented 9 years ago

@msporny I like that approach a lot. It addresses the root problem – lack of support for lists in JSON-LD, not lack of support for GIS data in JSON-LD – in a general way and looks like an attractive thing to implement.

msporny commented 9 years ago

@jasnell I thought the problem w/ "POLYGON((-77.089005 38.913574, -77.029953 38.913574, -77.029953 38.886321))"^^geo:wktLiteral was that it wasn't idiomatic JSON? You can't just push that through a JSON processor and hope for something good to come out of it. Or do I not understand what "GeoSPARQL's Well-Known Text Literal" is? Why is GeoSPARQL's text literal a better choice? I thought @dret 's whole issue with JSON-LD was that you can't continue to easily and losslessly convert idiomatic JSON data structures, that don't really map well to RDF, without pre-processing the input data.

@sgillies Keep in mind that JSON-LD (really RDF) has support for lists. It just doesn't have support for list-of-lists because there is no good way (that's simple to implement) to roundtrip that through RDF at the moment. The root problem that this solves is the preservation of JSON data structures that don't map well to RDF through the RDF round-tripping process (which is kinda what you said, but I'm being pedantic to make sure everyone is on the same page wrt. the problem and solution).

dr-shorthair commented 9 years ago

Yes - recognizing the boundary between what JSON (or RDF) can handle natively, and types such as time and geometry, is important. Time is a universally recognized structured type, with 7 elements encoded using a standard syntax following ISO 8601. Geometry is a similar issue. That's exactly why GeoSPARQL gives up on RDF as the encoding when it gets to geometry, and goes with WKT (or GML). If coordinate lists are too much of a stretch to do directly in JSON, then treat it as a datatype and use an alternative encoding (preferably a pre-existing one for which there are already parsers).

On 24 November 2014 at 07:51, Sean Gillies notifications@github.com wrote:

@msporny https://github.com/msporny I like that approach a lot. It addresses the root problem – lack of support for lists in JSON-LD, not lack of support for GIS data in JSON-LD – in a general way and looks like an attractive thing to implement.


gkellogg commented 9 years ago

@sgillies, not sure why you say json-ld doesn't support lists, it certainly does using @list and compacted forms.

I do think using a literal as a string encoding of json types makes sense in this case.

jasnell commented 9 years ago

String encoding a JSON array doesn't really seem to make a lot of sense where there are existing tools that already support the wktLiteral option. You just end up duplicating effort unnecessarily. The data doesn't have to be idiomatic JSON. We already deal with other data types that are not idiomatic JSON (URIs, media types, dateTime, etc.).

lanthaler commented 9 years ago

+1

dret commented 9 years ago

hello simon.

On 2014-11-23, 22:39, Simon Cox wrote:

If coordinate lists are too much of a stretch to do directly in JSON, then treat it as a datatype and use an alternative encoding (preferably a pre-existing one for which there are already parsers).

i guess this is where we really should be precise. coordinate lists really are no problem at all for JSON, which is why GeoJSON works just fine. and because JSON can handle these lists just fine, we have a nicely extensible model which allows GeoJSON to also carry more than just the 2 or 3 values that go into the lists (https://github.com/geojson/draft-geojson/issues/56).

it's just RDF that's bad at lists, and before we jump to the conclusion that we want to change a JSON format because of an RDF weakness, we should look at all the uses of the JSON format (including those that take advantage of the extensibility), and see how they would be affected by making the JSON more RDF-friendly.

dlongley commented 9 years ago

My only problem with the idea suggested here by @msporny is that it doesn't appear to solve another related problem w/JSON-LD: deep-linking. The deep-linking problem is one where there exists Linked Data in a JSON-LD document, but you have to traverse non-Linked Data JSON to get to where it exists in the document. I mention this because I think that a solution to the deep-linking problem may also solve the same problem here -- so it would be best to have a single unified solution for both use cases.

A solution for the deep-linking problem would likely put a JSON-LD processor into a state that accepts non-Linked Data while it looks for Linked Data in deeper sections of a JSON tree and then processes that data. The use case here would just fall under a special situation where no Linked Data is found. The end result would be to preserve all the non-Linked Data JSON -- which is what is desired here.
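
One way to picture that processor state is a walk over arbitrary JSON that collects the subtrees looking like Linked Data. The sketch below is purely illustrative: `findLinkedData` is a hypothetical helper, and the rule "an object with an @context key counts as Linked Data" is an assumption made for the example, not something proposed in this thread.

```javascript
// Walk arbitrary JSON and collect subtrees that look like JSON-LD.
// Assumption: a subtree counts as Linked Data when it is an object with "@context".
function findLinkedData(node, path = []) {
  if (Array.isArray(node)) {
    return node.flatMap((item, i) => findLinkedData(item, [...path, i]));
  }
  if (node !== null && typeof node === 'object') {
    if ('@context' in node) return [{ path, node }];
    return Object.entries(node).flatMap(
      ([key, value]) => findLinkedData(value, [...path, key])
    );
  }
  return [];
}

const doc = {
  meta: { items: [{ '@context': 'http://schema.org/', name: 'Example' }] },
  coordinates: [[102.0, 0.0], [103.0, 1.0]],
};
const found = findLinkedData(doc);
const noneFound = findLinkedData({ coordinates: [[102.0, 0.0]] });
```

`found` holds one entry at the path meta, items, 0, while `noneFound` is empty: the special situation described above, where all the JSON is simply preserved.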

sgillies commented 9 years ago

@gkellogg @msporny I misspoke and appreciate the correction.

dr-shorthair commented 9 years ago

Right. Lists are a real dog in RDF, not a problem in JSON. Not that JSON is perfect - see NaN, +INF, -INF
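
The NaN and infinity gap is easy to see with the standard JSON functions in JavaScript:

```javascript
// Non-finite numbers have no JSON representation and are written as null.
const encoded = JSON.stringify([NaN, Infinity, -Infinity]);

// A bare NaN token is not valid JSON, so parsing it fails.
let parseError = null;
try {
  JSON.parse('[NaN]');
} catch (err) {
  parseError = err;
}
```

After this runs, `encoded` is the string `[null,null,null]` and `parseError` is a SyntaxError, so such values are lost silently on the way out and rejected on the way in.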

sgillies commented 9 years ago

Closing. Future action at #31.