geopython / pygeoapi

pygeoapi is a Python server implementation of the OGC API suite of standards. The project emerged as part of the next generation OGC API efforts in 2018 and provides the capability for organizations to deploy a RESTful OGC API endpoint using OpenAPI, GeoJSON, and HTML. pygeoapi is open source and released under an MIT license.
https://pygeoapi.io

Support custom JSON-LD context in generic JSON output #1678

Open rob-metalinkage opened 2 weeks ago

rob-metalinkage commented 2 weeks ago

Is your feature request related to a problem? Please describe.

Documenting the meaning of feature attributes decreases ambiguity and increases interoperability. JSON-LD provides a means to do this with a reference to a context document, or an embedded context. The current JSON-LD option supports a different model - representing geometry elements in a (set of) particular structures, but does not address semantics of feature properties.

Describe the solution you'd like

Allow definition of feature properties via published JSON-LD contexts - e.g. based on the featureType (for FG-JSON).

This would be implementable in incremental steps:

  1. allow specification of a URL to inject into an @context element in the JSON
  2. allow bundling of a context document to be served as a resource and referenced by URL as per case 1
  3. import related documentation from an OGC BuildingBlock to inject into the API description
  4. exploit the JSON-LD context to provide end-users with descriptions of feature properties from linked ontologies

Step 1 provides the key enabler and is a non-breaking change to JSON outputs. The other steps address infrastructure design - e.g. the concept of Feature Type Catalogues - and could be designed separately. Code sprints for these would test their potential.
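A minimal sketch of what step 1 could look like, assuming the feature is available as a plain Python dictionary before serialization. The function name `inject_context` and the context URL are hypothetical and are not part of pygeoapi; the point is only that adding an `@context` member leaves every existing member of the JSON untouched.

```python
import json
from copy import deepcopy


def inject_context(feature: dict, context_url: str) -> dict:
    """Return a copy of `feature` with `context_url` referenced under @context.

    Hypothetical helper for illustration; not a pygeoapi function.
    """
    existing = feature.get("@context")
    if existing is None:
        merged = context_url
    elif isinstance(existing, list):
        # Append only if the URL is not already referenced.
        merged = existing if context_url in existing else existing + [context_url]
    elif existing == context_url:
        merged = existing
    else:
        merged = [existing, context_url]

    # Put @context first for readability, then copy all other members unchanged.
    result = {"@context": deepcopy(merged)}
    for key, value in feature.items():
        if key != "@context":
            result[key] = deepcopy(value)
    return result


feature = {
    "type": "Feature",
    "id": "1000085",
    "geometry": {"type": "Point", "coordinates": [-105.0, 40.0]},
    "properties": {"name": "Example gage"},
}

# Hypothetical context URL, for illustration only.
CONTEXT_URL = "https://example.org/contexts/gage.jsonld"

enriched = inject_context(feature, CONTEXT_URL)
print(json.dumps(enriched, indent=2))
```

A client that ignores JSON-LD still sees the same `type`, `id`, `geometry` and `properties` members, which is why this step can be treated as non-breaking.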

Describe alternatives you've considered

The existing JSON-LD wrapper - but this changes the JSON, which may cause problems, and is both a less general and a less expressive solution. Injecting context for the feature properties is a complementary solution, but one that can be safely applied to both the normal JSON and the extended JSON-LD feature model.

Additional context

The OGC BuildingBlocks model provides for feature schema definitions with JSON-LD contexts ready to use (and independently testable). JSON-LD contexts for OGC API data schemas are provided, along with ontologies (definitions) of structural elements, so that only feature-specific properties need to be defined.

An example of how this can be leveraged using a domain model, by carefully bundling the Features schema with the relevant properties model, is here: https://ogcincubator.github.io/bblocks-examples/bblock/ogc.bbr.examples.feature.externalSchema

ksonda commented 2 weeks ago

Worth checking if this capability was implemented in https://github.com/geopython/pygeoapi/pull/868

For example, https://reference.geoconnex.us/collections/gages/items/1000085?f=jsonld implements a complex RDF representation of feature properties through a combination of custom context and custom JSON via jinja2 templating.

jmckenna commented 2 weeks ago

For example, https://reference.geoconnex.us/collections/gages/items/1000085?f=jsonld

Thanks for sharing. From my quick review of the JSON-LD: I am surprised to see that "property":"value" works in the JSON-LD, as the proper form is "property": "value"

ksonda commented 2 weeks ago

Probably just an oversight in that Jinja template, although yeah, it looks like the JSON-LD Playground thinks it's fine, and we've tested and have no problems getting the correct triples into triple stores.

rob-metalinkage commented 2 weeks ago

I did check out the Jinja templating - it looks like a great solution for deployments that will use JSON-LD by preference and need complex objects. Use of a custom LD context reference that is Feature schema aware is a simpler solution that is not incompatible, but addresses the comment at https://github.com/geopython/pygeoapi/pull/868#discussion_r853538051 re "naive" JSON semantic enablement.

This would allow simple features with scalar properties, including references and "well known" embedded complex elements to be mapped to RDF without any custom templating required. It also means that interoperable solutions can be shared for examples such as the "SensorThings" example in the pyGeoAPI documentation - without relying on multiple implementers to arrive at identical solutions for each schema.

I.e. this is a proposal to push the hard work of working out the JSON structure and a compatible JSON-LD context design to a featureType library with unit testing and sharing capabilities, rather than leaving it to each pygeoapi installer. This gets complex when, for example, multiple sub-schemas have properties with the same name and different meaning - or implicit URI bases.

The limitations of pure JSON-LD are of course the geometry, @type and the @id issues - but the OGC Building Block for FG-JSON [1] supports featureType and id fields to be mapped to these using a context that can be combined with a simple featureType specific set of properties context to generate full RDF from "naive" JSON.

It's possible the geometry can be handled with a JSON literal in GeoSPARQL now. In any event, it would allow native GeoJSON objects to be better documented by at least making all the feature properties understandable. (Note schema.org is only going to handle a small subset of properties of real data that isn't a business on a map.)

id mapping needs a base URI context, which would need to be combined with the featureType context - and JSON-LD doesn't support numerical values for @id - so a reusable Jinja template that takes a configurable link to a remote custom context, and adds explicit id and type, would be useful. I will try to experiment with this and then propose an auto-configuration.

To extend the idea - composing a jinja template from sub-schemas could be automated using a bblocks approach, improving reusability of patterns and simplifying configuration - i.e. no Jinja editing would be necessary.

[1] https://opengeospatial.github.io/bblocks/register/bblock/ogc.geo.json-fg.feature

PS @ksonda can you identify or share a test case, schema and pygeoapi config for the geoconnex example?

ksonda commented 2 weeks ago

jinja template: https://github.com/internetofwater/reference.geoconnex.us/blob/main/pygeoapi-skin-dashboard/templates/jsonld/ref-gages.jsonld

pygeoapi config: https://github.com/internetofwater/reference.geoconnex.us/blob/f86a6af729f7fae7debdd447f8f8704c3cfc9384/pygeoapi.config.yml#L362C1-L386C31

source data (distributed as gpkg, pygeoapi is running off the same table in postgis): https://www.hydroshare.org/resource/3295a17b4cc24d34bd6a5c5aaf753c50/data/contents/ref_gages.gpkg

ksonda commented 2 weeks ago

Also, regarding the #868 comment: the URI field is specified in the pygeoapi config file with uri_field:, so the user can choose any property as the identifier. If uri_field: is not set, it defaults to an HTTP URI built as server URL + collections/ + collection id + /items/ + item id. This behavior holds whether a Jinja template is specified or the default JSON-LD-with-GeoSPARQL geometry + context specified in the pygeoapi config YAML is chosen.
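The fallback described here can be sketched roughly as follows. This is not pygeoapi's actual code: the function name and signature are hypothetical, and treating an empty or missing `uri_field` value as "fall back to the default" is an assumption made for the sketch. The path layout follows the item URL quoted earlier in the thread.

```python
def resolve_feature_uri(feature, collection_id, server_url, uri_field=None):
    """Pick the identifier URI for a feature.

    Hypothetical helper for illustration; not pygeoapi's implementation.
    """
    properties = feature.get("properties") or {}

    # If a uri_field is configured and the feature has a value for it, use that.
    # (Assumption: a missing or empty value falls through to the default.)
    if uri_field and properties.get(uri_field):
        return str(properties[uri_field])

    # Default: server URL + collections/<collection id>/items/<item id>.
    base = server_url.rstrip("/")
    return f"{base}/collections/{collection_id}/items/{feature['id']}"


gage = {
    "type": "Feature",
    "id": 1000085,
    "properties": {"uri": "https://geoconnex.us/ref/gages/1000085"},
}

print(resolve_feature_uri(gage, "gages", "https://reference.geoconnex.us", uri_field="uri"))
print(resolve_feature_uri(gage, "gages", "https://reference.geoconnex.us"))
```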

rob-metalinkage commented 2 weeks ago

Thanks @ksonda.

It's a good example of using an external model (HY_Features) to support improved interoperability.

This is also a great reference to explain the point of this issue: how can we make this easier to implement? The current JSON-LD is a very flexible and powerful option for RDF consumers. There is however an issue with interoperability with JSON schema consumers that can be partially addressed with a JSON-LD context for a standard OGC API (-X) schema (defining container patterns such as Feature, Record etc) mapping to a domain model.

The current templates and configuration are potentially doing multiple things (everywhere, all at once):

  1. translating data source schema to a target schema
  2. setting the target schema
  3. identifying objects in URI space
  4. defining access URLs for objects
  5. providing semantics for the target container
  6. providing semantics for the target content

Note that 3 and 4 may be conflated.

The target schema is complex, and has a lot of standardised Jinja patterns for optional properties. This shows a great deal of technical skill that I fear will not be readily replicable.

IMHO we can potentially define a target schema and a JSON-LD context by reference, and potentially build such a template with the sole job of mapping the source schema to the target schema.

An OGC BuildingBlock could define the schema and the JSON-LD context. In the short term, we could manually build the target schema as per the current option, but pull in the standardised context by reference. This removes one level of configuration overhead, but more importantly provides for a common implementation across multiple servers - without relying on complex templates producing identical output (with no testing available).

I don't think it would be hard to auto-generate a Jinja template fragment from a target schema - the user could then do the mapping to the source schema. Such a fragment could be generated and distributed as part of an OGC BuildingBlock - this is a good task for a code sprint perhaps?

As a further step, the mapping between source and target could be defined in configuration - as you do with the uri_field - and the whole Jinja template generated automatically.
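To make the idea of a configuration-defined mapping concrete, here is a rough sketch in which a plain dictionary stands in for the configuration. All property and column names are hypothetical, and skipping null values is an assumption that mirrors the optional-property patterns mentioned above; no such option exists in pygeoapi today.

```python
# Hypothetical mapping: target property name -> source column name.
PROPERTY_MAPPING = {
    "name": "gage_name",
    "provider": "prov_name",
    "subjectOf": "site_url",
}


def map_properties(source_props, mapping):
    """Rename source properties to target names, skipping missing or null values."""
    return {
        target: source_props[source]
        for target, source in mapping.items()
        if source_props.get(source) is not None
    }


# A source row as it might come from the data provider (made-up values).
row = {
    "gage_name": "Example gage",
    "prov_name": None,
    "site_url": "https://example.org/site/1",
    "internal_flag": 7,
}

mapped = map_properties(row, PROPERTY_MAPPING)
print(mapped)
```

Columns that are not listed in the mapping, such as `internal_flag` here, simply do not appear in the target properties.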

I'll present this option to the Hydro Domain Working Group next week - along with a draft JSON schema derived directly from the HY_Features UML.

ksonda commented 2 weeks ago

Happy to be the subject of a HydroDWG code sprint. We've toyed with similar ideas before but in the end went with templating as something easier to implement quickly for an MVP. Regarding context by reference, I've been burned by, and am thus a bit paranoid about, the resulting system being brittle to link rot or other persistence issues with things like remote contexts and schema definitions.

jmckenna commented 2 weeks ago

@rob-metalinkage this is the first I have heard of limitations of @type and @id in JSON-LD for spatial - are you referring to "@type": "Dataset" with spatialCoverage? ('challenges' to relay to users are often the odd coordinate pair order, YX). Or do you mean passing complex features through a GeoShape inside spatialCoverage ?

PS. as an OGC member, I wish we could have a JSON-LD working group to tackle this all together. (maybe @pbuttigieg could join too). Maybe you have more power than me to create this working group ;)

rob-metalinkage commented 2 weeks ago

@jmckenna I do think we need to coordinate better at OGC - I'm presenting on future proofing patterns at the ArchitectureDWG on Monday - just trying to raise awareness around the many overlapping issues.

The @type issue is that it must be explicit for JSON-LD to be able to declare object types - you can't look at the schema and see how property ranges are defined. This is just a mismatch that makes it harder to handle naive JSON. Then we see multiple layers of typing - e.g. FG-JSON must have "type": "Feature" to match GeoJSON, so it introduces a sub-type: "featureType".

I've been working with the PROV model - and we need to know both the super-type (Activity, Entity etc.) and also the specific domain type ("Dataset", "Configuration", "logfile", whatever).

Likewise @id must be a URI mapped string, but in other systems ids can be numeric, for example. So heavyweight wrappers may be required to support JSON-LD, but we still want people to be able to understand JSON payloads, and can't expect JSON-LD support.
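As a rough illustration of the kind of wrapper being discussed, the sketch below adds an explicit `@id` and `@type` to a "naive" feature whose id is numeric. The function name and the base URI are hypothetical; reading the type from a `featureType` member follows the FG-JSON convention mentioned above.

```python
from urllib.parse import quote


def add_explicit_id_and_type(feature, base_uri, type_field="featureType"):
    """Return a shallow copy of `feature` with @id and @type made explicit.

    Hypothetical helper for illustration only.
    """
    node = dict(feature)

    # @id must be a string, so a numeric id is converted and appended to a base URI.
    local_id = quote(str(feature["id"]), safe="")
    node["@id"] = base_uri.rstrip("/") + "/" + local_id

    # "type" stays "Feature" for GeoJSON compatibility; the domain type comes
    # from the separate featureType member when it is present.
    if type_field in feature:
        node["@type"] = feature[type_field]
    return node


naive = {
    "type": "Feature",
    "featureType": "Gage",
    "id": 1000085,
    "properties": {"name": "Example gage"},
}

# Hypothetical base URI, for illustration only.
node = add_explicit_id_and_type(naive, "https://example.org/gages/")
print(node["@id"], node["@type"])
```

The original `id` and `type` members are left in place, so consumers without JSON-LD support still see the payload they expect.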

@ksonda I hear you re remote references - configurations could be explicit while still implementing engineering solutions such as serving a local copy. We need to understand how best to determine this - but if we have a "featureType" then we probably never need to compare context references, and a local cached copy would always be workable. We have links too - maybe an implementation profile that makes the canonical link available but uses local cached versions, or an option to force embedded contexts.
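One way to picture the "canonical link plus local cached copy" option is the sketch below. The cache is an in-memory dictionary standing in for context documents stored alongside the server, and the URL, function name and `embed` switch are all hypothetical.

```python
# Stand-in for locally cached context documents, keyed by canonical URL.
# The URL and the context content are made up for illustration.
LOCAL_CONTEXTS = {
    "https://example.org/contexts/gage.jsonld": {
        "@context": {"name": "https://schema.org/name"}
    }
}


def context_for_output(canonical_url, embed=False, cache=LOCAL_CONTEXTS):
    """Return the value to place under @context in a response.

    embed=False: reference the canonical URL (clients dereference it themselves).
    embed=True:  inline the locally cached copy, so the payload does not depend
                 on the remote document staying available.
    """
    if not embed:
        return canonical_url
    document = cache.get(canonical_url)
    if document is None:
        raise KeyError(f"no local copy of context {canonical_url}")
    return document["@context"]


url = "https://example.org/contexts/gage.jsonld"
by_reference = {"@context": context_for_output(url), "name": "Example gage"}
embedded = {"@context": context_for_output(url, embed=True), "name": "Example gage"}
print(by_reference)
print(embedded)
```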

ksonda commented 2 weeks ago

Another related application could be what we're trying to do for geoconnex with linking datasets to these HY_features. Basically combining HY_features work with science-on-schema.org and/or sosa/ssn

https://internetofwater.github.io/geoconnex-guidance/#sec-complete-examples

rob-metalinkage commented 2 weeks ago

A canonical schema for SOSA is here: https://github.com/opengeospatial/ogcapi-sosa

I have just created a ShapeChange configuration to generate a JSON schema from the HY_Features model too. I am presenting this at the Montreal OGC meeting (next week) - but will check it into a repo and share. It's a straw man that needs testing.

Also a discussion re WaterML/TimeseriesML has just kicked off.

It would be great to have a conversation about how the pieces could all fit together best - where should this take place? (Not a pygeoapi problem, though it informs the requirements.)

rob-metalinkage commented 2 weeks ago

PS: also note the overlap with dataset descriptions - i.e. how are variables and procedures described (not defined in SOSA) - so I am also looking at GeoDCAT (in development) and STAC, as profiles of OGC API Records - another example of being driven by mapping existing schemas to ontologies:

https://ogcincubator.github.io/geodcat-ogcapi-records/

This is part of the mix, and a rationale for pygeoapi to pull in specifications and configure Features and Records (and possibly other API flavours) with the common building blocks.

webb-ben commented 2 weeks ago

For example, https://reference.geoconnex.us/collections/gages/items/1000085?f=jsonld

Thanks for sharing. From my quick review of the JSON-LD: I am surprised to see that "property":"value" works in the JSON-LD, as the proper form is "property": "value"

All JSON returned by pygeoapi is a serialized dictionary. Note that https://geoconnex.us/ref/gages/1000085?f=json behaves the same way.
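For reference, the whitespace after the colon is not significant in JSON, so both spellings parse to the same dictionary. The short check below uses only Python's standard `json` module; it makes no claim about which separators pygeoapi itself passes when serializing.

```python
import json

compact = '{"property":"value"}'
spaced = '{"property": "value"}'

# Both strings are valid JSON and decode to the same dictionary.
same = json.loads(compact) == json.loads(spaced)
print(same)

data = {"property": "value"}

# The default separators put a space after the colon...
default_text = json.dumps(data)
# ...while explicit compact separators omit it.
compact_text = json.dumps(data, separators=(",", ":"))

print(default_text)
print(compact_text)
```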