MobilityData / gbfs

Documentation for the General Bikeshare Feed Specification, a standardized data feed for shared mobility system availability. Maintained by MobilityData
https://gbfs.org

Using GBFS within a Linked Data/RDF publishing strategy #394

Closed pietercolpaert closed 7 months ago

pietercolpaert commented 2 years ago

Who am I

I’m a professor at Ghent University in Belgium, researching how to publish knowledge at web scale. Previous work of my research team includes Linked Connections, a lightweight interface for public transit route planning; helping the European Railway Agency publish a dataset on railway infrastructure; and helping the Flemish government publish its base registries, such as its address database. Today we’re working on the Flemish Sensor Data Space, in which we have a use case on bike sharing.

Motivating user stories

  1. As a data publisher, I want to use the GBFS terminology to annotate my website about my bike sharing initiative (e.g., with RDFa or together with schema.org)
  2. As the Flemish government, I want to align our vocabularies with GBFS and link towards the terms in the authoritative specification
  3. As a data consumer working on smart city infrastructure, I want to import GBFS data in my city’s NGSI-LD context broker

Solution

Convert the terms you define in the JSON Schema into an RDFS vocabulary. This can be done with a one-to-one mapping (I’m willing to open a pull request for this if desired).

What should be the base URL on which all terms will be dereferenceable?

I’d propose https://w3id.org/gbfs#. This way, the term num_docks_available, for example, would get the URI https://w3id.org/gbfs#num_docks_available. We can open a pull request at https://github.com/perma-id/w3id.org to add a redirect from w3id.org/gbfs to, for example, a GitHub Pages site on this repository that serves the RDF file. Machines will then be able to look up the authoritative definitions.
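As a rough illustration of the one-to-one mapping, the sketch below turns the property names of a JSON Schema fragment into RDFS terms under the proposed base URI. The schema fragment and its descriptions, the `to_turtle` helper, and the choice of `rdf:Property` with `rdfs:label` and `rdfs:comment` are assumptions made for this example, not part of GBFS or of any existing tooling.

```python
import json

BASE = "https://w3id.org/gbfs#"  # base URI proposed above

# Hypothetical, heavily trimmed JSON Schema fragment (not the real GBFS schema).
SCHEMA = json.loads("""
{
  "properties": {
    "num_docks_available": {
      "type": "integer",
      "description": "Number of docks currently available at the station."
    },
    "is_renting": {
      "type": "boolean",
      "description": "Whether the station is currently renting vehicles."
    }
  }
}
""")


def to_turtle(schema, base=BASE):
    """Emit one rdf:Property per JSON Schema property (naive 1:1 mapping)."""
    lines = [
        f"@prefix gbfs: <{base}> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for name, spec in schema.get("properties", {}).items():
        # json.dumps yields a double-quoted, escaped string usable as a Turtle literal.
        label = json.dumps(name)
        comment = json.dumps(spec.get("description", ""))
        lines.append(f"gbfs:{name} a rdf:Property ;")
        lines.append(f"    rdfs:label {label} ;")
        lines.append(f"    rdfs:comment {comment} .")
    return "\n".join(lines)


turtle = to_turtle(SCHEMA)
print(turtle)
```

The printed Turtle is the kind of file that could sit behind the w3id.org redirect, so that looking up a term URI leads to its definition.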

Is your potential solution a breaking change?

pietercolpaert commented 2 years ago

It is probably a good idea to wait until this breaking change has passed: https://github.com/NABSA/gbfs/pull/354

isabelle-dr commented 2 years ago

Hello @pietercolpaert, I'm a Product Manager at MobilityData, working on our tools and initiatives to increase data quality. 👋 Thanks for opening up this discussion.

I have very limited experience with linked data, RDF, and context information. I think this is a great opportunity: there are discussions in GTFS around versioning and URL schemes that mention linked data.

I have a few questions to get a better understanding of what this proposal would imply:

As a data publisher, I want to use the GBFS terminology to annotate my website about my bike-sharing initiative (e.g., with RDFa or together with schema.org)

  • Why? In order to increase the discoverability? What are the motivations? Do you have an example of this in another area?

pietercolpaert commented 2 years ago

Hi @isabelle-dr thanks for getting back to me: much appreciated!

  • Why would you annotate a web page with GBFS semantics? In order to increase the discoverability? What are the motivations? Do you have an example of this in another area?

Discoverability and interoperability are certainly big motivations:

These two examples give an idea of the motivation behind Linked Data, which I like to summarize as drastically lowering the cost of integrating a dataset in a different domain.

  • If we were to build an RDF schema vocabulary, could it replace the JSON Schema, or would they be complementary?

I was (and still am) proposing a complementary approach in which we try to generate an RDFS vocabulary and a SHACL schema from the JSON Schema files. However, we already know from experimenting with it together with @andreipopi that additional configuration will be necessary, as there is no full one-to-one mapping between them.
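To give an idea of where the mapping is straightforward and where it is not, here is a minimal sketch that derives SHACL property shapes from a JSON Schema fragment. The fragment, the `XSD_TYPES` table, the `property_shapes` helper, and the `gbfs:StationStatusShape` name are all hypothetical; the sketch simply skips anything it cannot map, which is where the extra configuration would come in.

```python
# Assumed mapping from JSON Schema primitive types to XSD datatypes.
XSD_TYPES = {
    "integer": "xsd:integer",
    "number": "xsd:decimal",
    "boolean": "xsd:boolean",
    "string": "xsd:string",
}

# Hypothetical schema fragment, not the real GBFS schema.
SCHEMA = {
    "required": ["is_renting"],
    "properties": {
        "num_docks_available": {"type": "integer"},
        "is_renting": {"type": "boolean"},
        "vehicle_types_available": {"type": "array"},
    },
}


def property_shapes(schema):
    """Return one SHACL property shape (as a Turtle snippet) per mappable property."""
    required = set(schema.get("required", []))
    shapes = []
    for name, spec in schema.get("properties", {}).items():
        json_type = spec.get("type")
        datatype = XSD_TYPES.get(json_type) if isinstance(json_type, str) else None
        if datatype is None:
            # Arrays, objects, enums, oneOf, ... have no direct equivalent in
            # this naive mapping and would need additional configuration.
            continue
        parts = [f"sh:path gbfs:{name}", f"sh:datatype {datatype}", "sh:maxCount 1"]
        if name in required:
            parts.append("sh:minCount 1")
        shapes.append("[ " + " ; ".join(parts) + " ]")
    return shapes


shapes = property_shapes(SCHEMA)
print("gbfs:StationStatusShape a sh:NodeShape ;")
print("    sh:property " + " ,\n        ".join(shapes) + " .")
```

Note that the array-typed property is dropped silently here; a real processor would have to decide how to represent it.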

For completeness (this is not what I propose, as it would require changing your entire current process and would broaden the scope of the GBFS schemas): the other way around would be possible in a more automated way. @ioggstream is working on RDF to JSON Schema: https://twitter.com/ioggstream/status/1473708713525534722

  • Did you consider JSON-LD?

JSON-LD is one of the formats in which Linked Data can be serialized. What I propose above would be a prerequisite for using JSON-LD.
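To make the dependency concrete: once the terms have URIs, a JSON-LD `@context` can map the existing JSON keys onto them. The sketch below assumes the base URI proposed earlier and a made-up, flattened station record; `expand_keys` is a hypothetical helper that only mimics, in a very simplified way, what a JSON-LD processor does with `@vocab`.

```python
import json

BASE = "https://w3id.org/gbfs#"  # base URI proposed earlier in this thread

# Made-up station record; the only JSON-LD addition is the @context key.
doc = {
    "@context": {"@vocab": BASE},
    "station_id": "station-1",
    "num_docks_available": 5,
}


def expand_keys(document):
    """Simplified stand-in for JSON-LD expansion: prefix plain keys with @vocab."""
    vocab = document["@context"]["@vocab"]
    return {vocab + key: value
            for key, value in document.items()
            if not key.startswith("@")}


expanded = expand_keys(doc)
print(json.dumps(expanded, indent=2))
```

A full JSON-LD processor handles nesting, typed values, and keywords as well; the point here is only that the payload itself stays unchanged apart from the added `@context`.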

  • Do you foresee any disadvantages or risks? e.g. higher complexity for consumers, or higher barrier to entry for producers

The disadvantage is that you’re going to have to do a little bit more work. We’re going to document the extra configuration file needed to describe how the JSON Schema can be translated into RDFS and SHACL. Things I am already thinking about:

Per JSON schema we’ll need:

  • What could be other advantages of using linked data?
  • How exactly could it help with the machine readability of GBFS?

In addition to the JSON Schema tooling, RDF tooling will also be able to look up definitions and validate a file in any RDF serialization against the SHACL shape. I don’t see this as the biggest advantage, though.

  • What would be the impact on versioning and on discoverability for different versions (currently covered by gbfs_versions.json)?

We can also include the major version number of GBFS in the web address of the term. Otherwise I don’t expect any impact.
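One possible way to do this is sketched below. The `/v3` path segment and the `term_uri` helper are purely illustrative assumptions; the actual URL layout would still need to be agreed on.

```python
from typing import Optional


def term_uri(term: str, major_version: Optional[int] = None) -> str:
    """Build a term URI, optionally scoped to a GBFS major version."""
    base = "https://w3id.org/gbfs"
    if major_version is not None:
        base += f"/v{major_version}"  # hypothetical path layout
    return f"{base}#{term}"


print(term_uri("num_docks_available"))
print(term_uri("num_docks_available", 3))
```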

stale[bot] commented 2 years ago

This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 60 days if no further activity occurs. Thank you for your contributions.

pietercolpaert commented 2 years ago

We are still working on a PR as a side-project. Not stale, give us a bit more time :)

stale[bot] commented 2 years ago

This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 60 days if no further activity occurs. Thank you for your contributions.

pietercolpaert commented 1 year ago

Still working on it. We are:

  1. Creating a spec for adding tags to JSON schemas that allow a processor to translate the JSON schema to RDFS and SHACL
  2. Prototyping the actual processor
  3. Creating a GitHub Action that we could submit here as a pull request to automatically generate the Linked Data specs inside this repository, as a starting point for further discussion

We will share the link to the spec, the processor codebase, and the GitHub Action applied to the GBFS JSON schemas after validating it internally.

mobilitydataio commented 8 months ago

This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 30 days if no further activity occurs. Thank you for your contributions.

pietercolpaert commented 8 months ago

You can find the code of our experiments here: https://github.com/jiaoxlong/json-schema-ld/tree/main

We found that the generated RDF vocabulary and SHACL shape are not yet good enough. The idea, however, remains interesting to pursue.

richfab commented 8 months ago

Hi @pietercolpaert, I am a Product Manager for shared mobility at MobilityData. Thank you very much for sharing your work on Linked Data for GBFS. The topic of interoperability is very interesting and important to us. As per the governance, this issue will be closed in 30 days if there is no additional re-engagement. Have a great day! Fabien

richfab commented 7 months ago

This discussion has been closed due to inactivity. Discussions can always be reopened after they have been closed.