OpenSidewalks / OpenSidewalks-Schema

Repository for the draft proposal of the OpenSidewalks schema
https://sidewalks.washington.edu

Proposal: Network Primitives #3

Closed nbolten closed 2 years ago

nbolten commented 2 years ago

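Hello! This is a proposal to include and elevate network primitives within the OpenSidewalks Schema. This proposal includes these core changes:

  1. Rename elements to emphasize network abstractions (“Point” becomes “Node” and “Pathway” becomes “Edge”).
  2. Add a unique identifier field, “_id”, to Nodes.
  3. Add node reference fields, “_u_id” and “_v_id”, to Edges.
  4. Define a “Bare Node” entity that carries only geospatial information and an “_id”.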

These changes have two primary motivations:

  1. The need to represent and construct a network directly from an OpenSidewalks Schema dataset.
  2. Placing an emphasis on the network structure more clearly communicates the purpose of the OpenSidewalks Schema and the paradigm it uses.

Included below is the full RFC description. Please provide any feedback on changes / this proposal ASAP, as we would like to integrate this change for use by outside orgs within the next week or two.

OpenSidewalks Schema version base

Motivation

The OpenSidewalks Schema standard is intended to capture network data about pedestrian spaces, but it currently lacks primitives by which to describe network relationships, let alone a requirement that such data be present. Without node IDs on points and references to start/end nodes on pathways, the topological structure of OpenSidewalks Schema elements is left ambiguous; whether elements are connected is left up to downstream interpretation using spatial methods such as clustering line endpoints. This proposal recommends the adoption of such primitives as well as the relabeling of OpenSidewalks Schema types to emphasize that the schema is one of network elements. Once adopted, interpretation of OpenSidewalks data as a network will become unambiguous: a graph may be constructed by (1) reading all node data and then (2) reading all edge data, checking that all edge references are to previously seen nodes.
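The two-step construction described above can be sketched roughly as follows. This is only an illustration, not part of the proposal: it assumes entities are available as GeoJSON-like dictionaries with a "properties" mapping, and `build_graph` is a hypothetical helper name. The “_id”, “_u_id”, and “_v_id” keys are the fields proposed later in this RFC.

```python
def build_graph(node_features, edge_features):
    """Two-pass construction: read all nodes, then edges with reference checks."""
    nodes = {}
    for feature in node_features:
        node_id = feature["properties"]["_id"]
        if node_id in nodes:
            raise ValueError(f"duplicate node _id: {node_id!r}")
        nodes[node_id] = feature

    edges = []
    for feature in edge_features:
        props = feature["properties"]
        u, v = props["_u_id"], props["_v_id"]
        for ref in (u, v):
            # Every edge reference must point at a previously seen node.
            if ref not in nodes:
                raise ValueError(f"edge references unknown node: {ref!r}")
        edges.append((u, v, feature))
    return nodes, edges


# Made-up example data (assumed layout, illustrative coordinates).
nodes, edges = build_graph(
    [
        {"geometry": {"type": "Point", "coordinates": [-122.3, 47.6]},
         "properties": {"_id": "a"}},
        {"geometry": {"type": "Point", "coordinates": [-122.3, 47.7]},
         "properties": {"_id": "b"}},
    ],
    [
        {"geometry": {"type": "LineString",
                      "coordinates": [[-122.3, 47.6], [-122.3, 47.7]]},
         "properties": {"_u_id": "a", "_v_id": "b"}},
    ],
)
```

Raising on the first dangling reference is one possible design; a validator could just as reasonably collect all problems and report them together.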

Background

Relevance to the current schema

The changes required impact all top-level schema elements, currently described as either Points (elements that have a single longitude-latitude position) or Pathways (linear elements, i.e. LineStrings). Under this proposal, Points are considered to be equivalent to the nodes of a network and Pathways are considered to be equivalent to edges. This proposal does not change any existing properties on entities described by the OpenSidewalks Schema and is entirely additive.

Relevance to OpenStreetMap

OpenStreetMap data is inherently topological. Its primitives, called “nodes” and “ways”, describe all points and pathways with an efficient structure: ways (describing lines and areas) refer to nodes, which are location data optionally enriched with key-value metadata, rather than describing their own geometrical information separately. These data can therefore be interpreted into a network structure unambiguously: OpenStreetMap nodes can be understood directly as graph nodes and OpenStreetMap ways can be thought of as paths over a set of non-explicit edges; a graph structure may be interpreted from ways based on the set of nodes they are listed as traversing.
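As a small illustration of ways being paths over non-explicit edges, the sketch below pairs up consecutive node references of a single way. The dictionary layout and the `implicit_edges` name are assumptions made for this example, and the IDs are made up.

```python
def implicit_edges(way_node_ids):
    """Return the consecutive (from, to) node pairs that a way traverses."""
    return list(zip(way_node_ids, way_node_ids[1:]))


# A way is modeled here as an ordered list of node IDs (made-up values).
way = {"id": 1, "nodes": [10, 11, 12, 13]}
pairs = implicit_edges(way["nodes"])
print(pairs)  # [(10, 11), (11, 12), (12, 13)]
```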

OpenSidewalks Schema data may be derived unambiguously from OpenStreetMap as a directed multigraph wherein nodes are a 1-to-1 mapping to OpenStreetMap nodes, edges are a subset of way paths, and there may be more than one edge starting and ending at a given pair of nodes (hence the multi- part of multigraph). Similarly, OpenSidewalks Schema data not derived from OpenStreetMap may be transformed into an OpenStreetMap format unambiguously from a network format - though conflation with existing data prior to entering it into the global shared OSM database may still be necessary.
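To make the "multi-" part concrete, here is a minimal sketch that groups edges by their ordered (u, v) pair, so that several edges between the same two nodes are kept rather than overwritten. The edge names and the `group_parallel_edges` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def group_parallel_edges(edges):
    """Group edge keys by their ordered (u, v) node pair."""
    grouped = defaultdict(list)
    for edge_key, u, v in edges:
        # The pair is ordered, so (u, v) and (v, u) are distinct (directed graph).
        grouped[(u, v)].append(edge_key)
    return dict(grouped)


# Made-up edges: two of them start and end at the same pair of nodes.
edges = [
    ("sidewalk-1", "n1", "n2"),
    ("footpath-7", "n1", "n2"),
    ("crossing-3", "n2", "n3"),
]
parallel = group_parallel_edges(edges)
print(parallel[("n1", "n2")])  # ['sidewalk-1', 'footpath-7']
```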

Data Model Changes

Proposed change 1: Rename elements to emphasize network abstractions

Data model

If the OpenSidewalks Schema represents a network, then its elements should be named like network elements. This proposal suggests changing the name of the “Point” entity to “Node” and the name of the “Pathway” entity to “Edge”. This will head off ambiguities in interpretation of our elements, particularly relative to the rules we have already described, such as the fact that “Pathways” should share endpoints if they are to be considered connected - or, per this proposal, that they share node references.

Some of the previous "Point" entities are not network elements, such as a fire hydrant. These entities will still be defined as Point entities, so the schema would now have 3 entity types: Nodes, Edges, and Points.

This change to the data model is currently restricted entirely to documentation: per our schema, there is no metadata that explicitly states anything about an entity as being a “Point” or “Pathway”, nor will there be metadata that states an entity is a “Node” or “Edge”.

Network element interactions

N/A

OpenStreetMap tagging implications

N/A

Note: efforts should be taken to clarify the difference between OpenStreetMap Nodes and OpenSidewalks Nodes. They can be interconverted but have slightly different meanings, as Edges have their own geometries in our model whereas Ways only reference Nodes in OpenStreetMap.

Proposed change 2: Node identifier

Data model

In basic graph theory, nodes are defined simply as symbols that are referenced by edges. A Node (currently, “Point”) in our current schema has other data attached to it, of course (because our graph is embedded in space and has key-value metadata), but what is currently missing is such a symbol that uniquely identifies that entity over time.

Per this proposal, Node entities should have a new field added with a key of “_id” and value of a string that is required to be unique within the dataset. There are further constraints that could be valuable to add to the value of the “_id” field but should be considered part of future work, such as whether it should actually be a UUID and whether there are value patterns that distinguish the source of the node identifier (OpenStreetMap, an internal dataset, a made-up number).
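A rough sketch of how the two constraints stated here (string values, unique within the dataset) might be checked is given below. It assumes nodes are dictionaries with a "properties" mapping; `check_node_ids` is a hypothetical helper and the ID values are made up.

```python
from collections import Counter


def check_node_ids(node_features):
    """Return (non_string_ids, duplicate_ids) found among the given nodes."""
    ids = [f["properties"].get("_id") for f in node_features]
    # A missing "_id" shows up here as None, which is also not a string.
    non_strings = [i for i in ids if not isinstance(i, str)]
    counts = Counter(i for i in ids if isinstance(i, str))
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return non_strings, duplicates


nodes = [
    {"properties": {"_id": "1001"}},
    {"properties": {"_id": "1002"}},
    {"properties": {"_id": "1001"}},  # repeats an earlier _id
    {"properties": {"_id": 1003}},    # integer rather than string
]
non_strings, duplicates = check_node_ids(nodes)
print(non_strings, duplicates)  # [1003] ['1001']
```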

Network element interactions

N/A

OpenStreetMap tagging implications

N/A

Note: OpenSidewalks Schema Node identifiers may simply be stringified OSM Node IDs.

Proposed change 3: Edge identifier

Data model

In basic graph theory, edges are defined simply as a node tuple, usually referenced as a (u,v) pair. An Edge (currently, “Pathway”) in our current schema has other data attached to it, of course (because our graph is embedded in space and has key-value metadata), but what is currently missing are the node references that identify the Edge's endpoints over time and define the graph structure without relying on spatial inference.

Per this proposal, Edge entities should have two new fields added with keys of “_u_id” and “_v_id” and values of strings, themselves references to Node “_id” values within the current dataset. Abstractly, these can be thought of as foreign keys, but in serialized form (how we usually describe our entities), they are strings, i.e. the same values as Node “_id” fields.
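With these references in place, connectivity can be read straight from the data instead of being inferred spatially. The sketch below builds a lookup from each start node to the nodes its edges lead to; the dictionary layout and the `successors_by_node` name are assumptions for illustration only.

```python
from collections import defaultdict


def successors_by_node(edge_features):
    """Map each start node ID to the end node IDs of its outgoing edges."""
    successors = defaultdict(list)
    for feature in edge_features:
        props = feature["properties"]
        successors[props["_u_id"]].append(props["_v_id"])
    return dict(successors)


# Made-up edges that carry only the proposed reference fields.
edge_features = [
    {"properties": {"_u_id": "a", "_v_id": "b"}},
    {"properties": {"_u_id": "a", "_v_id": "c"}},
    {"properties": {"_u_id": "b", "_v_id": "c"}},
]
successors = successors_by_node(edge_features)
print(successors["a"])  # ['b', 'c']
```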

Network element interactions

N/A

OpenStreetMap tagging implications

N/A

Proposed change 4: Include bare node entities

Data model

If every Edge needs to reference two Nodes, then we will frequently encounter the case where a Node will have no metadata and serve only to describe the graph structure itself. However, there is currently no Node (“Point”) entity that is entirely free of metadata - only Nodes that represent features like curb ramps or fire hydrants are defined.

Per this proposal, there should be a bare Node entity defined that contains only geospatial information and the “_id” field. This entity will be named “Bare Node”.
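For illustration, a Bare Node could look like the sketch below, assuming a GeoJSON-like Feature serialization (an assumption here, not something this proposal fixes). Both `make_bare_node` and `is_bare_node` are hypothetical helpers, and the ID and coordinates are made up.

```python
def make_bare_node(node_id, lon, lat):
    """Build a node that carries only a location and an "_id"."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"_id": str(node_id)},
    }


def is_bare_node(feature):
    """True if the feature is a point whose only property is "_id"."""
    return (
        feature["geometry"]["type"] == "Point"
        and set(feature["properties"]) == {"_id"}
    )


bare = make_bare_node(42, -122.33, 47.61)
print(is_bare_node(bare))  # True
```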

Network element interactions

N/A

OpenStreetMap tagging implications

N/A

nbolten commented 2 years ago

Comment period has ended! Will now merge.