Open-Telecoms-Data / open-fibre-data-standard

Open Fibre Data Standard
https://open-fibre-data-standard.readthedocs.io

Coordinates modelling (add support for WKT to Flatten Tool) #10

Closed duncandewhurst closed 1 year ago

duncandewhurst commented 2 years ago

We plan to reuse GeoJSON's Feature object to represent the physical location of a node (as a Point) and the route of a link between its endpoints (as a LineString) in both the JSON and GeoJSON formats.

Points

If we use Flatten Tool for conversion from the JSON format to the CSV format, Point geometries would be represented as a semi-colon separated list:

location/geometry/type location/geometry/coordinates
Point 26.081;-24.405

This poses two potential problems for users:

  1. The ordering of longitude and latitude is not explicit, so it's easy to mix them up.
  2. When importing the data into a GIS tool, additional processing might be required to split the coordinates. This is the case in QGIS, for example.
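To illustrate the extra processing mentioned in point 2, here is a minimal sketch that splits a semi-colon separated cell into named fields. The function name `split_point_coordinates` is hypothetical, and the sketch assumes the cell follows GeoJSON ordering (longitude first, then latitude) with exactly two values.

```python
def split_point_coordinates(cell: str) -> dict:
    """Split a 'longitude;latitude' cell into named float fields."""
    parts = [part.strip() for part in cell.split(";")]
    if len(parts) != 2:
        raise ValueError(f"expected two values, got {cell!r}")
    # Assumption: GeoJSON ordering, i.e. longitude first, then latitude.
    longitude, latitude = (float(part) for part in parts)
    return {"longitude": longitude, "latitude": latitude}


print(split_point_coordinates("26.081;-24.405"))
```

Nothing in the cell itself confirms the ordering, which is the first problem listed above: the code has to assume it.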

There are a couple of possible alternatives, either of which would require some special-casing in Flatten Tool:

Separate fields for longitude and latitude

location/geometry/type location/geometry/longitude location/geometry/latitude
Point 26.081 -24.405

This seems like the most user-friendly alternative. It is readily supported by QGIS, and presumably other GIS tools, and is equally usable for users who are not using GIS-specific tooling.

Well known text

location/geometry
POINT (26.081 -24.405)

This option is readily supported by QGIS, and presumably other GIS tools, but is less usable for users who are not using GIS-specific tooling.
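As a rough sketch of what the conversion could look like, the following turns a GeoJSON Point geometry into the well-known text shown above. The function name `point_to_wkt` is hypothetical and this is not Flatten Tool's implementation; it only covers two-dimensional points and ignores any altitude value.

```python
def point_to_wkt(geometry: dict) -> str:
    """Convert a GeoJSON Point geometry to a WKT POINT string."""
    if geometry.get("type") != "Point":
        raise ValueError("expected a GeoJSON Point geometry")
    # GeoJSON positions are ordered longitude, latitude; WKT here is x y,
    # so the order carries over unchanged.
    longitude, latitude = geometry["coordinates"][:2]
    return f"POINT ({longitude} {latitude})"


print(point_to_wkt({"type": "Point", "coordinates": [26.081, -24.405]}))
```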

Linestrings

I ran into some problems trying to flatten a GeoJSON Linestring in Flatten Tool so I don't know what the default behaviour is. One possibility is a semi-colon separated list of semi-colon separated lists:

location/geometry/type location/geometry/coordinates
Linestring [26.081;-24.405]; [26.09; -24.416]

Another possibility is a multi-table representation, related by id:

id location/geometry/type
1 Linestring
2 Linestring

id location/geometry/coordinates
1 [26.081;-24.405]
1 [26.09; -24.416]
2 [25.05;-23.234]
2 [25.16; -23.332]

Neither seems particularly desirable in terms of usability. Both would require substantial additional processing to import into GIS tools and the ordering of longitude and latitude is not explicit in either.
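To give a sense of the additional processing involved, here is a minimal sketch that regroups rows from the multi-table representation into one coordinate list per id. The function name `group_coordinates` and the `(id, cell)` row shape are assumptions for illustration, as is the longitude-first ordering.

```python
from collections import defaultdict


def group_coordinates(rows):
    """Collect '[lon;lat]' cells into an ordered list of positions per id."""
    routes = defaultdict(list)
    for row_id, cell in rows:
        # Strip the surrounding brackets, then split on the semi-colon.
        lon, lat = (float(value) for value in cell.strip("[] ").split(";"))
        routes[row_id].append([lon, lat])
    return dict(routes)


rows = [
    ("1", "[26.081;-24.405]"),
    ("1", "[26.09; -24.416]"),
    ("2", "[25.05;-23.234]"),
    ("2", "[25.16; -23.332]"),
]
print(group_coordinates(rows))
```

The sketch relies on row order to preserve the order of positions along each route, which the table itself does not guarantee.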

In terms of alternatives, splitting longitude and latitude into separate fields would only work for the multi-table representation, which would still have significant usability issues. However, well-known text is an option:

location/geometry
LINESTRING (26.081 -24.405, 26.09 -24.416)
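A minimal sketch of the LineString case, assuming two-dimensional coordinates in GeoJSON order (longitude, latitude). The function name `linestring_to_wkt` is hypothetical and this is not how Flatten Tool works today.

```python
def linestring_to_wkt(geometry: dict) -> str:
    """Convert a GeoJSON LineString geometry to a WKT LINESTRING string."""
    if geometry.get("type") != "LineString":
        raise ValueError("expected a GeoJSON LineString geometry")
    positions = geometry["coordinates"]
    if len(positions) < 2:
        raise ValueError("a LineString needs at least two positions")
    # Positions are separated by commas; values within a position by spaces.
    pairs = ", ".join(f"{lon} {lat}" for lon, lat, *_ in positions)
    return f"LINESTRING ({pairs})"


route = {"type": "LineString", "coordinates": [[26.081, -24.405], [26.09, -24.416]]}
print(linestring_to_wkt(route))
```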

Summary

Based on the analysis above, there are 3 options:

  1. If consistency in the representation of Point and Linestring geometries in the CSV format is desirable, then both could be represented using well-known text.

  2. If consistency is not important, then Point geometries could be represented using separate longitude and latitude fields and Linestring geometries could be represented using well-known text.

  3. The detailed routes of links could simply be omitted from the CSV representation, since they are adequately handled by the JSON and GeoJSON formats.

The purpose of this issue is to surface any other options that should be considered, to seek feedback on the preferred option and to explore the implications for tooling.

lgs85 commented 2 years ago

Thanks for laying this out @duncandewhurst. Am coming into this with minimal background so feel free to ignore if out of scope. One thing I noticed is that flatterer seems to deal with geojson, including points and linestrings, quite well already. Here's what an example output csv looks like:

type geometry_type geometry_coordinates
Feature Point "[102,0.5]"
Feature LineString "[[102,0],[103,1],[104,0],[102,0]]"

More generally, my view is that for the vast majority of applications that would use the geospatial data, it will be easier to import a geojson file directly, so we shouldn't worry too much about presenting lat/long in analysis-ready format in a csv export. That being said, I don't love the idea of leaving the geospatial data out of the csv export entirely, as this could cause confusion e.g. if the csv conversion is used for database imports. So option 1, or something like the flatterer output if that's an option, might be the best bet.

duncandewhurst commented 2 years ago

Thanks for the reminder about flatterer, @lgs85. I've opened a separate issue (https://github.com/Open-Telecoms-Data/open-fibre-data-standard/issues/14) on deciding what tool to use so that we can keep this issue focused on the desired modelling.

> More generally, my view is that for the vast majority of applications that would use the geospatial data, it will be easier to import a geojson file directly, so we shouldn't worry too much about presenting lat/long in analysis-ready format in a csv export. That being said, I don't love the idea of leaving the geospatial data out of the csv export entirely, as this could cause confusion e.g. if the csv conversion is used for database imports.

I agree that GeoJSON would be easier for many use cases but that it would also be desirable to have the same information available in each publication format.

I don't think we should use Flatterer's representation of geometry_coordinates since it shares the same usability issues as Flatten Tool's representation and, arguably, it's worse since users would also need to handle the extra set of square brackets.
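For reference, a sketch of the handling a user would need for the bracketed cells: the cell is valid JSON once the CSV quoting is removed, so it can be parsed with the standard library and re-emitted as WKT. The function name `json_cell_to_wkt` is hypothetical, and the sketch assumes longitude-first ordering and two-dimensional coordinates.

```python
import json


def json_cell_to_wkt(geometry_type: str, cell: str) -> str:
    """Parse a JSON-encoded coordinates cell and return the WKT equivalent."""
    coordinates = json.loads(cell)
    if geometry_type == "Point":
        return f"POINT ({coordinates[0]} {coordinates[1]})"
    if geometry_type == "LineString":
        pairs = ", ".join(f"{pos[0]} {pos[1]}" for pos in coordinates)
        return f"LINESTRING ({pairs})"
    raise ValueError(f"unsupported geometry type: {geometry_type}")


print(json_cell_to_wkt("Point", "[102,0.5]"))
print(json_cell_to_wkt("LineString", "[[102,0],[103,1],[104,0],[102,0]]"))
```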

duncandewhurst commented 2 years ago

The W3C's Spatial Data on the Web Best Practice 8, "State how coordinate values are encoded", is a useful reference for this issue.

lgs85 commented 2 years ago

Coming back to this, I see no benefit in representing points as separate longitude/latitude fields and LineStrings as WKT: i) very few users will want to use just node data, so ii) they will have to parse the WKT for LineStrings anyway; iii) representing both as WKT has the advantage of consistency, which iv) makes conversion and conversion tooling easier.

I therefore suggest we represent both points and LineStrings as WKT.
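For users without GIS tooling, parsing the two WKT shapes discussed here is fairly small. The following is a minimal sketch only: the function name `wkt_to_geojson` is hypothetical, and it handles just two-dimensional POINT and LINESTRING values, not the full WKT grammar (no EMPTY, Z/M values or other geometry types).

```python
import re

_WKT_PATTERN = re.compile(
    r"^\s*(POINT|LINESTRING)\s*\(([^()]*)\)\s*$", re.IGNORECASE
)


def wkt_to_geojson(wkt: str) -> dict:
    """Parse a simple WKT POINT or LINESTRING into a GeoJSON geometry."""
    match = _WKT_PATTERN.match(wkt)
    if match is None:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    kind = match.group(1).upper()
    # Positions are comma-separated; values within a position are
    # whitespace-separated.
    positions = [
        [float(value) for value in pair.split()]
        for pair in match.group(2).split(",")
    ]
    if kind == "POINT":
        if len(positions) != 1:
            raise ValueError("a POINT must contain exactly one position")
        return {"type": "Point", "coordinates": positions[0]}
    return {"type": "LineString", "coordinates": positions}


print(wkt_to_geojson("POINT (26.081 -24.405)"))
print(wkt_to_geojson("LINESTRING (26.081 -24.405, 26.09 -24.416)"))
```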

duncandewhurst commented 1 year ago

For the Alpha, we'll use the default format provided by Flatten Tool and look to update the tool to provide WKT format in the Beta.

duncandewhurst commented 1 year ago

Feedback from the World Bank's infrastructure map team is that WKT is expected for CSV files so I think we do want to use WKT for both point and linestring geometries.

duncandewhurst commented 1 year ago

@bjwebb, I've created a draft PR with updated CSV examples showing what I expect the WKT format to look like. Please could you check that you're happy with it from a Flatten Tool perspective?

Edit: Noting that I've replaced the whole Node.location and Span.route objects with WKT fields, rather than only replacing Node.location.coordinates and Span.route.coordinates, since the WKT format encodes both the geometry type and the coordinates.
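A sketch of what replacing the whole object with a WKT field could look like for a span, written with Python's standard `csv` module rather than Flatten Tool. The function name `spans_to_csv` and the `id` field are assumptions for illustration; only `route` comes from the discussion above.

```python
import csv
import io


def spans_to_csv(spans: list) -> str:
    """Write spans to CSV, replacing each route object with one WKT field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "route"])
    for span in spans:
        positions = span["route"]["coordinates"]
        pairs = ", ".join(f"{pos[0]} {pos[1]}" for pos in positions)
        # The geometry type is encoded in the WKT keyword, so no separate
        # route/type column is needed.
        writer.writerow([span["id"], f"LINESTRING ({pairs})"])
    return buffer.getvalue()


span = {
    "id": "1",
    "route": {
        "type": "LineString",
        "coordinates": [[26.081, -24.405], [26.09, -24.416]],
    },
}
print(spans_to_csv([span]))
```

Because a WKT LINESTRING contains commas, the `csv` writer quotes the field, so consumers need a CSV parser that respects quoting rather than a plain split on commas.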

bjwebb commented 1 year ago

This looks like what I'm expecting.