Open 1ec5 opened 5 months ago
Computing the perimeter of a boundary, for example to apply the Polsby–Popper compactness test to the boundary
Another possible way to satisfy this use case would be to add osm2rdf:length perimeter triples to areas.
osm2rdf has an option --add-relation-border-members. It seems that the dumps available on https://osm2rdf.cs.uni-freiburg.de are currently built without that option. I think there was a time when we were concerned about very large numbers of triples. But since we now have over 40 billion triples already for OSM Planet, I don't think adding a few more is a problem.
@lehmann-4178656ch @patrickbr Do you agree?
@1ec5 I have set up a SPARQL endpoint for the data from Germany (that was quick to do), where relations now have the mentioned members. Can you please check whether that has all the triples you need: https://qlever.cs.uni-freiburg.de/osm-germany . Note that for that endpoint the geometries are obtained again with geo:hasGeometry (without the geo:asWKT). I didn't do that on purpose, it was accidental, but just so you know.
Here is a query which gives (and shows) all the geometries of all the members of Berlin: https://qlever.cs.uni-freiburg.de/osm-germany/TWlwsr
Thank you, yes, this query returns the least compact admin_level=8 boundaries in the extract according to the Polsby–Popper test.
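For readers unfamiliar with the test: the Polsby–Popper score of a shape is 4πA / P², where A is its area and P its perimeter, so a circle scores 1 and less compact shapes score closer to 0. A minimal Python sketch of that formula follows. The function name is made up for illustration, and it assumes area and perimeter are already available in consistent units (for example m² and m); it is not the query linked above.

```python
import math


def polsby_popper(area: float, perimeter: float) -> float:
    """Return the Polsby-Popper compactness score 4*pi*A / P**2.

    `area` and `perimeter` must use consistent units (e.g. m^2 and m).
    """
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2


# A unit square (area 1, perimeter 4) is less compact than a circle.
square_score = polsby_popper(1.0, 4.0)
# A circle of radius 1 (area pi, perimeter 2*pi) is the most compact shape.
circle_score = polsby_popper(math.pi, 2.0 * math.pi)
```

Sorting boundaries by this score in ascending order puts the least compact ones first.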
@1ec5 Thanks for the feedback!
@lehmann-4178656ch @patrickbr The increase in the number of triples due to --add-relation-border-members is below 1%. So I would just add that option when building the datasets on https://osm2rdf.cs.uni-freiburg.de . Are there more such options that we could meaningfully add, which would make the datasets more complete?
@lehmann-4178656ch is already working on a PR to make --add-relation-border-members the default. This also greatly simplifies the code. Another PR will add the object timestamps.
Regarding additional data completeness options: we are currently not outputting the "members" (node IDs) of ways. The reason is that most of these nodes are empty (without any attributes). We could do this, but it would significantly increase the dataset size (essentially, we would add 3 triples for each anchor point of a way geometry: (1) a triple connecting the way to the empty OSM node, (2) a hasGeometry triple connecting the OSM node to a geometry object, and (3) an asWKT triple connecting the geometry to its WKT representation).
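As a rough illustration of the three triples per anchor point described above, here is a minimal Python sketch. The osmway: and osmnode: prefixes, the ex:hasNode predicate, and the geometry IRI pattern are placeholders assumed for illustration and are not necessarily what osm2rdf would emit; only geo:hasGeometry and geo:asWKT are taken from this thread.

```python
def way_node_triples(way_id: int, node_id: int, lon: float, lat: float):
    """Return the three (subject, predicate, object) triples for one way node."""
    way = f"osmway:{way_id}"              # hypothetical prefix
    node = f"osmnode:{node_id}"           # hypothetical prefix
    geom = f"ex:geom_osmnode_{node_id}"   # hypothetical geometry IRI
    wkt = f'"POINT({lon} {lat})"^^geo:wktLiteral'
    return [
        (way, "ex:hasNode", node),         # (1) way -> empty OSM node (placeholder predicate)
        (node, "geo:hasGeometry", geom),   # (2) node -> geometry object
        (geom, "geo:asWKT", wkt),          # (3) geometry -> WKT representation
    ]


def extra_triples(anchor_points: int) -> int:
    """Estimate the added triples: 3 per anchor point of a way geometry."""
    return 3 * anchor_points


triples = way_node_triples(1, 42, 13.4, 52.5)
```

The estimate counts one set of triples per (way, node) occurrence, so a node shared by several ways would in practice need its geometry triples only once.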
Another thing I just thought of: we are also not outputting author information, which could be present in the input .pbf file (it is present in the input files we use for https://osm2rdf.cs.uni-freiburg.de/). It might be interesting to get all objects last authored by user X.
Also, the changeset id (basically an OSM "commit") is currently not dumped.
These way members and changeset metadata are often used in Overpass queries, but I refrained from asking for them upfront because I assumed they’d be of more interest internally to OSM and OHM than externally. Off the top of my head, one external use case would be finding a given building’s entrances, something geocoders might do to better serve navigation applications. Another would be finding street intersections.
For reference, Sophox includes relation members but omits way members. Sophox also includes the element’s version and last changeset, timestamp, and user. Some of the example queries make use of this functionality.
@1ec5 I am already convinced that these should be in our dataset. Just waiting for feedback from @lehmann-4178656ch and @patrickbr . They already agreed that we should have the information about changeset, timestamp, and user in our dataset. It's just a few billion more triples :-)
This query shows that no boundary or multipolygon relation in the OSM Planet dataset has any osmrel:member that is a way with the role inner or outer. The only members in the dataset are label and admin_centre nodes, subarea relations, and plenty of tagging mistakes. This makes it difficult to perform tasks such as:
Also, in this OSM discussion, I needed to access the ways that make up a boundary relation in order to determine the total set of ways that would be part of a proposed time zone relation. I had to drop down to Overpass, which has various recursing operators as well as a length() operator.
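To make the perimeter use case concrete: once the member ways of a boundary are available as coordinate lists, the perimeter can be approximated by summing great-circle segment lengths. The Python sketch below assumes each way is a list of (lon, lat) pairs in degrees (the order used in WKT) and uses the haversine formula on a spherical Earth. The function names are made up, and this is not how Overpass's length() is implemented.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical approximation)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def way_length_m(coords) -> float:
    """Length of one way given as a list of (lon, lat) pairs."""
    return sum(haversine_m(*p, *q) for p, q in zip(coords, coords[1:]))


def perimeter_m(ways) -> float:
    """Total length of all member ways (outer and inner rings alike)."""
    return sum(way_length_m(way) for way in ways)


# Two ways along the equator, each spanning one degree of longitude.
ways = [[(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (2.0, 0.0)]]
total = perimeter_m(ways)
```

The result can be fed into the Polsby–Popper formula together with the area of the boundary, provided both use the same units.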