Closed 1ec5 closed 5 months ago
Interested in thoughts from @Rub21 and @batpad on this, especially paired with #702. To me, on-the-fly replacement of polygons with lines and the generation of centroids for labels seems like it might become overly complex, but admittedly I'm beyond the limits of my understanding of what sorts of Postgres queries we can include in our TOML file without hitting some sort of query limit.
The idea of converting polygons into lines sounds promising; it will significantly reduce the size of many tiles, and for long-term efficiency, that would be beneficial to me as well. However, I believe this adjustment needs to be made during the data loading process into PostGIS, rather than within the tiler. Therefore, it should be modified in imposm by changing a parameter to load lines instead of polygons for certain types of tags.
To achieve this, we can initiate the process by copying the tile-imposm directory into ohm-deploy/images. This approach allows for better customization without altering the existing functionality in osm-seed.
For clarification, is the central point intended to be added specifically for administrative boundaries? There are two cases: administrative boundaries mapped as polygons and those mapped as relations. In the case of relations, they may already contain points within them. However, for polygons, we need to add a center point. Adding a central point could be challenging using imposm, but we could explore other alternatives such as a PostGIS function to create, delete, or update center points for polygons.
> For clarification, is the central point intended to be added specifically for administrative boundaries? There are two cases: administrative boundaries mapped as polygons and those mapped as relations. In the case of relations, they may already contain points within them. However, for polygons, we need to add a center point. Adding a central point could be challenging using imposm, but we could explore other alternatives such as a PostGIS function to create, delete, or update center points for polygons.
Yes, automatically generating a centroid point is important, for more than just boundaries: https://github.com/OpenHistoricalMap/issues/issues/543#issuecomment-1672591828. For boundaries, we’re already working around the issue by manually mapping and styling explicit points near the centroids. So I think we can pursue that optimization in parallel.
@Rub21 - any update on this?
@jeffreyameyer, per a voice call with Sanjay and Dan, we agree that generating the center-point for each polygon or relation that comes from OSM may be challenging. The process of adding, updating, and removing a center-point is not currently supported by IMPOSM. Apart from that, converting those boundaries into linestrings may affect name styling. We can chat a bit more about it.
> Per voice call with Sanjay and Dan, we agree that generating the center-point for each polygon or relation that comes from OSM may be challenging. The process of adding, updating, and removing a center-point is not currently supported by IMPOSM.
Are you sure the centroid point needs to be generated by imposm3? Can we instead generate it on the fly in one of the PostgreSQL queries in the tiler’s providers? For example, the landuse_points provider would query the osm_landuse_areas table, calling a PostGIS function such as ST_Centroid instead of just regurgitating the geometry verbatim.
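As a rough illustration of the on-the-fly approach, a provider query could look something like the sketch below. The osm_id and name columns and the geometry column name are assumptions about the imposm schema, and the tiler-specific wrapping (bounding-box filter, geometry encoding) is left out.

```sql
-- Sketch only: emit one point per land use area instead of the full polygon.
-- Assumed columns: osm_id, name, geometry (not confirmed in this thread).
SELECT
    osm_id,
    name,
    ST_Centroid(geometry) AS geometry  -- point computed at query time
FROM osm_landuse_areas;
```

Nothing is stored in the database with this approach, so imposm3 would not have to add, update, or remove any point features.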
> Apart from that, converting those boundaries into linestrings may affect name styling.
Yes, generating the centroid points is a prerequisite for representing boundaries as lines.
To get the performance improvements, another important step would be to either join the boundary relations to deduplicate their LineString geometries or copy the relations’ details onto the member ways at import time. imposm3 might be more of an obstacle for doing this. I think we should take a look at how OpenMapTiles implements boundaries using imposm3.
/ref https://github.com/OpenHistoricalMap/issues/issues/543#issuecomment-1672591828
> Are you sure the centroid point needs to be generated by imposm3?

Yes, you are right. Those can be generated in the Postgres queries. I was not seeing it that way; I was looking more into the imposm side. But adding the centroid in the Postgres query totally makes sense. Let me add those centroid points.
I have evaluated displaying the boundaries as lines with centroids, but there are some cases where the centroid points are outside of the polygons/lines. For example, there are 96 such cases for admin levels 1 and 2 alone. 👇
For the following cases, we can convert the multipolygons to simple polygons and then create centroids from those simple polygons, which should then be centered within the polygons/lines.
For the following cases, it will be a bit difficult because these are already simple polygons, and the centroids are falling outside. I am still investigating how these can overlap with the polygons and centroids.
> there are some cases where the centroid points are outside of the polygons/lines

You’re right, this is one of the shortcomings of ST_Centroid. There are some alternative centroid algorithms for better results at some (hopefully negligible) performance cost. openstreetmap-carto uses ST_PointOnSurface. OpenMapTiles switches between ST_Centroid and ST_PointOnSurface depending on the geometry type. You may also need to account for degenerate cases by making the geometry valid first: openmaptiles/openmaptiles#487.
Sometimes the results of ST_PointOnSurface can be a bit unintuitive because it only considers the maximum horizontal space:
Another option is ST_MaximumInscribedCircle, which implements the same algorithm that MapLibre uses to label polygons natively. Here’s what a GeoJSON of the same multipolygon looks like in Overpass Ultra, which is powered by MapLibre GL JS, when zoomed out far enough that no tile boundary cuts across the feature:
It seems like the center from ST_MaximumInscribedCircle is the most appealing option; it is a pretty cool function.
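To compare the three candidates side by side, a query along these lines could be run against the staging database. This is a sketch under assumptions: it presumes osm_admin_areas has an integer admin_level column and a geometry column, and ST_MaximumInscribedCircle needs a reasonably recent PostGIS (3.1 or later).

```sql
-- Sketch only: compute three candidate label points for each boundary polygon.
-- Assumed columns: osm_id, admin_level (integer), geometry.
SELECT
    osm_id,
    ST_Centroid(geometry) AS centroid,                           -- may fall outside the polygon
    ST_PointOnSurface(ST_MakeValid(geometry)) AS point_on_surface, -- always on the surface
    (ST_MaximumInscribedCircle(geometry)).center AS inscribed_center
FROM osm_admin_areas
WHERE admin_level IN (1, 2);
```

ST_MaximumInscribedCircle returns a record with center, nearest, and radius fields, which is why the center is pulled out with the parenthesized field access.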
@1ec5 - what are your thoughts on supporting manually-specified labels?
Also - are these points just for boundary polygons or for all area polygons? (Ref #543 and #579 )
Asking for a friend...
@1ec5 @jeffreyameyer, ST_MaximumInscribedCircle works very well; here are the results.
I am going to deploy it in staging.
The centroid and lines for boundaries have been deployed in staging: https://vtiles.staging.openhistoricalmap.org/. The ohm_areas will continue to show until the styles are updated.
cc. @erictheise
> The centroid and lines for boundaries have been deployed in staging
Awesome, thanks for the quick turnaround! Am I correct in understanding that we’re synthesizing the centroid points prior to tiling up the geometries? So we don’t need to worry about a bazillion United States points at z16, all in the same place?
> Also - are these points just for boundary polygons or for all area polygons? (Ref https://github.com/OpenHistoricalMap/issues/issues/543 and https://github.com/OpenHistoricalMap/issues/issues/579 )
The idea is to eventually synthesize these centroid points for any polygon or multipolygon that we intend for stylesheets to label – boundaries, land use areas, buildings, you name it. We started talking about it in #543, but the discussion migrated over here because it’s blocking the conversion of boundaries from polygons to linestrings. If we’re only synthesizing the points on boundaries for now, that’s fine, but then we should continue to track something similar for other layers in #543.
> what are your thoughts on supporting manually-specified labels?

Yes, we already support manually specified labels, but we’ll need to implement something to avoid duplication. Some kinds of boundaries, such as cities, will always have a manually specified label because they should be labeled at a point other than the centroid. Ideally, we’d only synthesize a centroid point if the relation doesn’t already have a label member. (Naturally, this condition is only relevant to boundary relations, not other areas or multipolygons.)
There will be many opportunities to polish this feature, but the most pressing need is to get something functional out the door so that a) mappers no longer feel pressure to manually map labels at centroids, and b) we can start converting boundaries to linestrings. That step will probably require some more thought around how to merge features while preserving relevant dates. Once this initial iteration is deployed, we can get to work deleting the Newberry import’s centroid points, many of which lie completely outside of the boundaries they label.
> Awesome, thanks for the quick turnaround! Am I correct in understanding that we’re synthesizing the centroid points prior to tiling up the geometries? So we don’t need to worry about a bazillion United States points at z16, all in the same place?

We are synthesizing the points and lines in the tiler server, not in imposm. Here is the configuration: https://github.com/OpenHistoricalMap/ohm-deploy/blob/staging/images/tiler-server/config/providers/admin_boundaries_centroids.toml
Currently, the vector tiles in staging contain the boundaries for polygons, lines, and points. Once the styles are implemented for them, I am going to remove the boundaries for areas, so the vector tiles should be lighter.
Something I have been noticing is that the 'place_points' and 'ohm_land_centroids' layers are showing the same information in the points. We need to apply a filter to 'place_points' to avoid displaying the admin points. e.g 👇
> Something I have been noticing is that the 'place_points' and 'ohm_land_centroids' layers are showing the same information in the points. We need to apply a filter to 'place_points' to avoid displaying the admin points.
Yes, this is what I meant above about deduplicating centroids: some of these manually mapped nodes can be deleted once we’ve deployed this feature. However, some place points need to remain because their locations carry more significance (like a city center). A style would typically make one of these place points look different than a centroid, if it labels the centroid at all, so the tiler would need to either:
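- Avoid synthesizing a centroid point if the relation has a label member; or
- Include a property that helps the stylesheet distinguish a synthesized centroid point from a label member

> Once the styles are implemented for them, I am going to remove the boundaries for areas, so the vector tiles should be lighter.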
Will you attempt to deduplicate overlapping boundary lines at all? That would have a significant performance benefit, but it might be tricky to get right. Sometimes a boundary starts, ends, starts, ends, and so on, changing admin_levels at various times.
> Avoid synthesizing a centroid point if the relation has a label member; or
> Include a property that helps the stylesheet distinguish a synthesized centroid point from a label member

I am figuring out how we can include more attributes for the place_points layer. I am making some adjustments to imposm.
> Will you attempt to deduplicate overlapping boundary lines at all? That would have a significant performance benefit, but it might be tricky to get right. Sometimes a boundary starts, ends, starts, ends, and so on, changing admin_levels at various times.

The idea is to convert all administrative boundaries into lines and centroids. I don't understand what you meant by "starts, ends".
> Avoid synthesizing a centroid point if the relation has a label member; or

I have implemented this functionality: the centroids are created only if the polygons/relations do not have a label member.
I ran this query 👇 directly in the staging database. Later, if this works, I can make it run automatically in the database using a trigger or a cron job.
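    -- Add a has_label flag to osm_admin_areas, but only if the column is missing.
    DO $$
    BEGIN
        IF NOT EXISTS (
            SELECT 1 FROM information_schema.columns
            WHERE table_name = 'osm_admin_areas'
              AND column_name = 'has_label'
        ) THEN
            ALTER TABLE osm_admin_areas ADD COLUMN has_label BOOLEAN DEFAULT FALSE;
        END IF;
    END $$;

    -- Index the relation ID so the membership lookup below stays fast.
    CREATE INDEX IF NOT EXISTS osm_relation_members_osm_id ON osm_relation_members (osm_id);

    -- Flag every admin area whose relation has a member with the label role.
    UPDATE osm_admin_areas
    SET has_label = TRUE
    WHERE osm_id IN (
        SELECT osm_id
        FROM osm_relation_members
        WHERE role = 'label'
    );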
The results, for example:
This relation does not have a label among its members, so a centroid shows up in the vtiles: https://vtiles.staging.openhistoricalmap.org/#10.18/42.1345/-5.75
This relation does have a label member, so its centroid won't show up in the vtiles.
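For reference, the centroid layer's query could then filter on that flag, roughly as sketched here. The selected columns are assumptions, and the tiler's own bounding-box and encoding boilerplate is omitted; the deployed provider configuration may differ.

```sql
-- Sketch only: synthesize a label point just for areas without a label member.
-- Assumed columns besides has_label: osm_id, name, admin_level, geometry.
SELECT
    osm_id,
    name,
    admin_level,
    (ST_MaximumInscribedCircle(geometry)).center AS geometry
FROM osm_admin_areas
WHERE has_label = FALSE;  -- relations with a label member keep their mapped node
```

Because the UPDATE above only ever sets has_label to TRUE, a relation that later loses its label member would stay flagged until the column is recomputed, which is one reason to rerun the script on a schedule.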
> Will you attempt to deduplicate overlapping boundary lines at all? That would have a significant performance benefit, but it might be tricky to get right. Sometimes a boundary starts, ends, starts, ends, and so on, changing admin_levels at various times.
>
> That is the idea to convert all administrative boundaries into lines and centroids. I don't understand what you meant by 'start and ends,
For example, this way at the edge of San José, California, is a member of many boundary relations. Since each boundary is currently a polygon feature, the same coordinates are repeated many times over in the same tile. Just converting these boundary relations to lines probably wouldn’t affect the tile’s size, but I was wondering whether you were planning to go a step further and dissolve the boundaries back onto this shared way so that it only appears once in the tile. If so, this might be an easy case, since all the boundary relations have the same boundary=* and admin_level=* tags. But sometimes it’s more complicated. This way at the edge of Texas has mostly run along admin_level=4 boundaries, but on two separate occasions, a total of four admin_level=2 boundaries have also run along it.
I think it would be fine to deploy a first iteration of this feature that doesn’t merge the boundary ways yet. At least that would give us an opportunity to clean up the data and the stylesheets. But sooner or later we’ll want to consolidate the ways in order to reduce tile size. This would also enable stylesheets to apply dashed lines to the boundaries; overlapping ways or polygons prevent dashing because one line’s dashes can fill in the other line’s gaps.
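To make the eventual consolidation concrete, here is one way coincident edges could be dissolved in PostGIS. It is only a sketch: it assumes osm_admin_areas has admin_level and geometry columns, and it deliberately ignores start and end dates, which is the hard part discussed above.

```sql
-- Sketch only: collapse coincident boundary edges so each edge is encoded once
-- per admin_level. Dates and names are NOT preserved by this query.
SELECT
    admin_level,
    (ST_Dump(                          -- split the merged result into single lines
        ST_LineMerge(                  -- sew touching segments back together
            ST_Union(                  -- node the linework and drop duplicate edges
                ST_Boundary(geometry)  -- polygon outline as linework
            )
        )
    )).geom AS geometry
FROM osm_admin_areas
GROUP BY admin_level;
```

Running this over the whole table would likely be expensive, so in practice it would probably need to be precomputed or limited by bounding box rather than evaluated per tile request.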
> For example, this way at the edge of San José, California, is a member of many boundary relations. Since each boundary is currently a polygon feature, the same coordinates are repeated many times over in the same tile.

Yes, you are right: one polygon is created per relation or closed way. In the case of the San José edge, the way is part of 164 polygons in the tiler DB.
If we want to keep only one geometry as a line in this case, we may need to attach the information from every relation it has been part of. That means the line would have to retain information from all 164 relations, including every detail needed for display on the map, such as name, start date, and end date. The same applies to the adjoining lines, so that is also a lot of attribute information added to the tile size.
Currently, the lines (geometries) are being repeated 164 times, but the polygon/linestring information such as name, start_date, and end_date appears only once in the tile.
> I think it would be fine to deploy a first iteration of this feature that doesn’t merge the boundary ways yet. At least that would give us an opportunity to clean up the data and the stylesheets.

Totally makes sense.
Before deploying those latest changes, I am going to make the script run automatically. What I've done so far is run many scripts manually in the database.
This task has been completed; the process of creating the centroids, and skipping them when relations have a label member, is working fine. https://vtiles.staging.openhistoricalmap.org/#4.26/-3.9/-88.94
> If we want to keep only one geometry as a line in this case, we may need to attach the information from every relation it has been part of. That means the line would have to retain information from all 164 relations, including every detail needed for display on the map, such as name, start date, and end date. The same applies to the adjoining lines, so that is also a lot of attribute information added to the tile size.
Now that we’re generating separate point features for the labels, name properties on the lines wouldn’t be useful for labeling centroids anymore. They could be useful for boundary edge labels, but only if we indicate whether a name applies to the left or right side of the linestring. That would be unnecessary for now, since our styles don’t have boundary edge labels yet.
I was thinking that we could also detect that the way is part of relations that completely cover a certain time period. My example way from San José was continuously part of a boundary relation for many years, so all we need is a single linestring with the earliest relation’s start_date and the latest relation’s end_date. I don’t know how feasible this is with the current architecture. We can track this idea in a separate issue to keep things clear.
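The continuous-coverage idea is essentially a "gaps and islands" problem. The sketch below shows the shape of such a query against a hypothetical table boundary_way_spans(way_id, start_date, end_date), with one row per boundary relation a way has belonged to; that table does not exist in the current schema, and both date columns are assumed to be non-null DATE values.

```sql
-- Sketch only: merge overlapping or touching date ranges per way.
-- boundary_way_spans is a hypothetical table, not part of the imposm schema.
WITH ordered AS (
    SELECT
        way_id, start_date, end_date,
        -- Latest end date among all earlier spans of the same way.
        MAX(end_date) OVER (
            PARTITION BY way_id
            ORDER BY start_date, end_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        ) AS prev_max_end
    FROM boundary_way_spans
),
flagged AS (
    SELECT *,
        -- A new span starts when there is a gap after everything seen so far.
        CASE WHEN prev_max_end IS NULL OR start_date > prev_max_end
             THEN 1 ELSE 0 END AS new_span
    FROM ordered
),
grouped AS (
    SELECT *,
        SUM(new_span) OVER (PARTITION BY way_id ORDER BY start_date, end_date) AS span_id
    FROM flagged
)
SELECT way_id, MIN(start_date) AS start_date, MAX(end_date) AS end_date
FROM grouped
GROUP BY way_id, span_id
ORDER BY way_id, start_date;
```

A way that was continuously part of some boundary comes out as a single row carrying the earliest start_date and the latest end_date, while a way with interruptions yields one row per uninterrupted period. Ranges that merely touch (one ends on the day the next begins) are treated as continuous here.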
I've updated this ticket with a new name to capture it as vector tile work, and created a separate design-related ticket for the stylesheet updates (#787 )
> This task has been completed; the process of creating the centroids and avoiding them in case relations have label is working fine. https://vtiles.staging.openhistoricalmap.org/#4.26/-3.9/-88.94
@Rub21 can you clarify here -- in order for @vknoppkewetzel and @tsinn to style all labels for polygons, it sounds like sometimes they need to target place_points and sometimes they need to target land_ohm_centroids. Is that correct? The two will never be part of a single layer?
I can style with them as separate but it seems like it would make sense to fold into the place_points data in the future I think?
I do know some of the attributes are slightly different.
In place_points, there is no reference to admin_level like in land_ohm_centroids. In place_points, however, type has a value that is useful to know (possibly?) - the naming convention may be country-dependent, but the above example screenshot shows type=county.
This means:
place_points
AND country name data layer styling for land_ohm_centroids
> can you clarify here -- in order for @vknoppkewetzel and @tsinn to style all labels for polygons, it sounds like sometimes they need to target place_points and sometimes they need to target land_ohm_centroids. Is that correct? The two will never be part of a single layer?
The place_points layer consists of all objects that have place=* according to the OpenStreetMap wiki. This includes place=city, place=town, place=village, and place=hamlet. Examples of these points are:
When importing to the database using tiler-imposm, a conversion from place to type in the attributes is performed. These were established at the beginning of the project.
So, if there is a relation like this one, this relation object will be added to two layers:
- land_ohm (relation/polygons), which is the area of the relation.
- place_points, because the relation has a point member with place=state.

For the land_ohm_lines and land_ohm_centroids layers:
- land_ohm (relations/polygons) that are areas will be converted to linestring, and all attributes will be copied to the linestring and it will be represented as the land_ohm_lines layer.
- land_ohm (polygons) that are areas will have their centroids calculated, and all attributes will be copied to the centroids, and it will be represented as the land_ohm_centroids layer.

Therefore, for the layers, the same object will have the same attributes in the land_ohm_lines and land_ohm_centroids layers.
As an example:
- place=state (Node 2113249224). This point will be shown in the place_points layer.

Therefore, according to our new conversion to linestrings and centroids, we would have the same object in place_points, land_ohm_lines and land_ohm_centroids, as in the comment here: https://github.com/OpenHistoricalMap/issues/issues/701#issuecomment-2098883713
To resolve the issue of duplication in the place_points and land_ohm_centroids layers, we have implemented some functions in the PostgreSQL database. If there is a point member in the relation with the attribute place=*, it will be shown in the place_points layer as it currently is. Otherwise, if there is no point member, it will be shown in the land_ohm_centroids layer. This avoids having two points displaying the same attributes.
For example, the above relation and way do not have a member point with place=*. For this reason, we are creating the centroids. Note, we need to create centroids for these objects because once we convert them into linestrings, there will be no way to show the representative names without using the centroids.
https://vtiles.staging.openhistoricalmap.org/#7.14/-10.298/-78.884
Thanks @Rub21. Currently country and state labels are housed in place_points, and will in the future be housed in land_ohm_centroids?
OR just "the relevant country and state created centroids will be brought into place labels"??? So land_ohm_centroids is everything else?
> This task has been completed; the process of creating the centroids and avoiding them in case relations have label is working fine.

I had understood that the task was completed here, but perhaps that was just referencing the wrapping up of land_ohm_lines?
In #787 I've updated a test style that shows the updated land_ohm_lines and includes a TEST data layer with the land_ohm_centroids highlighted in red text. I can circle back to refine those further when the centroid work is finished. Does that sound like the right next steps for me on my end?
I am just trying to figure out how I am meant to style land_ohm_centroids. :) I initially was confused to not see as many points as I expected, and then to see them referencing a variety of admin levels - but not consistently, geographically.
However, if the answer is just "duplicate all admin related labels and expect to pull from both place_points and land_ohm_centroids", I can do that haha
> If there is a point member in the relation with the attribute place=*, it will be shown in the place_points layer as it currently is.

Does this distinguish between the admin_centre and label roles? A member with either role would typically be a place=* node, but we would still want to generate a centroid for the United States if it lacks a label member, even if the node for Washington, D.C., is an admin_centre member. These days, I view admin_centre as somewhat off-topic for OHM, because mappers can indicate a “seat of government of” relationship in Wikidata; however, we have over 6,000 occurrences of this role, and I’d expect the number to increase over time due to influence from OSM.
Barring the discussion above related to admin_centre and label roles, given that this is now operating in production, is this ticket done? @danrademacher w00t!
This ticket is done, closing here!
Next actions:
A boundary that changes often results in many overlapping features in the vector tiles that are mostly redundant to each other. For example, a tile containing San José, California, weighs in the tens of megabytes, even though it contains little besides boundaries: https://github.com/OpenHistoricalMap/issues/issues/698#issuecomment-1954675311. San José might be an outlier in terms of the sheer number of annexations,[^records] but we can expect more problem spots in the future, particularly as mappers begin recording land swaps along river boundaries, which are complex to begin with.
The overlapping boundaries inherently come with exorbitant size overhead, even if we simplify the geometries aggressively: #702. For a sense of scale, the boundary polygons representing San José over time currently contain a total of 2,444,653 separately encoded coordinates, whereas a corresponding set of boundary polylines would contain only 69,674 separately encoded coordinates, since concurrent boundary edges would be encoded only once. Boundary edge polylines also require much less effort to render on the client side than full polygons.
Some of this redundancy is necessary, since a given edge might run in opposite directions after applying winding order to each boundary. In lay terms, along a given north–south boundary edge, San José might lie to the west of the boundary in one year but to the east in another year. A rigorous solution like #603 would probably have to address this problem comprehensively, but for now we could either defer boundary edge labels or find a way to keep two distinct linestring features running in opposite directions.
A polygon representation of boundaries has some utility, especially for rendering boundary halos of different colors, for example when a national park abuts an administrative boundary. It’s also essential for making a choropleth map out of administrative areas: #700. However, overlapping polygons make it impossible to reliably style boundaries as dashes: https://github.com/openmaptiles/openmaptiles/pull/1604#issuecomment-1868558482. For example, if we were to reintroduce dashed lines of different lengths based on admin_level=*, the dashes would clump up together forming a solid line whenever a municipal boundary runs concurrently with a county or state boundary.

A naïve approach would be to replace existing usage of ST_AsMVTGeom with whatever we did in OpenHistoricalMap/ohm-deploy#89 to copy the details of an associatedStreet relation onto the member ways. We’ll also need to generate centroid point features for these boundaries at the same time: https://github.com/OpenHistoricalMap/issues/issues/543#issuecomment-1672591828. But we won’t be able to encode any side-dependent attributes, such as names, until we copy each polyline feature once per winding order.

[^records]: 1,174 distinct iterations for San José, compared to only 891 for Columbus, Ohio, another major city that experienced sprawl.