OpenHistoricalMap / issues

File your issues here, regardless of repo until we get all our repos squared away; we don't want to miss anything.
Creative Commons Zero v1.0 Universal

Add scale_rank=* (or something like that) to our rendering rules #544

Open jeffreyameyer opened 1 year ago

jeffreyameyer commented 1 year ago

Style change requested

Let the adventures begin... we currently don't have any mechanism for ranking features by importance so that some can stand out against others (even for just a period) at zooms where they might not normally be displayed.

For example, in many cities, small waterways were once more significant than they are today and have perhaps even been paved over.

Case in point: Shockoe Creek in Richmond, Virginia. It's the creek that runs pretty much straight north through the center of Richmond, up from the James River, and then bends west.

Every old map considered this creek significant enough to depict:

1864 Richmond Map image

1856 Richmond Map image

In addition, this creek shaped the naming of a nearby hill (maybe the hill named the creek?) and two different Richmond neighborhoods - Shockoe Slip & Shockoe Bottom.

It was even the border of an expansion of the city boundaries in 1867. image

Now, it has either dried up or had highways built over it: image

Unfortunately, two things are at hand: (1) OSM/OHM doesn't currently render natural=water + water=stream (there's no water=creek tag...) at any zoom... and (2) even if it did, it likely would not do so at browser zoom=13, which the old maps roughly did. image

Users could get around this to some degree by tagging the creeks with water=river, but tagging incorrectly seems to be a much worse practice than adding rendering-specific methods.

So... perhaps we could do the following:

(a) start rendering water=stream at browser zoom=18 or thereabouts for all times: image

and

(b) enable scale_rank tagging so that if this tiny creek were labeled with scale_rank=13, it would start showing up at browser zoom=13. Users could then time-bound that scale_rank such that it would revert to its natural rendering rules as its historical importance diminished.

Affected Tag or Tags

I think we have to add a new tag that will need to be accounted for in our vector tile filters, as well as our stylesheets (or maybe just the vector tile filters?).

I'm open to suggestion of a system, but maybe something like: 'scale_rank=[zoom]', where there's a zoom level where you start always including the feature in the vector tiles, but that makes me think we'd also need a rule in the styles... who knows?
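
As a rough illustration of proposal (b), here is a minimal Python sketch of how a tile filter might honor such a tag. Everything in it is an assumption made for illustration: the default zooms, the helper names, and the idea of time-bounding the override by splitting the creek into separately dated ways that reuse start_date/end_date. It is not how the OHM tiler works today.

```python
# Hypothetical defaults; the real thresholds live in the tile and style config.
DEFAULT_MIN_ZOOM = {"stream": 14, "river": 8}


def year_of(value, fallback):
    """Read the year from a plain 'YYYY' or 'YYYY-MM-DD' date (CE years only)."""
    return int(value[:4]) if value else fallback


def min_zoom(tags, feature_class):
    """Zoom at which the feature starts to appear, honoring scale_rank if set."""
    default = DEFAULT_MIN_ZOOM.get(feature_class, 14)
    try:
        override = int(tags["scale_rank"])
    except (KeyError, ValueError):
        return default  # untagged or malformed: fall back to the class default
    # The tag may only promote a feature to a lower zoom, never hide it.
    return min(default, override)


def is_visible(tags, feature_class, zoom, year):
    """True if the feature exists in `year` and `zoom` is high enough."""
    start = year_of(tags.get("start_date"), -9999)
    end = year_of(tags.get("end_date"), 9999)
    return start <= year <= end and zoom >= min_zoom(tags, feature_class)


# Made-up dates: one way for the creek's heyday, one for its later decline.
heyday = {"waterway": "stream", "scale_rank": "13",
          "start_date": "1800", "end_date": "1900"}
decline = {"waterway": "stream", "start_date": "1901"}

print(is_visible(heyday, "stream", 13, 1850))   # True
print(is_visible(decline, "stream", 13, 1950))  # False
```

Clamping with min() means a bad value can only make a feature appear earlier than usual, never remove it from the map, which keeps the blast radius of a tagging mistake small.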

jeffreyameyer commented 1 year ago

Also - we might use this new scale_rank=* tag to highlight key POIs and other map features at zoom levels where they might not normally be displayed, similar to how Google & others highlight important items today.

1ec5 commented 1 year ago

OSM has always shied away from embedding cartographic hints in the database, because it requires mappers to predict what data consumers would want well into the future. Instead, the onus has been on data consumers to supply their own scale ranks.

For example, OpenMapTiles ranks features by applying heuristics that depend on the feature type. The Mapbox Streets source assigns a filter rank to each feature based in part on the number of Wikipedia language editions that have an article about it, based on Wikidata. Wikidata volunteers publish a QRank dataset that makes it easier to rank geographic features based on various factors.

I suspect that assigning a single scale rank per feature would be less tenable in OpenHistoricalMap than in OpenStreetMap. As you point out, a feature’s importance can change over time. If the map assigns a prominence to each city according to its historical or eventual importance, then the map at any given time will become less coherent.

Moreover, as we start getting into thematic maps, the relative importance of a feature in a given time period could differ between themes. For example, this hardware store would’ve had only local prominence even in its heyday on a general historical map, but it might be nationally prominent on a feminism-themed map. A railroad-themed map (#405) would consider any historical alignment of the Baltimore & Ohio to be pretty important even today, but not a road-themed map (#538) that also depicts railroad tracks by necessity.

> OSM/OHM doesn't currently render natural=water + water=stream (there's no water=creek tag...) at any zoom

waterway=stream renders at (raster) zoom level 14 and above in both openstreetmap-carto and the Historic style. OpenMapTiles and Mapbox Streets also include streams at (vector) zoom level 13.

> Users could then time-bound that scale_rank such that it would revert to its natural rendering rules as its historical importance diminished.

If the proposal is for scale_rank to establish the feature’s historical maximum, data consumers would need to be able to correct for a diminished importance in other time periods. But at that point, wouldn’t they have access to the data necessary to come up with a maximum scale rank on their own?

1ec5 commented 1 year ago

Is there a more data-driven way to address the Shockoe Creek example? For example, if it should be shown at a lower zoom level on account of its length, maybe the filter for waterways should account for length in general. If its prominence is due to being a namesake, then maybe something could be added in postprocessing to indicate how many features’ name:etymology:wikidata tags its wikidata tag matches. Maybe waterways that form boundaries should always be rendered at a lower zoom level than they would otherwise?
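
To make those three signals concrete, here is a minimal Python sketch of a postprocessing step. The thresholds, function names, and placeholder QIDs are all invented for illustration; only the name:etymology:wikidata and wikidata keys come from the discussion above.

```python
from collections import Counter


def namesake_counts(features):
    """Count how often each Wikidata QID appears in name:etymology:wikidata."""
    counts = Counter()
    for tags in features:
        value = tags.get("name:etymology:wikidata", "")
        # The key may hold several QIDs separated by semicolons.
        for qid in (part.strip() for part in value.split(";")):
            if qid:
                counts[qid] += 1
    return counts


def waterway_min_zoom(length_km, namesakes, forms_boundary, base_zoom=14):
    """Lower the minimum zoom by one level per signal (made-up thresholds)."""
    zoom = base_zoom
    if length_km >= 5:
        zoom -= 1
    if namesakes >= 2:
        zoom -= 1
    if forms_boundary:
        zoom -= 1
    return zoom


# Placeholder QIDs, not real Wikidata items.
features = [
    {"name": "Hill", "name:etymology:wikidata": "Q-creek"},
    {"name": "Neighborhood", "name:etymology:wikidata": "Q-creek; Q-other"},
    {"name": "Unrelated street"},
]
creek = {"waterway": "stream", "wikidata": "Q-creek"}

namesakes = namesake_counts(features)[creek["wikidata"]]
print(namesakes)                                   # 2
print(waterway_min_zoom(3.0, namesakes, True))     # 12
```

Because the namesake count and boundary flag are derived from existing tags, nothing new would need to be entered by mappers under this approach.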

jeffreyameyer commented 1 year ago

Those are all good considerations, but I think this creek is a good example, because it fails most of those tests and yet is a prominent feature on most / all small-city level maps of Richmond in the 18th & early 19th centuries. I'd argue that some features are undeniably more important at times in history, across thematic use, particularly things like commerce or transport routes.

Example of failing:

Which historic style renders water=stream at z=14? I'm wondering if maybe I've done something wrong...

I do like the idea of a heuristic, as perhaps there are rules that could emphasize small waterways in the past & detune them at some point... same for railroads. But... that wouldn't help to emphasize particular cities or travel routes in the past.

Filter rank is cool, but isn't that effectively the same thing as a scale_rank?

For the idea of importance in a thematic map, couldn't people just use special thematic tags? e.g. my_custom_tag=important? (I love the Sparks hardware example)

One key point: I'm not sure scale_rank should be judged as anything other than a guide for OHM's own specific rendering tasks. It's not intended to solve anyone else's rendering hierarchy / priority, and I don't think we should market it as such. Would it be better if we just called it ohm_zoom=* or something like that to avoid any confusion?

1ec5 commented 1 year ago
> • No Wikipedia entry for Shockoe Creek in Richmond, but there is one to a different place, so the modern usage / reference isn't valid

The historic creek would be fair game for Wikidata, as long as there’s at least one citation to back it up (the same citation you’d add to OpenHistoricalMap, most likely).

> Which historic style renders water=stream at z=14? I'm wondering if maybe I've done something wrong...

Yes, this way should be tagged waterway=stream. If you decide to also map the creek as an area, you’d tag it as natural=water water=stream. iD currently warns that, based on natural=water, you’ve only mapped half the creek and need to join the endpoints up to form a valid area, but in fact this is the wrong preset.

> I do like the idea of a heuristic, as perhaps there are rules that could emphasize small waterways in the past & detune them at some point... same for railroads. But... that wouldn't help to emphasize particular cities or travel routes in the past.

You’re right, heuristics can be imperfect tools for curation, but I’d think mappers would have even less context or agency to make subjective rendering decisions.

> For the idea of importance in a thematic map, couldn't people just use special thematic tags? e.g. my_custom_tag=important? (I love the Sparks hardware example)

We’re used to thinking about “the OHM renderer” making static decisions, but there’s actually very little keeping us from making the client-side rendering rules more dynamic. The time slider today lets the user decide which year to view, but the same underlying runtime styling mechanism could enable the user to define their own criteria for a custom historical map and share the result, blurring the line between the main openhistoricalmap.org map, Overpass turbo, and a third-party website that embeds OHM. A story map wouldn’t need to tag Sparks Hardware with anything; it would specifically style Sparks Hardware differently by its OHM node ID or wikidata tag.

In this scenario, the “official” rendering rules are just an initial default. I think there would be more benefit to developing general-purpose rules that ensure reasonably good cartography, regardless of the criteria, than to proactively build in a mechanism for exceptions that we’d have to revisit whenever the main style’s overall design gets a refresh.

> Filter rank is cool, but isn't that effectively the same thing as a scale_rank?

Mapbox Streets used to have a single “scale rank” attribute that weighed the feature’s relative size and importance. Style designers needed more flexibility than this one-size-fits-all approach, so Streets v8 split it into separate “size rank” and “filter rank” attributes.

These knobs are intended for adjusting feature density. For example, a public transportation map should show some street labels for user orientation, but the street label layer shouldn’t overwhelm the bus routes and bus stops, so they need to be sparser than on a road map. Some areas have lots of streams while others have few streams that are thus more notable; the designer can filter out streams with a lower scale rank to normalize the stream layer somewhat.

It helps if the rule for eliminating clutter is reasonably intuitive,[^opencyclemap] namely size and importance, with various things used as proxies for importance. Another consideration is stability when zooming in further. It’s easier to make decisions about these ranks if you can make assumptions about stylistic attributes like font size and anchor placement.

> Would it be better if we just called it ohm_zoom=* or something like that to avoid any confusion?

That would be an improvement, since it seems to have a different purpose and format. Scale rank is really just a number on a scale, relative to other features of the same class, whereas this proposed key would have absolute zoom levels as values. The zoom level isn’t about managing clutter; in fact, it could contribute to feature density intentionally in an area with lots of really important things.
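
A small Python sketch may help show the difference between the two formats. The feature names, rank values, and ohm_zoom values below are made up; the point is only that a relative rank lets each style choose its own cutoff, while an absolute zoom fixes the outcome for every style.

```python
# Made-up streams; lower rank means more important within the stream class.
streams = [
    {"name": "Creek A", "rank": 1, "ohm_zoom": 13},
    {"name": "Creek B", "rank": 2, "ohm_zoom": 13},
    {"name": "Creek C", "rank": 3, "ohm_zoom": 15},
]


def by_relative_rank(features, max_rank):
    """Each style picks max_rank to suit its own target density."""
    return [f["name"] for f in features if f["rank"] <= max_rank]


def by_absolute_zoom(features, zoom):
    """The mapper's value decides; every style gets the same answer."""
    return [f["name"] for f in features if f["ohm_zoom"] <= zoom]


# A sparse transit-style map and a denser general map choose different cutoffs.
print(by_relative_rank(streams, 1))   # ['Creek A']
print(by_relative_rank(streams, 2))   # ['Creek A', 'Creek B']

# With an absolute zoom, both maps show the same streams at zoom 13.
print(by_absolute_zoom(streams, 13))  # ['Creek A', 'Creek B']
```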

Specific rendering hints would represent a step back from the world of rule-based or data-driven cartography that OSM and OHM come from. The best precedent I can think of in OSM would be maneuver relations, which are hints for routers in exceptional circumstances, but those hints are pretty objective. They basically exist because of quirks in OSM conventions for mapping roads.

[^opencyclemap]: As an example of what not to do, if I recall correctly, OpenCycleMap used to have a bug that prioritized place labels alphabetically, causing the map to be dominated by small towns starting with the letter A, which collided out much larger, alphabetically-challenged cities.

jeffreyameyer commented 1 year ago

Ok - water=* mistagging fixed. Doh! Separately... I'm not sure I've ever seen a footnote in a GitHub issue comment - impressive!

Back to the discussion at hand:

I'm not a huge fan of hand-curating data in the stylesheets... if someone wants to call out Sparks for its feminist status, maybe that's a designation someone else might want to use in their storytelling, but if that designation is hidden in a stylesheet, it would be difficult to reuse.

In fact... I would kind of love it if people were to adopt their own tagging conventions & hang their personal tags on the data in a way that might lead to some discovery/discoverability of how that data was being used elsewhere. e.g. my_special_tag=value for whatever presentation/styling/dataset they needed & if there were a way of finding a site that was using my_special_tag, that would be awesome. Hard to do that with a stylesheet that leaves no trace in the data.

I also think explicitly tagging the data with the rendering attributes makes things a little more intuitive, by transparently exposing the setting for that data. I agree that scale rank and filter rank are a little opaque, but at least there's a pretty obvious way to search for the meaning.

I'm less hung up about the rule- or data-driven principle thing, but maybe the objective rule is: "it's featured on almost all old maps for a particular period / era"?

1ec5 commented 1 year ago

> I'm not a huge fan of hand-curating data in the stylesheets... if someone wants to call out Sparks for its feminist status, maybe that's a designation someone else might want to use in their storytelling, but if that designation is hidden in a stylesheet, it would be difficult to reuse.

To be clear, the idea was that nothing gets hard-coded in a stylesheet per se. With tools like GL JS, you can already build a one-off story map for a website that highlights particular features. It’s true that the curation criteria would live in the code for that website, right next to the commentary. That seems appropriate to me for a feature that you want to actively call out in a story, but now I see that’s tangential to what you were focused on: the need to align with the individual choices that every cartographer would make manually.

A couple years ago, one of the openstreetmap-carto maintainers really wanted to show San Francisco more prominently than San José, like literally every other map does. Unfortunately, San José comes out on top by the standard metric of population. Users expect maps to prioritize San Francisco based on a different metric that’s too subjective to tag in OSM: fame. There were various ideas for things we could tag that would tip the scales in favor of San Francisco, but they all would’ve produced weird results elsewhere. One idea would’ve ended up making Oakland more prominent than San Francisco.

In fact, a good proxy for a city’s fame would be its Wikipedia article page views, which is part of what QRank does. Another would be the number of mentions in literature, handwaving about cities sharing the same name. But then there’s no need to hard-code these results in the database, because a CSV somewhere in the renderer codebase would be more usable, less prone to vandalism, and potentially dynamic.
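
For illustration, here is a minimal Python sketch of the "CSV in the renderer codebase" idea. The column names, placeholder QIDs, scores, and populations are all invented; this is not the actual QRank file format or real data.

```python
import csv
import io

# Stand-in for a CSV shipped with the renderer (invented columns and values).
SCORES_CSV = """entity,score
Q-famous,900
Q-big,400
"""


def load_scores(text):
    """Map each entity ID to its fame score."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["entity"]: int(row["score"]) for row in reader}


def rank_places(places, scores):
    """Sort by fame score first; population only breaks ties or fills gaps."""
    def key(place):
        score = scores.get(place.get("wikidata"), 0)
        return (score, place.get("population", 0))
    return sorted(places, key=key, reverse=True)


places = [
    {"name": "Big City", "wikidata": "Q-big", "population": 1_000_000},
    {"name": "Famous City", "wikidata": "Q-famous", "population": 800_000},
    {"name": "Small Town", "population": 5_000},
]

scores = load_scores(SCORES_CSV)
print([p["name"] for p in rank_places(places, scores)])
# ['Famous City', 'Big City', 'Small Town']
```

Keeping the scores in a file next to the renderer means they can be regenerated on a schedule without touching the map database.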

As you’ve already pointed out, Wikipedia isn’t granular or comprehensive enough for the locally prominent geographic features we need to rank. But I think you’ve demonstrated that few of these decisions by cartographers are entirely arbitrary. We both agree on the need to facilitate rendering that adheres to conventions, but I’d contend that renderers still have low-hanging fruit before we need to burden mappers with manual annotation. And for the things that defy heuristics, we should ask mappers to show their work beyond an enigmatic number that different mappers will naturally disagree on.

I can think of one other precedent in OSM for subjective rendering hints: highway classification is essentially telling the renderer the minimum zoom level of a road segment. Probably more than half of all communications between mappers in the project’s entire history relate to highway classification. One fine day, a clever developer will get around to integrating their renderer with the part of a router that builds a routing graph, and all that effort and angst will turn out to have been for naught.

vknoppkewetzel commented 1 year ago

I think these are all good (and important!) points about individual maps and highlighting important data pieces/geographies, alongside the difficulties of a worldwide, multi-zoom-level map that has complex and still imperfect data availability/ingestion (like the lack of small creeks or ephemeral water bodies). It is very true historically that water and other natural things (like forests!) were super important as determining factors in where things existed.

I think for visualizing the priorities of data pieces, what @1ec5 said here is a really good example:

> A couple years ago, one of the openstreetmap-carto maintainers really wanted to show San Francisco more prominently than San José, like literally every other map does. Unfortunately, San José comes out on top by the standard metric of population. Users expect maps to prioritize San Francisco based on a different metric that’s too subjective to tag in OSM: fame. There were various ideas for things we could tag that would tip the scales in favor of San Francisco, but they all would’ve produced weird results elsewhere. One idea would’ve ended up making Oakland more prominent than San Francisco.

With print maps, we can and do make as many adjustments as needed in order to create a custom (and culturally correct, per whatever knowledge we are given) basemap that ensures all data is in the correct visual hierarchy. But with interactive maps, we always have to let go of that level of perfection since it ultimately is too hard to do that in an automated fashion in every city, globally. When one starts to make adjustments/exceptions, it then makes it hard to say "no" to making adjustments elsewhere.

Providing a standard and consistent scale rank for populated areas like cities feels easier to do (we can say "we went based on population"), but for any other feature that starts to become really difficult because cultural significance can vary a lot and, globally, it's hard to know the true priority or importance of a particular creek, for example.

I do think showing creeks if the data exists would be nice, though. It sounds like creeks do not often exist in the data at all? Or am I misunderstanding that?

1ec5 commented 1 year ago

> I do think showing creeks if the data exists would be nice, though. It sounds like creeks do not often exist in the data at all? Or am I misunderstanding that?

The creek example turned out to be a red herring. It shows up now at z14 and above after a tagging adjustment in the database.

jeffreyameyer commented 1 year ago

Agreed - at this point, this is probably a bad example (although I do think streams should get at least 2 pixel width at higher zooms, but that's a separate topic : ).

But, I do think the issue remains of how to call out items (e.g., calling out historically important community buildings or POIs at higher zooms than other, less significant buildings). Understood that everyone may have different ideas of what is historically significant, but I feel comfortable in allowing users to make some editorial calls in that area. At least until it becomes a problem. I think the benefits of this (cool factor) will outweigh the headaches of deconfliction (a problem I'd like to have!).

1ec5 commented 6 months ago

Without any new tagging scheme, it’s already possible for a stylesheet to automatically rank places by their age or longevity – which depend on client-side choices that we can’t possibly hard-code in the database. #801 would enable a stylesheet to also rank by area. These factors don’t necessarily add up to a holistic sense of importance, but they may reduce the urgency for something hand-curated.
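
As a rough sketch of what ranking by age or longevity could look like, here is a small Python example. It assumes plain four-digit start_date/end_date values and uses made-up places; the function names are hypothetical and the real stylesheet would express this differently.

```python
def start_year(tags):
    """Year from a plain 'YYYY' or 'YYYY-MM-DD' start_date (CE years only)."""
    return int(tags["start_date"][:4])


def end_year(tags, if_open_ended):
    """Year from end_date, or `if_open_ended` when the feature has no end."""
    value = tags.get("end_date")
    return int(value[:4]) if value else if_open_ended


def exists_in(tags, year):
    return start_year(tags) <= year <= end_year(tags, year)


def age_in(tags, year):
    """How long the place had existed by the year the user selected."""
    return year - start_year(tags)


def longevity(tags, latest_year):
    """Total lifespan, counting open-ended places up to `latest_year`."""
    return end_year(tags, latest_year) - start_year(tags)


def rank_by_age(places, year):
    """Oldest places first, among those that exist in the selected year."""
    current = [p for p in places if exists_in(p, year)]
    return sorted(current, key=lambda p: age_in(p, year), reverse=True)


# Made-up places and dates.
places = [
    {"name": "New Suburb", "start_date": "1950"},
    {"name": "Boomtown", "start_date": "1850", "end_date": "1890"},
    {"name": "Old Town", "start_date": "1650"},
]

print([p["name"] for p in rank_by_age(places, 1880)])  # ['Old Town', 'Boomtown']
print(longevity(places[1], 2000))                      # 40
```

Because the selected year is an input, the ranking changes as the time slider moves, which is the kind of client-side choice that cannot be stored in a single tag value.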