Possible extensions in the Brouter with new tags

EssBee59 commented 1 year ago

Hello, I would like to discuss extensions in the brouter! Many enhancement requests have already been posted; here are some examples:
Noise: https://github.com/abrensch/brouter/issues/476
Green area: https://github.com/abrensch/brouter/issues/258
Round trip: https://github.com/abrensch/brouter/issues/460

In the following I will consider (as an example) these 3 enhancements:
- consider the noise / pollution on the route
- consider a river or lake at the side of the route
- consider the "green" aspect of the route (within a forest or park…)

The basic idea is to introduce new calculated (or estimated) tags in the RD5 files (in a similar way to the existing tag "estimated_traffic_class").

Of course, the routing engine also has to be extended in order to support these tags, and the profiles have to use them according to the preferred route.

1- The first challenge is to calculate / estimate the tag values for the highways concerned: I started some tests and this seems possible (see the documentation in the PDF file):

documentation was updated

2- Extension of the RD5 files with the new tags
3- Extension of the routing engine to support the new tags (+ lookups.dat)
4- Extension of profiles

Next steps: A first look at the results of the spatial SQLs (GIS) is very promising: the calculated values for the new tags seem good and usable.
- a decision on exactly which tags should be implemented can be made later
- a further challenge is the calculation of the tag values for the planet, as it will take a lot of time!

So I suggest testing next the impact of the tags "noise" and "river" on the routing within a regional OSM map (for example 30 km * 30 km).

The tag values are available, but implementing extensions (2) and (3) is a prerequisite for starting "real" tests. Is anybody ready to work on this extension / to create a prototype for testing (extensions 2 and 3)?

EssBee59 commented 1 year ago

What are these blue circles for example at the end of the Galizienstraße..

This seems to be produced by overpass turbo when the zoom factor does not permit displaying anything on the map. Here is the same area with more zoom:

EssBee59 commented 1 year ago

" is determined for the point of the road that is closest to the water, is this why the Untere Kirchgasse has water_value=0 while the Obere Kirchgasse has a non-zero water_value

Exactly: the Obere Kirchgasse has no point less than 100 m from the Mainufer!

EssBee59 commented 1 year ago

I will attach the overpass control files, so you can navigate and zoom in the area I selected (only a limited number of ways can be displayed at the same time). The link to overpass: http://overpass-turbo.eu/#

newTagsOverpass.zip

nrenner commented 1 year ago

What are these blue circles for example at the end of the Galizienstraße..

This seems to be produced by overpass turbo when the zoom factor does not permit displaying anything on the map.

In overpass-turbo under Settings (Einstellungen) there is an option to not show small features as POIs ("Kleine Features nicht wie POIs darstellen") that can be enabled.

EssBee59 commented 1 year ago

In overpass-turbo under Settings (Einstellungen) there is an option to not show small features as POIs ("Kleine Features nicht wie POIs darstellen") that can be enabled.

Thanks for the tip! The picture is much better when the option is enabled.

EssBee59 commented 1 year ago

Hello afischerdev! So far no feedback from tests has been posted: do you think we need further tests or enhancements to the current solution? Regards

afischerdev commented 1 year ago

@EssBee59 I don't know. Feedback from the server maintainers would be nice to have. If there is no space to publish bigger tiles or if pre-generation with database is a nogo then we should know. We could stop thinking about it now.

But if it's possible from a server perspective, then we should think about restricting it to certain areas to limit the effort. E.g. Northern Europe does not need the info 'road is green', most will be green and also offer no alternative.

@EssBee59

But for normal rivers (Gersprenz) we have only a line? So I would prefer to keep the selection (union) as it is...

Yes, keep it, it's the best idea. The river Gersprenz is partly tagged with 8m width. Enough to build an area around, but we can't help that.

abrensch commented 1 year ago

Feedback from the server maintainers would be nice to have. If there is no space to publish bigger tiles or if pre-generation with database is a nogo then we should know. We could stop thinking about it now.

Is that me? Sorry I do not follow every post, so if you have hosting requirements please contact me directly.

For pre-processing I am just a guest on the dev-server of https://gk.historic.place/ so I'm not really "server maintainer". I try to behave like a guest and not eat up the server without a real pain. But I know these guys do keep a PostGIS in sync on that server for their stuff so there could be synergies.

For RD5 file size I do not worry about 10%. But I do worry about noise blowing up the tile diffs for the delta update. I remember I had a hard time taming the noise introduced by the "estimated_traffic_class" tags. But that was an extremely "non-local" algorithm. I guess for the land-use evaluations effects of map-changes do not have "far distance effects" ?

regards, Arndt

EssBee59 commented 1 year ago

Hello afischerdev, Hello Arndt,

Thanks for the feedback, I understand you are interested in the new tags! If anyone would like to test on the test instance, I prepared a short documentation for that: test_new_Tags.pdf

But I agree, rolling out the function is not an easy job! A lot of choices have to be made, depending on the resources of the server, on the current processes, on the implementation of the delta updates ...

If you need some help from me, you can also contact me directly.

regards

afischerdev commented 1 year ago

@abrensch

For pre-processing I am just a guest on the dev-server of https://gk.historic.place/ so I'm not really "server maintainer". I try to behave like a guest and not eat up the server without a real pain. But I know these guys do keep a PostGIS in sync on that server for their stuff so there could be synergies.

Sounds good. If they already have an OSM database and would share it for generation, we could rework the db scripts to add only the temporary tables for data generation and drop them after the work. We have a variant for direct database access during rd5-tiles generation, or for a file export from the database that is used as input at generation time. And we could think about the update frequency, maybe once a week or once a month is enough?

@EssBee59 Thanks for the test samples. I played a bit with the Leine trail which uses river and elevation. Works well. But I was surprised to find some cities with rather low population: Alfeld, Nordstemmen, Elze. I stepped through the results of the JSON messages in the brouter-web view with the search function.

afischerdev commented 1 year ago

@EssBee59 The samples work great. And it's a good idea to show only these parameters in the profile.

the new tags are only available in western Europe (Germany, France, Spain, Switzerland, Italy, Austria)

I would only add GB, NL, B, DK, CZ for the first step.

EssBee59 commented 1 year ago

I would only add GB, NL, B, DK, CZ for the first step.

Done, here is the updated documentation for tests: test_new_Tags_V2.pdf

EssBee59 commented 1 year ago

But I was surprised to find some cities with rather low population.

Hello, I do not see what you mean: could you help me and explain what you would expect? ( the tag "town" is not used in that case, as we prefer to follow "waterway"..)

afischerdev commented 1 year ago

@EssBee59 I think I found the reason: We have two Hildesheim entries in the database: Stadt Hildesheim and Wahlkreis Hildesheim. Same population but different area. All the smaller towns are inside the Wahlkreis.

EssBee59 commented 1 year ago

And we could think about the update frequency, maybe once a week or once a month is enough?

Due to the complex SQLs (with joins) I have no plan to implement "delta updates" for the new tags.

But updates of these tags are not as urgent as updates for other tags that directly impact the routing... Therefore I think there is no real need for an update interval shorter than months (between 1 and 12)?

Could this also apply to the "estimated_traffic_class"? Then a full regeneration every N months + delta updates as usual for the non-"estimated" tags could be a simple solution?

EssBee59 commented 1 year ago

But I do worry about noise blowing up the tile diffs for the delta update. I remember I had a hard time taming the noise introduced by the "estimated_traffic_class" tags. But that was an extremely "non-local" algorithm. I guess for the land-use evaluations effects of map-changes do not have "far distance effects" ?

Hello Abrensch,

A calculation/estimation of the traffic is also possible using the same spatial database as above: could it be an option?

I started a test using a very simple model (considering the population of the towns/cities within a distance of 100 km + reducing the traffic depending on the motorway density). Crazy SQLs but with an interesting result! Here is the input data to visualize the tags of 3 regions with overpass-turbo:

traffic-dreieich.log traffic-dieburg.log traffic-leipzig.log
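
For readers who want a feel for such a population-and-distance model, here is a minimal Python sketch. It is not the SQL from the attachments: the function name, the inverse-distance weighting and the damping by motorway density are assumptions made purely for illustration. Only the 100 km radius comes from the comment above.

```python
def traffic_score(towns, motorway_density, max_distance_km=100.0):
    """Toy traffic estimate for one position (illustrative only).

    towns: list of (population, distance_km) pairs around the position.
    motorway_density: hypothetical non-negative measure of nearby motorways.
    """
    demand = 0.0
    for population, distance_km in towns:
        if distance_km > max_distance_km:
            continue  # towns beyond the radius are ignored
        # Assumed weighting: influence falls off with distance (min. 1 km).
        demand += population / max(distance_km, 1.0)
    # Assumed damping: more motorways nearby take traffic off other roads.
    return demand / (1.0 + motorway_density)


# One town of 100,000 inhabitants at 10 km, no motorways nearby.
score = traffic_score([(100_000, 10.0)], motorway_density=0.0)
```

Mapping such a score to a discrete class would need thresholds, which are not given in this thread.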

EssBee59 commented 1 year ago

Hello, Here is a new version of the tag calculation including "estimated_traffic_class" (for evaluation/tests first). Tests with all the new tags (noise, river, forest, town and traffic with the new calculation model) are possible on the test instance http://brouter.de/essbee/#map=13/50.0926/8.6868/osm-mapnik-german_style
regards
newtags_1803.zip

quaelnix commented 1 year ago

@EssBee59, did you think about what I wrote in https://github.com/abrensch/brouter/issues/486#issuecomment-1399225261?

EssBee59 commented 1 year ago

In my opinion we should not drop the information that any of these ways are inside of a town, because a profile developer can easily exclude them by doing the following:

Hello quaelnix, In the example "Darmstadt" above I think it does not make sense to consider the administrative surface of the city (as big parts of this surface are forest!). Do you see a use case for tagging the forest part as "town"? If not I would prefer a directly usable tag (to avoid complex profiles + documenting the weakness at this place)

About using "population": Yes, population is very good information in OSM. Currently, as you can see in the last SQL version, I am experimenting with another model to calculate the tag "estimated_traffic_class" (the calculation considers the city / town population, the distance from my position to these towns, the motorway density within 15 km and the motorway-highway junctions within 2 and 1 km).
To evaluate / test the results I installed the new values on the test instance (same tag-name as ever, but the value is not calculated with Arndt's model).

Regards

quaelnix commented 1 year ago

Do you see a use case for tagging the forest part as "town"?

No, but please look at these examples:

| | A | B | C |
|---|---|---|---|
| Current | `estimated_forest_class=4; estimated_town_class=` | `estimated_forest_class=2; estimated_town_class=` | `estimated_forest_class=1; estimated_town_class=` |
| Proposed | `estimated_forest_class=4; estimated_town_class=` | `estimated_forest_class=?; estimated_town_class=3` | `estimated_forest_class=; estimated_town_class=3` |

The fact that examples B and C do not have any town class assigned is highly problematic.

I would prefer a directly usable tag

Me too, but why do you think that my proposed tagging scheme is not directly usable?

To evaluate / test the results I installed the new values on the test instance (same tag-name as ever, but the value is not calculated with Arndt's model).

I've seen it and I think it might work pretty well, but I haven't had the time to do extensive testing yet. By the way, it would be much easier to evaluate the quality of the new tags if we had a decent visualization for them (without having to abuse the Overpass API).

EssBee59 commented 1 year ago

By the way, it would be much easier to evaluate the quality of the new tags if we had a decent visualization for them (without having to abuse the Overpass API).

I also needed to visualize the tags... and spent a lot of time getting a DECENT solution with overpass. Sorry, currently not everyone can visualize his preferred area. If you need more than the values you can get with brouter-web, I suggest you develop a generic solution? I will attach the SQL and perl-script I created for that

Regards
overpass.zip

afischerdev commented 1 year ago

@EssBee59 The new scripts have been added to repo db-import. Adding traffic is a good idea.

EssBee59 commented 1 year ago

Adding traffic is a good idea.

Yes, but not every idea is good or leads to a success story... Could you test some tags?

afischerdev commented 1 year ago

Yes, but not every idea is good or leads to a success story...

true words

Could you test some tags?

Yes, I could do that. But it will take some time. Do you have something special to test? I would prefer inside the Hessen area. E.g. compare old and new traffic variants?

EssBee59 commented 1 year ago

Yes, but not every idea is good or leads to a success story...

true words

Could you test some tags?

Yes, I could do that. But it will take some time. Do you have something special to test? I would prefer inside the Hessen area. E.g. compare old and new traffic variants?

Yes, as traffic calculated with postgis could become an alternative to the old way ...if not too bad! The calculation is described in the "all.sql" file from the zip above; if you have some further / better idea I could implement it for Hessen. Same for "visualization" (if you have an idea for offering a better solution). In the "overpass.zip" above, you will find 3 areas in Hessen with high, middle and low traffic (Dreieich.log, Dieburg.log and Michelstadt.log). The log is preformatted (cut / paste) for overpass-turbo, where you can have a direct look at the tag value. (I let you guess the colors I selected for each tag value!) Regards

quaelnix commented 1 year ago

Sorry, currently not everyone can visualize his preferred area.

I just wanted to say that it might be worth thinking about investing time in implementing a more usable visualization technique than the one we have now. This was not meant as a criticism of your efforts, but rather as a suggestion to other fellow readers who might have more time than we do to implement something like this.

Same for "visualization" (If you have an idea for offering a better solution)

There used to be an ugly but practical visualization of the estimated_traffic_class tag. Maybe we can revive it and use it for the other tags as well.

See this thread: https://community.openstreetmap.org/t/brouter-estimated-traffic-class/71986/14

quaelnix commented 1 year ago

Yes, as traffic calculated with postgis could become an alternative to the old way ...if not too bad!

As far as I can tell so far it basically always beats the old method in the areas I know. I spent 1 hour trying to find an example where the new one fails and the old one doesn't and I didn't succeed.

nrenner commented 1 year ago

Same for "visualization" (If you have an idea for offering a better solution)

There used to be an ugly but practical visualization of the estimated_traffic_class tag. Maybe we can revive it and use it for the other tags as well.

That would probably be the easiest, but writing out a GeoJSON or OSM XML/PBF file with all tags instead of an image would be more flexible for further processing, e.g. vector tile generation. Or programming a rd5 data source for Planetiler to create vector tiles.

Ideally we should have a vector tile based debug view for the rd5 files like http://map.project-osrm.org/debug.

Not sure if that would work, but to spare additional vector tile / image processing and storage, my idea would be to read rd5 files directly in the browser:

convert the brouter-mapaccess classes PhysicalFile, OsmFile, NodesCache, ... to JavaScript using JSweet (don't know about current state of Java to WebAssembly?)
random access read with HTTP range requests (?) of 1 degree tiles from rd5 file (afaik a rd5 file has 25 data tiles of 1° size)
don't know how to pass 1° rd5 tiles to Maplibre GL though. Probably client-side conversion to Mapbox vector tile format (mvt), e.g. based on geojson-vt or vt-pbf
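
The random-access step in the list above can be illustrated with a plain HTTP Range request. This is a minimal Python sketch (the idea above targets the browser, so this only shows the protocol side). The byte offset and length of a 1° tile inside an rd5 file are not specified in this thread, so the caller has to supply them; the function names are hypothetical.

```python
import urllib.request


def build_range_request(url, offset, length):
    """Build a GET request for `length` bytes starting at byte `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last_byte = offset + length - 1  # Range end positions are inclusive
    request = urllib.request.Request(url)
    request.add_header("Range", f"bytes={offset}-{last_byte}")
    return request


def fetch_range(url, offset, length, timeout=30):
    """Fetch the byte range; needs network access and server-side support."""
    request = build_range_request(url, offset, length)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Hypothetical URL; asks for bytes 100..199 of the file.
example = build_range_request("https://example.org/tile.rd5", 100, 100)
```

A server that honours the header answers with status 206 and only the requested bytes; a server that ignores it may send the whole file with status 200, so a real client should check the status code.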

EssBee59 commented 1 year ago

Hello, Thanks Norbert for proposing a solution to visualize pseudo tags! I think it is a good idea to get the data from the RD5 itself, but I cannot help with or evaluate the other aspects (as I am a DB expert but not a GeoJSON one)

If any problems occur I now have a new idea to make the overpass solution (described above) accessible for anyone: assuming we get a postgis DB accessible from the web, we can fetch the needed data by using a very simple Java program that could be located on the brouter-server.
HTTP request could be: pseudotag type + a town name
HTTP response: a preformatted overpass-command to visualize the 15 km around the town (other response-formats possible)

quaelnix commented 1 year ago

@EssBee59, did you have time to think about the examples I posted in https://github.com/abrensch/brouter/issues/486#issuecomment-1475909253?

EssBee59 commented 1 year ago

You suggest not to clear the town tags when forest_class is 1 or 2? Would it be really "highly problematic" to clear the town tags in that situation? Do you also think a river tag of 1 or 2 should not clear the town tag?

Remember the goal of the "estimated" tags: They are intended to smoothly impact the routing over a long distance... a tag alone should never directly change the route in a way comparable to "access=no" or "reversedirection=yes".

I agree, I suggested the town tag for my "own usage" (possibility to bypass a city / big town on a long tour): if we change the logic for calculating the traffic_class, the new values, especially in towns, could be put to better use than before.

quaelnix commented 1 year ago

Remember the goal of the "estimated" tags: They are intended to smoothly impact the routing over a long distance

Ok, try this route: http://brouter.de/essbee/#map=14/49.8744/8.6550/...

consider_town basically does nothing in this case, because the number of false positives where the estimated_town_class is suppressed by the forest detection logic is extremely high:

missing_town_tags

Would it be really "highly problematic" to clear the town tags in that situation?

Yes, the situation is bad enough that I would go through the pain of generating my own rd5 files in order to fix it.

You suggest not to clear the town tags when forest_class is 1 or 2?

No, but I would be fine with it if the result is close enough to a proper solution. My original suggestion was to not drop the town tag unless the way is inside of a forest polygon and to never drop it because of a nearby river.
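
Expressed as code, the original suggestion might look like the following minimal Python sketch. The function and parameter names are hypothetical, and the real decision is made in the SQL scripts, which are not reproduced here.

```python
def town_class_to_emit(town_class, inside_forest_polygon, near_river):
    """Decide which estimated_town_class (if any) a way should carry.

    town_class: the computed class, or None if the way is in no town.
    inside_forest_polygon: True only if the way lies inside a forest polygon.
    near_river: accepted for completeness but deliberately ignored, because
        a nearby river should never drop the town tag.
    """
    if town_class is None:
        return None
    if inside_forest_polygon:
        return None  # the only case in which the town tag is dropped
    return town_class


# A way in a class-3 town next to a river keeps its town class.
kept = town_class_to_emit(3, inside_forest_polygon=False, near_river=True)
```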

quaelnix commented 1 year ago

And please don't get me wrong. I love the idea behind the new tags. But that's exactly why it would be even sadder if their true potential was given away for no reason.

quaelnix commented 1 year ago

Or, try this route: http://brouter.de/essbee/#map=14/49.8745/8.6420/..., enable consider_town and change line 366 in your race bike profile to `switch estimated_town_class=3 100`. The result will be a 2.4 km (+30%) detour through the city, instead of the near perfect route on the outskirts generated without consider_town!

abrensch commented 1 year ago

Just to let you know that FOSSGIS e.V. approved funding for a powerful development server for our project. It will probably be a Hetzner AX102, maybe with some storage extensions. So it seems we will soon be able to setup a preprocessing pipeline including the new pseudotags.

EssBee59 commented 1 year ago

Hello, A new version of the SQLs is now available and should be installed on the server once the server is available. Regards
pseudo-tags.zip

afischerdev commented 1 year ago

@EssBee59 I add this to the repo

abrensch commented 1 year ago

... FOSSGIS e.V. approved funding for a powerful development server for our project. It will probably be a Hetzner AX102, maybe with some storage extensions

The new server is available now. It's a plain vanilla AX102, so 1.8 TB of NVME SSD Storage

I switched the production pre-processing pipeline, so the last update was provided by the new server: https://brouter.de/brouter/segments4/

It was late because I started that manually, but the latency dropped from 5.5 hours to 2 hours.

Essbee is working with good progress on the postgres processing for the new tags, so I think we are only days away from having these tags in production.

Let me know if you want access to the server and/or participate in setting up the new pipeline

afischerdev commented 1 year ago

@abrensch Sounds great. Did you try the new #555 replacement? I already have a small change for OsmCutter and jdbc in stock and would like to bring it together with new pbfparser. Could do that later on today.

EssBee59 commented 1 year ago

Hello, Yes, the powerful server provided by FOSSGIS is running very well; a first calculation of the pseudo-tags for the planet could be done within 40 hours!

New insights were discovered during this first run, minor errors that could be solved, but also the following point:

Of the 40 hours of processing, 18 h 30 min was spent just calculating the forest tags in "north-america" (the forest calculation for Europe only needs 5 h 20 min).

As a remark: the "forest" tag is intended for bikers who want to follow routes with a "green" character! For that, many relations are considered:
==> SQL: where ((q.landuse in ('forest','allotments','flowerbed','orchard','vineyard','recreation_ground','village_green') ) or q.leisure in ( 'garden','park','nature_reserve'))

The very long processing time for north-america is due to the huge "nature_reserve" areas in this part of the planet. See "nature_reserve_SQL.txt" for details.

nature_reserve_SQL.txt

As an example, when nature_reserve areas with a surface > 10000 km2 are not considered, the processing time decreases from 18 hours to 2 h 30 min!

I think it is not wise to spend hours of processing on huge "nature_reserve" areas (many of them in north-america are water or ice areas / Greenland).

I tend to change the SQL to consider only areas with less than 5000 km2.
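
A minimal sketch of that size cut-off, written in Python rather than as the actual SQL change (which is not shown here). The dictionary layout, the names and the sample areas are assumptions for illustration; only the 5000 km2 limit comes from the sentence above.

```python
MAX_AREA_KM2 = 5000.0  # proposed limit from the comment above


def keep_for_forest_tag(polygons, max_area_km2=MAX_AREA_KM2):
    """Return only the polygons smaller than the area limit.

    polygons: list of dicts with a "name" and an "area_km2" entry
    (a hypothetical layout, not the real table structure).
    """
    return [p for p in polygons if p["area_km2"] < max_area_km2]


# Illustrative sample values, not measured data.
sample = [
    {"name": "small park", "area_km2": 2.5},
    {"name": "mid-sized reserve", "area_km2": 2400.0},
    {"name": "huge reserve", "area_km2": 12000.0},
]
kept_names = [p["name"] for p in keep_for_forest_tag(sample)]
```

Filtering before the expensive geometry operations is what saves the time; applying the same test afterwards would not help.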

Any other idea?

Regards

quaelnix commented 1 year ago

Any other idea?

Personally, I would just remove the "leisure=nature_reserve" query altogether.

afischerdev commented 1 year ago

My idea was to reduce the incoming data.

by id
by bounding box

A sample for me is the Greenland polygon. There are only a few ways in military camps or mining areas. I guess the problem is in the geometry calculation of the 'ST_*'- routines. So why not remove the problematic zones when they are known? It's not elegant, but it's effective. Less input less time.

You can find a changed lua file in my repo.

It has two tables at the top as collectors and a check in process_relation. This could also be placed in process_way.
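
The lua file itself is not reproduced in this thread. As a rough illustration of the "by id / by bounding box" idea, here is a Python sketch with two collectors and one check. The relation id and the bounding box are placeholders, not values taken from the real script.

```python
# Hypothetical collectors: relation ids and bounding boxes to skip.
EXCLUDED_IDS = {1234567}  # placeholder id, not a real OSM relation
# (min_lon, min_lat, max_lon, max_lat); a very rough box around Greenland.
EXCLUDED_BBOXES = [(-74.0, 59.0, -11.0, 84.0)]


def bbox_inside(inner, outer):
    """True if bounding box `inner` lies completely inside `outer`."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def should_skip(relation_id, relation_bbox):
    """Skip a relation that is listed by id or lies in an excluded box."""
    if relation_id in EXCLUDED_IDS:
        return True
    return any(bbox_inside(relation_bbox, box) for box in EXCLUDED_BBOXES)


skip_by_id = should_skip(1234567, (8.0, 49.0, 9.0, 50.0))
skip_by_box = should_skip(42, (-50.0, 65.0, -40.0, 70.0))
keep_other = should_skip(42, (8.0, 49.0, 9.0, 50.0))
```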

quaelnix commented 1 year ago

Let's take the Biosphärenreservat Rhön relation mentioned above:

nature_reserver

All roads (including all state and federal highways) in this relation carry the estimated_forest_class tag and none of them carry the estimated_town_class tag.

@EssBee59, I honestly do not understand why you keep ignoring the fact that this logic is fundamentally flawed.

EssBee59 commented 1 year ago

Hello

So why not remove the problematic zones when they are known? It's not elegant, but it's effective. Less input less time.

So, you would not remove all the "nature_reserve" areas; I understand that (for example, in Germany the "Rhön" is really interesting for bikers). It has a surface of only 2400 km2 (50 km * 50 km), so does it make sense to have such pseudo-tags to calculate a route longer than 100 km? (Only "wide-area" routing can take advantage of tags on huge areas.)

So, it is very difficult to find a perfect solution.

Your suggestion (based on lua + excludes) is not what I would prefer: a lot of work, not easy to follow. I would prefer (if "nature_reserve" is interesting at all!!! see quaelnix's comment) a solution via SQL (more readable and less work).

With "surface < 5000 km2" I hope to get rid of the performance problems; a later change remains very easy.

EssBee59 commented 1 year ago

that this logic is fundamentally flawed

Yes, depending on the routing you intend to do (wide-area or local), considering "nature_reserve" can be confusing. What about "very small" nature_reserve? (surface < 500 km2)

For me, it is easier NOT to consider any nature_reserve!

quaelnix commented 1 year ago

as example in Germany the "Rhön" is really interesting for bikers

Ok, so why not just add a simple nature_reserve=yes|no pseudo-tag that is decoupled from the city logic?

nature_reserve=yes: the way is inside of a leisure=nature_reserve relation
nature_reserve=no: the way is not inside of a leisure=nature_reserve relation
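
As a sketch of how such a yes/no tag could be derived, here is a small Python example using a ray-casting point-in-polygon test. It assumes a simple polygon without holes, planar coordinates, and "inside" meaning that every node of the way is inside; the real implementation would more likely use PostGIS functions, and all names here are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def nature_reserve_tag(way_nodes, reserve_polygons):
    """Return "yes" if all nodes of the way lie in one reserve polygon."""
    for polygon in reserve_polygons:
        if all(point_in_polygon(x, y, polygon) for x, y in way_nodes):
            return "yes"
    return "no"


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
tag_inside = nature_reserve_tag([(1, 1), (5, 5)], [square])
tag_crossing = nature_reserve_tag([(1, 1), (20, 5)], [square])
```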

What about "very small" nature_reserve?

How many of these are not already handled by the regular forest logic that evaluates the nature and landuse tags?

For me, it is easier NOT to consider any nature_reserve!

I agree.

afischerdev commented 1 year ago

Ok, it looks like "without nature_reserve" is the result. This will bring back the town flag to the @quaelnix sample.

nrenner commented 1 year ago

How many of these are not already handled by the regular forest logic that evaluates the nature and landuse tags?

In my opinion a forest inside a nature_reserve should get extra points, either by increasing the green_factor or having a separate tag.

abrensch commented 1 year ago

I added the new tags for now as the "secondary data set" at brouter.de/brouter-web, so you can see them by setting in your profiles:

assign processUnusedTags = true
assign forceSecondaryData = true

After some tweaking, matching the tags does not add to the latency (in fact, the modified pipeline is a little bit faster, but that's probably java-11 versus java-8). It adds 4% to the RD5 sizes.

So for me that seems o.k. now and I can move that to the regular data set tomorrow.

Data source for the new tags is essbee's staging table in the postgres database which itself is still subject to tweaking. So before developing new profiles against the new tags better wait for essbee announcing some stable state.

For matching the tags I modified afischerdev's code to a pre-load/memory-map pattern. The map is a "CompactLongMap" with de-duplicated key-value HashMaps as values, so the memory footprint is basically 8 bytes for the long-key plus 4 bytes for the reference to the (de-duplicated) key-value HashMap, times the number of rows in the staging table (120 million), so a total of (8+4)*120m = 1.44 GB, roughly 1.5 GB.
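
The estimate and the de-duplication idea can be reproduced with a short Python sketch. This is only an illustration: the real code is Java and uses BRouter's CompactLongMap, and the helper below is a hypothetical stand-in built on plain dictionaries.

```python
ROWS = 120_000_000  # rows in the staging table, as stated above
KEY_BYTES = 8       # one long key per row
REF_BYTES = 4       # one reference to a shared key-value map per row

total_bytes = ROWS * (KEY_BYTES + REF_BYTES)
total_gb = total_bytes / 1e9  # decimal gigabytes


def deduplicate(rows):
    """Map each way id to a shared tag dict, storing equal dicts only once."""
    pool = {}   # canonical tag sets, keyed by their sorted items
    index = {}  # way id -> shared tag dict
    for way_id, tags in rows:
        key = tuple(sorted(tags.items()))
        index[way_id] = pool.setdefault(key, tags)
    return index, pool


index, pool = deduplicate([
    (1, {"estimated_forest_class": "2"}),
    (2, {"estimated_forest_class": "2"}),
    (3, {"estimated_town_class": "3"}),
])
```

The estimate leaves out the memory of the shared maps themselves, which stays small as long as many rows carry identical tag combinations.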

EssBee59 commented 1 year ago

Hello Arndt,

I had a short look on the brouter.de server:

lookups.dat ok
the pseudo-tags are available
so I could start a short test with a "new profile" (as custom profile), same results as on my test instance!

About the tags: I found a bug in the SQLs yesterday. I could fix it now and will soon start a new generation (duration about 25 hours +-5). If the generation is successful, I will update the data tomorrow after your RD5 generation.

regards
