Thoughts on profile development and testing

wetneb commented 6 years ago

I find it quite frustrating that #58 is not merged, but I understand the underlying issue: as a maintainer, how can you be confident that changes to your profiles will not badly screw up the results in some cases?

So, do we need a sort of test-driven profile development? Has this been done anywhere else? Here are a few random thoughts on how this could work.

A test case could be an origin, destination and one or more test point(s) through which the computed route would need to pass to validate/fail the test. To ease maintenance, test cases could come with a short description of why the route is expected to pass there rather than elsewhere. Each profile would have its own test suite, but ideally it should be possible for these test suites to share common tests.
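Here is a minimal sketch of the test-case format proposed above. It records why the route should pass a point and reports which check points the computed route misses. The field names, the 25 m default tolerance and the `(lon, lat)` tuples are assumptions, not an existing BRouter format.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


@dataclass
class RouteTestCase:
    description: str            # why the route is expected to pass there
    origin: tuple               # (lon, lat) sent to the router
    destination: tuple          # (lon, lat) sent to the router
    via_checks: list = field(default_factory=list)  # points the route must pass near
    tolerance_m: float = 25.0   # hypothetical default, tune per test

    def failures(self, route):
        """Return the check points that no route vertex comes within tolerance of."""
        return [p for p in self.via_checks
                if min(haversine_m(p, q) for q in route) > self.tolerance_m]
```

The check only measures distance to route vertices, not to the segments between them, so a sparse route could fail a point it actually passes.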

Should the tests be run against a frozen state of OSM? That would have the advantage of making the testing process more reliable and would lower the maintenance costs for the test suite. Or should they be run continuously as the map evolves? That would have the advantage of matching the real state of the map better, and potentially notifying the maintainer of any important tagging changes.

If we had a nice test suite for a few profiles, we could even try to derive profile weights from the test cases themselves! Convert all the test cases to a big bunch of linear inequalities, and let a solver come up with something.
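To make the "linear inequalities" idea concrete, suppose a route's cost is modelled as a weighted sum of metres per feature (for example paved vs. unpaved). Each test case "route P should beat route A" then becomes one inequality on the weights. The sketch below swaps in a simple projected perceptron instead of a real LP solver. The feature model, learning rate and margin are all assumptions, and BRouter's actual cost also includes turn, node and elevation costs that this ignores.

```python
def fit_weights(pairs, n_features, lr=1e-4, margin=1.0, max_epochs=1000):
    """Search for non-negative weights w with
    cost(preferred) + margin <= cost(rejected) for every (preferred, rejected) pair,
    where cost(x) = sum_k w[k] * x[k]. Returns None if no solution is found."""
    w = [1.0] * n_features  # start from neutral cost factors
    for _ in range(max_epochs):
        all_ok = True
        for preferred, rejected in pairs:
            d = [r - p for p, r in zip(preferred, rejected)]  # want w . d >= margin
            if sum(wi * di for wi, di in zip(w, d)) < margin:
                all_ok = False
                # perceptron step towards satisfying this inequality,
                # then clip so that no cost factor becomes negative
                w = [max(wi + lr * di, 0.0) for wi, di in zip(w, d)]
        if all_ok:
            return w
    return None
```

A proper linear-programming solver would also report infeasibility reliably. The perceptron only gives up after `max_epochs`.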

I'm sure plenty of people have done similar things before (I am not familiar with the field at all), so it would be important to research what is already out there first.

poutnikl commented 6 years ago

IMHO, the best test environment is an area the profile developer/tester is familiar with, so it cannot be a common one.

It is up to such a person, familiar with the area's roads and ways, to consider what route a profile with the given priorities should take and to check what it actually chooses.

Also, concerning surface quality data, there is a need to compare OSM data with reality, in case they do not match (well).

wetneb commented 6 years ago

Yes I agree that it helps a lot to know an area to design test cases. But I do not understand what you mean by "it cannot be the common one"? What "common one"?

poutnikl commented 6 years ago

It was an Android typo; I immediately corrected it to "common", but you replied to the original. (I'm posting from an Android GitHub client.)

I mean that personal test cases are IMHO better than common test cases.

wetneb commented 6 years ago

Okay, I still don't understand it with "common" instead of "coming", so I have also edited my post!

poutnikl commented 6 years ago

In my understanding, you asked for some common test cases and test scenarios.

wetneb commented 6 years ago

I think a lot of basic tests can be understood without knowing the place… I think they should only be added in places where the decision can be made based on the OSM data (otherwise there is no point trying to figure out a routing profile that somehow manages to pass the test for an unrelated reason).

More specifically I am thinking of the following workflow:

I am using some profile to plan my trips;
One day I find out that one route was bad - I update the map to indicate why (steps, bad road surface, narrow corner, whatever);
I check that the routing profile avoids the problem with the updated map. If not, I add a test case for it, and try to find a sensible profile change to fix the issue.
I submit the profile update and the test as a PR (for instance) - the maintainer can review the change more confidently by checking that the change does not break existing tests, understand which issue prompted the change, and so on. If the maintainer does not agree with the change (we don't have the same taste, we live in very different countries) I just fork it and keep developing it with people who have the same riding style.
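The maintainer's review step in this workflow boils down to comparing two test runs, one with the current profile and one with the proposed change. Here is a minimal sketch, assuming each run is reduced to a hypothetical mapping from test name to pass/fail:

```python
def compare_runs(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict[str, list[str]]:
    """Summarise what a profile change does to the test suite."""
    common = baseline.keys() & candidate.keys()
    return {
        # passed before, fails now: regressions the maintainer should look at
        "broken": sorted(n for n in common if baseline[n] and not candidate[n]),
        # failed before, passes now: what the change fixes
        "fixed": sorted(n for n in common if not baseline[n] and candidate[n]),
        # tests added with the change (e.g. in the same PR) that still fail
        "new_failing": sorted(n for n in candidate.keys() - baseline.keys() if not candidate[n]),
    }
```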

There are lots of small communities (your local cycling club, local bike advocacy group, city council) which could very well enjoy developing their own profiles based on their common taste or rules, I think.

poutnikl commented 6 years ago

There will always be cases where the profile chooses a different route than its end users would, even if it matches their preferences quite well. And the gap between OSM data and subjectively perceived reality adds another layer of difference between the profile and the user.

Unless there is a profile code bug causing crazy routing in some scenario, it is often better to choose a different profile.

(Or different profile parameters. As you may know, my profiles come from a tunable common profile template. A script can easily generate many different profiles from the template by adjusting different profile parameters.)

https://github.com/poutnikl/Brouter-profiles/wiki
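Here is a hypothetical sketch of such a generator. It is not the script behind the linked profiles. It rewrites the `assign name = value` lines of a template, the way BRouter profiles set their parameters. The regex assumes one assignment per line and tolerates an optional `=`.

```python
import re

# one assignment per line: "assign <name> [=] <value>", keeping the original spacing
ASSIGN_RE = re.compile(r"^([ \t]*assign[ \t]+)(\w+)([ \t]*=?[ \t]*)(\S+)", re.MULTILINE)


def make_variant(template: str, overrides: dict[str, str]) -> str:
    """Return a copy of a profile template with selected `assign` values replaced.
    Raises KeyError if an override names a variable the template does not assign."""
    seen = set()

    def repl(m):
        name = m.group(2)
        if name in overrides:
            seen.add(name)
            return f"{m.group(1)}{name}{m.group(3)}{overrides[name]}"
        return m.group(0)  # leave other assignments untouched

    out = ASSIGN_RE.sub(repl, template)
    missing = set(overrides) - seen
    if missing:
        raise KeyError(f"not assigned in template: {sorted(missing)}")
    return out
```

Failing loudly on unknown names keeps a typo in a variant definition from silently producing a copy of the template.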

wetneb commented 6 years ago

I still do not understand your point but it seems that you would not find these tests useful. We seem to be talking past each other so I will not try to convince you further…

poutnikl commented 6 years ago

It may just be a mutual misunderstanding about the purpose: whether the test is from the developer's or the end user's point of view.

Whether it behaves as it should, or whether it behaves as a user would like it to.

poutnikl commented 6 years ago

P.S.: What I understand is that you may want some representative test routes for modifications of established built-in profiles, to check that no regression has been introduced.

Such a public routing test suite would obviously make sense.

OTOH, testing the newly introduced changes would rather have to rely on test routes on the profile developer's side, as it is not possible to predict a priori which profile features and OSM data will be tested.

abrensch commented 6 years ago

A test case could be an origin, destination and one or more test point(s) through which the computed route would need to pass to validate/fail the test. To ease maintenance, test cases could come with a short description of why the route is expected to pass there rather than elsewhere. Each profile would have its own test suite, but ideally it should be possible for these test suites to share common tests.

Hi,

actually, there has always been exactly one such test among the JUnit tests that are part of the build process:

https://github.com/abrensch/brouter/blob/master/brouter-server/src/test/java/btools/server/RouterTest.java

It checks that the result for a short route with the "trekking" profile is binary-identical to a stored GPX file. The map used is a very small map of the city of Dreieich, which crosses the 50th parallel.

The intention of that test is more technical, testing the basic function of the router and the profile, and testing the tile-boundary-crossing.

However, that would be a way to add more "microscopic tests" for special aspects: oneway-logic, access-restrictions, oneway:bicycle=opposite, etc. But as you see: just objective questions (Does it still work or is it broken?).
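Here is a Python sketch of the same golden-file idea. The real test is the Java RouterTest.java linked above, and the function names here are hypothetical. It compares the produced GPX byte for byte and, on mismatch, builds a short diagnostic.

```python
import hashlib
from pathlib import Path


def matches_golden(actual: bytes, golden_path: Path) -> bool:
    """True if `actual` is byte-for-byte identical to the stored reference file."""
    return golden_path.read_bytes() == actual


def describe_mismatch(actual: bytes, expected: bytes) -> str:
    """Sizes, SHA-256 prefixes and the offset of the first differing byte."""
    first_diff = next((i for i, (a, b) in enumerate(zip(actual, expected)) if a != b),
                      min(len(actual), len(expected)))
    return (f"size {len(actual)} vs {len(expected)}, "
            f"sha256 {hashlib.sha256(actual).hexdigest()[:12]} vs "
            f"{hashlib.sha256(expected).hexdigest()[:12]}, "
            f"first difference at byte {first_diff}")
```

A byte-identical check catches every change, including harmless ones such as a changed version string if the output contains one. That suits the objective "does it still work?" questions above, not fine-tuning.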

But I don't think it's a way to test the fine-tuned behaviour of a profile. For that you would need something more "fuzzy". Maybe a set of random medium-distance routes (on a frozen map), checking some cost deltas between new and reference results?
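One possible reading of the cost-delta idea assumes each routing run has been reduced to one number per route id. That could be cost, track length or time; which metric to use is left open here. The sketch computes relative changes against the reference run and flags the large ones. The 10 % threshold is an arbitrary assumption.

```python
def cost_deltas(reference: dict[str, float], candidate: dict[str, float],
                threshold: float = 0.10):
    """Relative change per route id, plus the ids whose |change| exceeds `threshold`.
    Route ids missing from `candidate` or with a non-positive reference are skipped."""
    deltas = {rid: (candidate[rid] - reference[rid]) / reference[rid]
              for rid in reference if rid in candidate and reference[rid] > 0}
    flagged = sorted(rid for rid, d in deltas.items() if abs(d) > threshold)
    return deltas, flagged
```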

regards, Arndt

Phyks commented 6 years ago

I guess @wetneb was referring mostly to simple examples such as the one in https://github.com/poutnikl/Trekking-Poutnik/issues/23#issue-375476236 (minimal distance, easy to understand / reproduce problem).

I guess this would mean storing an extract of the OSM data in the bounding box at this time, but this might be quite easy to do with Overpass.
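Such an extract could, for example, be fetched with an Overpass QL query over the bounding box of the test points. The margin is an assumption, and turning the resulting OSM XML into BRouter segment files is a separate step not shown here.

```python
def overpass_bbox_query(points, margin_deg=0.01):
    """Overpass QL that downloads all nodes/ways/relations (with metadata) in the
    bounding box of `points` ((lon, lat) pairs), padded by `margin_deg`."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    s, w = min(lats) - margin_deg, min(lons) - margin_deg
    n, e = max(lats) + margin_deg, max(lons) + margin_deg
    bbox = f"{s:.6f},{w:.6f},{n:.6f},{e:.6f}"  # Overpass bbox order: south,west,north,east
    # (... ; >;) adds the nodes referenced by the selected ways/relations
    return f"[out:xml][timeout:60];(nwr({bbox});>;);out meta;"
```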

Phyks commented 6 years ago

I was thinking about this issue while trying to fine-tune the trekking profile to avoid some weird features (TL;DR: it seems that adding a penalty for highway=traffic_signals nodes gives really nice results; I still have some work to do on it).

I wrote a small IPython notebook to help test and compare profiles with both full routes (city / country along cycle routes / country without cycle routes) and special features that should be avoided: https://github.com/Phyks/BrouterTesting.

This is, of course, biased towards routes I am familiar with (and geographical area is France). I am using a fixed copy of the segments4 (https://pub.phyks.me/brouter-testing/segments4/) and tried to provide notes / explanations / "human route" (what I would personally do) to help reuse it.

Phyks commented 5 years ago

I reworked my test collection into a simple web page, which should be much more convenient for everyone. The code is still in https://github.com/Phyks/BrouterTesting. The web page is https://github.com/Phyks/BrouterTesting/blob/master/index.html and the test collection is a (big JSON-like) JS array in https://github.com/Phyks/BrouterTesting/blob/master/tests.js.

The only requirement to be able to run these tests is to have a BRouter instance with these segments files https://pub.phyks.me/brouter-testing/segments4/ in use. Not sure if this is something which can be done / is interesting to have at the brouter.de level? (I could run my own on my server as well, but probably with worse SLA).

@abrensch When sending a bunch of queries to a BRouter instance (running locally, started with misc/server/standalone.sh script), I often get

operation killed by thread-priority-watchdog after 0 seconds

Is there an easy way on BRouter-side to limit this and make it eventually wait / auto-retry?

abrensch commented 5 years ago

operation killed by thread-priority-watchdog after 0 seconds
Is there an easy way on BRouter-side to limit this and make it eventually wait / auto-retry?

There's a command line parameter "maxthreads" when running the HTTP server. If that is exceeded, the longest-running thread is stopped. However, such a testsuite has no need to run multithreaded?

I started a server process using your frozen segments4 + profiles2 on brouter.de, port 7777, and I copied your webapp to http://brouter.de/BrouterTesting/

Not sure if it works correctly. When pasting in the trekking-profile I get maps for all testfiles, not just the failed ones?

Phyks commented 5 years ago

There's a command line parameter "maxthreads" when running the HTTP server. If that is exceeded, the longest-running thread is stopped. However, such a testsuite has no need to run multithreaded?

Indeed, my code was just a bit messy and hammering the server by sending all the routing computations at the same time. This is now reworked in my last commit to have a sequential flow and this should be fixed.
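Here is a minimal sketch of a sequential flow with a retry on top. It is not Phyks's actual code. `fetch` is a caller-supplied function (for example one wrapping an HTTP GET) that returns the response text, and the retry count and backoff are assumptions.

```python
import time

WATCHDOG_MSG = "operation killed by thread-priority-watchdog"


def route_all(requests, fetch, retries=3, backoff_s=1.0, sleep=time.sleep):
    """Send routing requests one at a time. If a response contains the watchdog
    message, retry it after backoff_s, 2*backoff_s, ... seconds. After the last
    retry the error text itself is kept as that request's result."""
    results = []
    for req in requests:
        for attempt in range(retries + 1):
            text = fetch(req)
            if WATCHDOG_MSG not in text:
                break
            if attempt < retries:
                sleep(backoff_s * 2 ** attempt)
        results.append(text)
    return results
```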

I started a server process using your frozen segments4 + profiles2 on brouter.de, port 7777, and I copied your webapp to http://brouter.de/BrouterTesting/

Thanks a lot! I edited my webapp to use these settings by default.

Not sure if it works correctly. When pasting in the trekking-profile I get maps for all testfiles, not just the failed ones?

Indeed, so far it is more integration testing than unit testing. It is a collection of test cases (mainly bicycles and France based so far, but could easily be extended) and for each test case, I display:

The reference profile route (for instance trekking), in grey
The custom profile route, in blue
A "human" solution which is what a human would expect or do in this case, in green.

There is no automated testing, although this could probably be done quite easily, at least for the small test cases (testing a specific feature or behavior).
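The grey/blue/green comparison could be automated for the small test cases, for example by measuring how far the custom-profile route strays from the "human" route. The sketch below swaps in the discrete Hausdorff distance as that measure. The metric and any pass threshold are assumptions.

```python
import math


def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000.0 * math.asin(math.sqrt(h))


def hausdorff_m(route_a, route_b):
    """Discrete Hausdorff distance (metres) between two lists of (lon, lat) points:
    the largest distance from any point of one route to the nearest point of the other."""
    def directed(p, q):
        return max(min(haversine_m(x, y) for y in q) for x in p)
    return max(directed(route_a, route_b), directed(route_b, route_a))
```

It ignores point order and costs O(n·m) distance evaluations, which is fine for short feature-specific test routes.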

zod commented 3 years ago

I've looked through the open brouter issues and noticed that many issues are related to profile changes. Currently it's hard to determine if those changes cause regressions. Therefore I'd like to revive this idea of having a set of tests which can help evaluating changes.

I've forked the BRouterTesting repo from @Phyks and made some changes. The brouter-profiles-tester is now able to import tests as GeoJSON exports from brouter-web. There are some preliminary instructions on how to use it.

To simulate the workflow, I used issue #102, which suggested some minimal changes to the fastbike profile. I created a test, exported it to GeoJSON and imported it into brouter-profiles-tester. Then I used the improved profile to check whether the routing improved. Ideally, more tests should be added to check whether the change causes any regressions.

brouter-profiles-tester currently uses the normal BRouter instance at http://brouter.de instead of the frozen segments instance at http://brouter.de:7777 because the frozen segments can't handle the current version of trekking.brf.

I've got some other ideas to automatically run those tests once we have a collection of tests but I first want to discuss if you think this would help the development of the profiles.

(I'm also tagging @afischerdev and @EssBee59 because they haven't been active in this issue before but did some profile development)

EssBee59 commented 2 years ago

I've looked through the open brouter issues and noticed that many issues are related to profile changes. Currently it's hard to determine if those changes cause regressions

Hello,

I agree that a lot of issues are related to profiles, but I think not only in case of changes/regressions. For example, in https://github.com/abrensch/brouter/issues/332 we have a standard situation where inexperienced users are not able to clarify a situation by themselves (a highway was tagged as "unclassified" instead of "residential", generating more turn instructions than necessary).

Like Arndt, I think that automated regression tests are good... for quick retests after a change in the routing engine... but...

I have not invested any time in automated routing tests so far, but it seems very, very difficult to me to automatically test a change in a profile.

To reduce the number of issues, or to help solve/analyse issues, I would prefer to invest in another direction:

==> create "something" that helps users (and developers) analyse a given situation (thus avoiding unnecessary issues / resolving issues quickly):

To clarify/understand the origin of the problem (in OSM, the profile, BRouter, the navigation app, ...)
To find a solution (or workaround)

This is the direction I would follow, but the discussion that has started should continue.

afischerdev commented 2 years ago

I also made a personal profile test system adapted from BRouterTesting, using Node.js and JSON like this:

{
    "description": "Lemmer",
    "profile": "river_canoe_nomod",
    "params": "profile:shortest_way=0&profile:boat_height=1.0",
    "points": [ [5.709725,52.837426], [5.687383,52.871190] ],
    "results": {
        "track-length": 5013,
        "total-time": 6016
    }
}
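A checker for entries like this one might compare the stored `results` with the properties of a fresh routing result. This sketch assumes the result's properties use the same keys (`track-length`, `total-time`) and may encode numbers as strings. The 2 % relative tolerance is an arbitrary choice so that tiny map changes do not fail the test.

```python
def check_results(expected: dict, properties: dict, rel_tol: float = 0.02) -> list[str]:
    """Compare expected numeric results with a routing result's properties
    and return human-readable failures (an empty list means the test passed)."""
    failures = []
    for key, want in expected.items():
        if key not in properties:
            failures.append(f"{key}: missing from result")
            continue
        got = float(properties[key])  # numbers may arrive as strings
        if abs(got - want) > rel_tol * abs(want):
            failures.append(f"{key}: expected {want}, got {got:g}")
    return failures
```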

But in the end I never use it. What I do for profile development is mostly visual checking:

find a way (problem) in BRouter-Web
check the OSM way/node definitions if needed
make changes and run locally
export GPX and JSON
view the GPX in RouteConverter to look at the hints
view the GPX in GPXSee to compare two GPX files
check the JSON messages for used tags, CostPerKm and NodeCost

Visible voice hints in BRouter-Web could be helpful, or am I missing a switch?
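The last step, checking the JSON messages, can be partly scripted. This sketch assumes the `messages` table in the JSON export is a list of rows whose first row holds the column names (such as `CostPerKm`, `NodeCost` and `WayTags`). It sorts the segments by `CostPerKm` so the most expensive ones can be inspected first.

```python
def costliest_segments(messages: list[list[str]], top: int = 3) -> list[dict]:
    """Turn a messages table (first row = column names) into dicts and
    return the `top` rows with the highest CostPerKm."""
    header, *rows = messages
    records = [dict(zip(header, row)) for row in rows]
    return sorted(records, key=lambda r: float(r["CostPerKm"]), reverse=True)[:top]
```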

afischerdev commented 2 years ago

Another thought on profiles: brouter/misc/profiles2 contains some main profiles that generate variants (car-vario, fastbike, trekking), and some that don't need a variant (hiking, rail, river, moped). But there are also some profiles that look as if they could be 'normalized' to a common base (car-eco-de, car-eco-suspect_scan, shortest, fastbike-verylowtraffic, vm-forum-liegerad-schnell, vm-forum-velomobil-schnell).

What do you think? Would it be possible to find a parent and then generate a variant from it? I think it could help avoid losing other profiles when changes are made to, e.g., trekking, like #372.

EssBee59 commented 2 years ago

But in the end I never use it. What I do for profile development is mostly visual checking:

find a way (problem) in BRouter-Web
check the OSM way/node definitions if needed
make changes and run locally
export GPX and JSON
view the GPX in RouteConverter to look at the hints
view the GPX in GPXSee to compare two GPX files
check the JSON messages for used tags, CostPerKm and NodeCost

Visible voice hints in BRouter-Web could be helpful, or am I missing a switch?

I fully agree, as I am doing much the same (the only exception: I do not know JSON).

For visual checking, different tools are available. First, BRouter-Web with a lot of features (profile editing, cost analysis, loading a track (to compare new/old versions), loading a track as a route, ...).
To test/verify the voice hints I also use another tool, RouteConverter, as you do: it is very comfortable for this task and for visualising/comparing tracks and routes. Sometimes I also use my navigation app OsmAnd for testing: of course on the bike, but also to test a new profile and verify the turn hints (==> start navigation on a track ==> details ==> the list of turn instructions appears; clicking on a turn shows the place on the map).

EssBee59 commented 2 years ago

What do you think? Would it be possible to find a parent and then generate a variant from it?

To generate a variant from a parent, are you thinking about changing some parameters (or manual changes in the profile itself)? I think the idea is interesting; the question is whether all these variants can be generated with a limited number of parameters. For myself, I created my own fastbike profile 2 years ago because I had many/too many changes compared to the existing fastbike profiles. But within the "fastbike" family, yes, I think it is possible to create variants from a parent, and I implemented this on my local server and Android! Have a look at the current "fastbike-verylowtraffic", where 3 variables control the behaviour:
-turninstructionmode (turn hints for OsmAnd or another app)
-consider_elevation (as the user wants)
-consider_traffic (for me: 1 ==> fastbike-verylowtraffic, 0.3 ==> fastbike-lowtraffic, 0.1 ==> fastbike)

I think the same is also possible within other families (MTB, trekking, ...).
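Variants like these might also be requested at run time instead of being generated as separate files, by passing `profile:<name>=<value>` parameters. That is the same form as the `params` string in the test definition quoted in this thread. Here is a sketch of building such a request. The host and port are placeholders, and it assumes the server applies the overrides to the profile's `assign` variables.

```python
from urllib.parse import urlencode


def brouter_url(base, points, profile, overrides=None, fmt="geojson"):
    """Build a routing request URL; `points` are (lon, lat) pairs and
    `overrides` become profile:<name>=<value> parameters."""
    params = {
        "lonlats": "|".join(f"{lon},{lat}" for lon, lat in points),
        "profile": profile,
        "alternativeidx": 0,
        "format": fmt,
    }
    for name, value in (overrides or {}).items():
        params[f"profile:{name}"] = value
    # keep ',', '|' and ':' readable instead of percent-encoding them
    return f"{base}/brouter?{urlencode(params, safe=',|:')}"
```

For example, the fastbike-lowtraffic behaviour described above would correspond to `brouter_url(host, points, "fastbike-verylowtraffic", {"consider_traffic": 0.3})`.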

zod commented 2 years ago

What I do for profile development is mostly visual checking:

brouter-profiles-tester allows visually comparing the results of two different profiles. I think it should be used to check whether changes to an existing profile cause wrong behavior for other users. Therefore some routes should be collected as test cases to compare the results.

What do you think? Would it be possible to find a parent and then generate a variant from it?

But within the "fastbike" family, yes, I think it is possible to create variants from a parent, and I implemented this on my local server and Android!

fastbike-lowtraffic and fastbike-asiapacific are already generated from fastbike. While theoretically one could write one large profile which is only adapted using variables I think it's beneficial to keep some profiles smaller and separated. If they share some blocks maybe they could be generated using some fragments/templates.
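Generating profiles from shared fragments could be as simple as textual inclusion. The `#include <name>` directive below is hypothetical, not a BRouter feature. It was chosen because `#` starts a comment in BRouter profiles, so an unexpanded template would still parse. The sketch expands includes recursively and rejects cycles.

```python
import re

INCLUDE_RE = re.compile(r"^#include[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def expand_fragments(source: str, fragments: dict[str, str], _stack: tuple = ()) -> str:
    """Replace each `#include <name>` line with the (recursively expanded) fragment."""
    def repl(m):
        name = m.group(1)
        if name in _stack:
            raise ValueError(f"include cycle: {' -> '.join(_stack + (name,))}")
        expanded = expand_fragments(fragments[name], fragments, _stack + (name,))
        return expanded.rstrip("\n")  # the include line's own newline is kept
    return INCLUDE_RE.sub(repl, source)
```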

EssBee59 commented 2 years ago

While theoretically one could write one large profile which is only adapted using variables I think it's beneficial to keep some profiles smaller and separated. If they share some blocks maybe they could be generated using some fragments/templates.

Good remark! ==> I suggest using at most 5 variables in a profile for creating variants ==> I suspect that big/complex profiles cost more battery in the Android app, which is not very good for biking.

afischerdev commented 2 years ago

@EssBee59

==> I suspect that big/complex profiles cost more battery in the Android app, which is not very good for biking

Interesting remark, do you do continuous routing? I use it for planning 'only'. When the route looks nice, I start and the app follows the way. ;-)

EssBee59 commented 2 years ago

Interesting remark, do you do continuous routing? I use it for planning 'only'. When the route looks nice, I start and the app follows the way. ;-)

I started a several-day fastbike tour last summer without planning: I calculated the route (170 km every day) on the bike, and of course the route was recalculated at least 20 times along the way. I alternately used fastbike-verylowtraffic (and needed a power bank after 120 km) or fastbike-lowtraffic (and needed a power bank after 150 km). But you are right, the differences are probably not due to calculation power. Today I had a look (at home, with RouteConverter) at the turn instructions:
(1) fastbike-verylowtraffic generated 107 turn instructions for the 170 km
(2) fastbike-lowtraffic generated only 50 turn instructions for the 170 km
With the display ON/OFF option in the navigation app and a timeout of 30 seconds, the screen was ON for 53 minutes in case (1) and only 25 minutes in case (2).
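The screen-on figures follow from simple arithmetic, assuming every turn instruction wakes the display for one full 30-second timeout and no two wake-ups overlap (a simplification, since closely spaced instructions would share one wake-up):

```python
def screen_on_minutes(turn_instructions: int, timeout_s: float = 30.0) -> float:
    """Display-on time if each instruction wakes the screen for one full timeout
    and no two wake-ups overlap."""
    return turn_instructions * timeout_s / 60.0
```

This gives 107 × 30 s = 53.5 minutes for fastbike-verylowtraffic and 50 × 30 s = 25 minutes for fastbike-lowtraffic, matching the observed values.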

(It is my decision to generate more turn instructions to avoid long distance without instructions and with a dark display!)

So profile complexity probably has no effect on the smartphone's battery! But complexity definitely impacts the calculation time for long or very long routes!

poutnikl commented 2 years ago

So profile complexity probably has no effect on the smartphone's battery! But complexity definitely impacts the calculation time for long or very long routes!

I very seldom calculate long routes without via points/shaping points. I do that in the LocusMap route planner, then follow the route, with occasional recalculation back to the route. So initial routing and recalculation are quite fast.

I think both simple dedicated profiles with limited config and complex well tunable profiles make sense to have, so anybody can choose.

afischerdev commented 2 years ago

Another thought on profiles, naming convention: we have map-generation profiles and routing profiles. What about a name prefix for map-generation profiles, like mapgen_all, mapgen_softaccess, ...? That would make it clearer which profiles are meant for deployment to the app or BRouter-Web.

afischerdev commented 2 years ago

Another thought on profile development: I made a test with the JSON export and added a debug block for way variables. Something like:

        "debug_messages": [
          {"lat": 9761129, "lon": 52332835,
"vars": "ispaved=0.0 isunpaved=0.0 any_hiking_route=1.0 any_cycleroute=1.0 is_ldhr=1.0 nodeaccessgranted=1.0 ismuddy=0.0 issidewalk=0.0 turncost=0.0 initialclassifier=0.0 initialcost=0.0 defaultaccess=1.0 footaccess=1.0 bikeaccess=1.0 footaccess=1.0 accesspenalty=0.0 SAC=0.0 SAC_scale_access=1.0 SAC_scale_penalty=0.1 costfactor=1.0 dummyUsage=1.0"},
          {"lat": 9760929, "lon": 52332726,
"vars": "ispaved=1.0 isunpaved=0.0 any_hiking_route=1.0 any_cycleroute=1.0 is_ldhr=1.0 nodeaccessgranted=1.0 ismuddy=0.0 issidewalk=1.0 turncost=0.0 initialclassifier=0.0 initialcost=0.0 defaultaccess=1.0 footaccess=1.0 bikeaccess=1.0 footaccess=1.0 accesspenalty=0.0 SAC=0.0 SAC_scale_access=1.0 SAC_scale_penalty=0.1 costfactor=1.0 dummyUsage=1.0"},
          {"lat": 9759595, "lon": 52333144,
"vars": "ispaved=1.0 isunpaved=0.0 any_hiking_route=0.0 any_cycleroute=1.0 is_ldhr=0.0 nodeaccessgranted=0.0 ismuddy=0.0 issidewalk=1.0 turncost=0.0 initialclassifier=0.0 initialcost=0.0 defaultaccess=1.0 footaccess=1.0 bikeaccess=1.0 footaccess=1.0 accesspenalty=0.0 SAC=0.0 SAC_scale_access=1.0 SAC_scale_penalty=0.1 costfactor=1.0 dummyUsage=1.0"},
          {"lat": 9759150, "lon": 52333859,
"vars": "ispaved=0.0 isunpaved=0.0 any_hiking_route=0.0 any_cycleroute=0.0 is_ldhr=0.0 nodeaccessgranted=0.0 ismuddy=0.0 issidewalk=0.0 turncost=0.0 initialclassifier=0.0 initialcost=0.0 defaultaccess=1.0 footaccess=1.0 bikeaccess=1.0 footaccess=1.0 accesspenalty=0.0 SAC=0.0 SAC_scale_access=1.0 SAC_scale_penalty=0.1 costfactor=1.0 dummyUsage=1.0"}
        ],

This is only a proof of concept and contains only way variables at the moment. It needs additional variables for the node section and a switch for debug mode. Any interest in this?
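Output in this shape is easy to query. The sketch below parses a `vars` string into a dict and finds the debug entries where a variable has a given value, for example every segment the profile treated as a sidewalk. The `lat`/`lon`/`vars` keys are taken from the sample. Since `footaccess` appears twice in each sample entry, the sketch lets the last occurrence win.

```python
def parse_vars(vars_str: str) -> dict[str, float]:
    """Parse a debug string like "ispaved=1.0 issidewalk=0.0" into a dict."""
    out = {}
    for token in vars_str.split():
        name, _, value = token.partition("=")
        out[name] = float(value)  # a repeated name keeps its last value
    return out


def where(messages: list[dict], name: str, value: float) -> list[tuple[int, int]]:
    """(lat, lon) of the debug entries whose variable `name` equals `value`."""
    return [(m["lat"], m["lon"]) for m in messages
            if parse_vars(m["vars"]).get(name) == value]
```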

EssBee59 commented 2 years ago

I made a test with the JSON export and added a debug block for way variables. Something like:

I am not familiar with JSON, but do I see the same information as in the data window of BRouter-Web? For JSON experts it could be helpful; I will think about using JSON in my next profile tests.

afischerdev commented 2 years ago

I am not familiar with JSON, but do I see the same information as in the data window of BRouter-Web?

No, you will see the variable values BRouter calculated for a track segment during routing. The variable names are the same as in the profile used (the sample used trekking.brf), so you can check whether your script runs as expected. The normal message block contains the OSM tags of a segment.

abrensch / brouter

Thoughts on profile development and testing #116