Closed mmd-osm closed 5 years ago
Thank you for finding the issue. That's a bug.
Test query: node(1530957491)["name"!~"."][~"^name:.*$"~"."];out;
Bad behavior is caused by coding in void filter_ids_by_ntags
in query.cc in Line 608:
sort(removed_ids.begin(), removed_ids.end());
removed_ids.erase(unique(removed_ids.begin(), removed_ids.end()), removed_ids.end());
new_ids.erase(set_difference(new_ids.begin(), new_ids.end(),
removed_ids.begin(), removed_ids.end(), new_ids.begin()), new_ids.end());
Situation before new_ids.erase
:
new_ids: (6 entries): 1530957491 1530957491 1530957491 1530957491 1530957491 1530957491
Situation after new_ids.erase
:
new_ids: (5 entries): 1530957491 1530957491 1530957491 1530957491 1530957491
Although the object ids are all identical, the erase statement fails to delete all relevant ones - it just deletes a single entry. Effectively, way 1530957491 still survives in the result, although it has been identified to be removed by the negative key/value_regex.
The following pull request fixes this by first removing duplicate entries in new_ids
:
sort(new_ids.begin(), new_ids.end());
new_ids.erase(unique(new_ids.begin(), new_ids.end()), new_ids.end());
This fix is also deployed for testing on the dev instance: http://overpass-turbo.eu/s/b03
I'm reopening this issue, as the patch in https://github.com/drolbr/Overpass-API/commit/393581016b2b0ffa244bd84debfc02fe69c33386 causes performance regressions. The following very simple query used to take 50ms, and now takes 500ms.
Test case:
[out:json];node["name"~"[33][22][22][55]"]["phone"](3.2299951,-76.6815258,3.6299951000000004,-76.2815258);out body;
With an additional instrumentation in method generate_ids_by_coarse
, we can check the index used for _ids_bycoarse, as well as the vector size before / after removing duplicates. Results are shown below.
It's pretty evident, that we're sorting and removing the same elements (plus a few new ones) over and over again for the same coarse index.
I'm proposing to change this in a way that sorting / removing duplicates is only done once per it->first.val() & 0x7fffff00
. See referenced pull request.
ids_by_coarse - index: 461641984
Before de-dup: 38
After de-dup: 38
ids_by_coarse - index: 461641984
Before de-dup: 63
After de-dup: 63
ids_by_coarse - index: 461641984
Before de-dup: 100
After de-dup: 100
ids_by_coarse - index: 461641984
Before de-dup: 103
After de-dup: 103
ids_by_coarse - index: 461641984
Before de-dup: 155
After de-dup: 155
ids_by_coarse - index: 461641984
Before de-dup: 163
After de-dup: 163
ids_by_coarse - index: 461641984
Before de-dup: 202
After de-dup: 202
ids_by_coarse - index: 461641984
Before de-dup: 250
After de-dup: 250
ids_by_coarse - index: 461641984
Before de-dup: 281
After de-dup: 281
ids_by_coarse - index: 461641984
Before de-dup: 319
After de-dup: 319
ids_by_coarse - index: 461641984
Before de-dup: 450
After de-dup: 450
ids_by_coarse - index: 461641984
Before de-dup: 458
After de-dup: 458
ids_by_coarse - index: 461641984
Before de-dup: 531
After de-dup: 531
ids_by_coarse - index: 461641984
Before de-dup: 554
After de-dup: 554
ids_by_coarse - index: 461641984
Before de-dup: 604
After de-dup: 604
ids_by_coarse - index: 461641984
Before de-dup: 629
After de-dup: 629
ids_by_coarse - index: 461641984
Before de-dup: 674
After de-dup: 674
ids_by_coarse - index: 461641984
Before de-dup: 744
After de-dup: 744
ids_by_coarse - index: 461641984
Before de-dup: 781
After de-dup: 781
ids_by_coarse - index: 461641984
Before de-dup: 866
After de-dup: 866
ids_by_coarse - index: 461641984
Before de-dup: 890
After de-dup: 890
ids_by_coarse - index: 461641984
Before de-dup: 934
After de-dup: 934
ids_by_coarse - index: 461641984
Before de-dup: 1005
After de-dup: 1005
ids_by_coarse - index: 461641984
Before de-dup: 1027
After de-dup: 1027
ids_by_coarse - index: 461641984
Before de-dup: 1129
After de-dup: 1129
ids_by_coarse - index: 461641984
Before de-dup: 1266
After de-dup: 1266
ids_by_coarse - index: 461641984
Before de-dup: 1391
After de-dup: 1391
ids_by_coarse - index: 461641984
Before de-dup: 1494
After de-dup: 1494
ids_by_coarse - index: 461641984
Before de-dup: 1586
After de-dup: 1586
ids_by_coarse - index: 461641984
Before de-dup: 1698
After de-dup: 1698
ids_by_coarse - index: 461641984
Before de-dup: 1713
After de-dup: 1713
ids_by_coarse - index: 461641984
Before de-dup: 1759
After de-dup: 1759
ids_by_coarse - index: 461641984
Before de-dup: 1799
After de-dup: 1799
ids_by_coarse - index: 461641984
Before de-dup: 1835
After de-dup: 1835
ids_by_coarse - index: 461641984
Before de-dup: 1904
After de-dup: 1904
ids_by_coarse - index: 461641984
Before de-dup: 1909
After de-dup: 1909
ids_by_coarse - index: 461641984
Before de-dup: 1950
After de-dup: 1950
ids_by_coarse - index: 461641984
Before de-dup: 1968
After de-dup: 1968
ids_by_coarse - index: 461641984
Before de-dup: 1972
After de-dup: 1972
ids_by_coarse - index: 461641984
Before de-dup: 2055
After de-dup: 2055
ids_by_coarse - index: 461641984
Before de-dup: 2078
After de-dup: 2078
ids_by_coarse - index: 461641984
Before de-dup: 2087
After de-dup: 2087
ids_by_coarse - index: 461641984
Before de-dup: 2094
After de-dup: 2094
ids_by_coarse - index: 461641984
Before de-dup: 2098
After de-dup: 2098
ids_by_coarse - index: 461641984
Before de-dup: 2108
After de-dup: 2108
ids_by_coarse - index: 461641984
Before de-dup: 2109
After de-dup: 2109
ids_by_coarse - index: 461641984
Before de-dup: 2110
After de-dup: 2110
ids_by_coarse - index: 461641984
Before de-dup: 2116
After de-dup: 2116
ids_by_coarse - index: 461641984
Before de-dup: 2122
After de-dup: 2122
ids_by_coarse - index: 461641984
Before de-dup: 2272
After de-dup: 2272
ids_by_coarse - index: 461641984
Before de-dup: 2320
After de-dup: 2320
ids_by_coarse - index: 461641984
Before de-dup: 2358
After de-dup: 2358
ids_by_coarse - index: 461641984
Before de-dup: 2390
After de-dup: 2390
ids_by_coarse - index: 461641984
Before de-dup: 2448
After de-dup: 2448
ids_by_coarse - index: 461641984
Before de-dup: 2499
After de-dup: 2499
ids_by_coarse - index: 461641984
Before de-dup: 2542
After de-dup: 2542
ids_by_coarse - index: 461641984
Before de-dup: 2556
After de-dup: 2556
ids_by_coarse - index: 461641984
Before de-dup: 2565
After de-dup: 2565
ids_by_coarse - index: 461641984
Before de-dup: 2571
After de-dup: 2571
ids_by_coarse - index: 461641984
Before de-dup: 2608
After de-dup: 2608
ids_by_coarse - index: 461641984
Before de-dup: 2615
After de-dup: 2615
ids_by_coarse - index: 461641984
Before de-dup: 2623
After de-dup: 2623
ids_by_coarse - index: 461641984
Before de-dup: 2626
After de-dup: 2626
ids_by_coarse - index: 461641984
Before de-dup: 2640
After de-dup: 2640
ids_by_coarse - index: 461641984
Before de-dup: 2673
After de-dup: 2673
ids_by_coarse - index: 461641984
Before de-dup: 2690
After de-dup: 2690
ids_by_coarse - index: 461641984
Before de-dup: 2705
After de-dup: 2705
ids_by_coarse - index: 461641984
Before de-dup: 2712
After de-dup: 2712
ids_by_coarse - index: 461641984
Before de-dup: 2743
After de-dup: 2743
ids_by_coarse - index: 461641984
Before de-dup: 2800
After de-dup: 2800
ids_by_coarse - index: 461641984
Before de-dup: 2812
After de-dup: 2812
ids_by_coarse - index: 461641984
Before de-dup: 2817
After de-dup: 2817
ids_by_coarse - index: 461641984
Before de-dup: 2821
After de-dup: 2821
ids_by_coarse - index: 461641984
Before de-dup: 2888
After de-dup: 2888
ids_by_coarse - index: 461641984
Before de-dup: 3085
After de-dup: 3085
ids_by_coarse - index: 461641984
Before de-dup: 3093
After de-dup: 3093
ids_by_coarse - index: 461641984
Before de-dup: 3140
After de-dup: 3140
ids_by_coarse - index: 461642240
Before de-dup: 4
After de-dup: 4
It is now an enhancement because it is a merely question of performance.
Closing, as issue was fixed in c30c154df701d20cf1ec3e9274910d9dce31ce7d..e6394bff10c7ea8a9c68c876854b4b13feb85a3d
Thanks!
I'm reopening this issue, as there's a strange effect with the Karlsruhe node.
I'm testing with keyregexp and a Cyrillic character 'л':
node(49.01342968289611,8.403076529502867,49.015037577534585,8.406440019607544)[!name][~"."~"л"];
out;
/*
Test:
nodes: 21487097 -> ok ( [!name] removes node )
240120582 -> not ok ( [!name] has no impact )
*/
http://overpass-turbo.eu/s/pw1
My expectation is that node 240120582 (Karlsruhe) is not shown in the result, because is has a name
tag. This works pretty well for another nearby node 21487097 (Mühlburg), which also has a name
tag and the same Cyrillic character, i.e. the Regexp itself filters the correct nodes.
This one is even worse, returning lots of nodes with a name tag:
node({{bbox}})[!name][~"."~"a"];
node._[name];
out;
http://overpass-turbo.eu/s/pwz
Also tested on //dev.overpass-api.de/api_new_feat/ with the same effect.
I would like to obtain all nodes in a region that have no name
tag but do have name:lang
. The following query returns nodes that do have a name tag, even though it shouldn't:
node[!"name"][~"^name:(.*)$"~".*"];
This query returns invalid results depending on zoom level:
http://overpass-turbo.eu/s/Dww
[bbox:{{bbox}}];
node[~"^fhrs.*"~".*"][!"fhrs:id"];
out center;
Always the same reason: filter_ids_by_ntags is getting called with duplicates ids in std::vector< Id_Type >& new_ids
This issue was fixed in https://github.com/drolbr/Overpass-API/commit/2ca3f3877bf64a5b6ceaf91423721b37d61aec3b
According to the Wiki:
Let's take a look at the following examples:
out geom;
way(271221475);
way(271221475) ["addr:city"];
way(271221475) ["addr:city"] ["addr:hamlet"!~"."];
way(271221475) ["addr:city"] ["addr:hamlet"!~"."] [~"^addr:hamlet:.*$"~"."];
In example 3, the condition
["addr:hamlet"!~"."]
is not met. To my surprise, adding another key-regexp filter in example 4 overrides this. Is this a bug, undocumented behavior or did I miss something here?Reference data: