Closed mmd-osm closed 7 years ago
For performance reasons, I would consider an additional database mapping.
Currently, the tags are essentially saved as a very long sequence of the form (key value id+)* in alphabetical order on keys, then values.
Because an arbitrary regular expression doesn't allow deriving an alphabetical range (or a few ranges), the full file of several GB must be read for each sequence. That's too slow to be useful.
To overcome this restriction, a simple plain list of keys could be maintained to check which keys would fit the regular expression. For reasonable regular expressions this works quite fast: there are about 50,000 different keys, on average about 20 bytes long. Reading a megabyte of data doesn't hurt.
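The key-list idea above can be sketched roughly as follows. This is only an illustration in Python, not the Overpass API implementation: the sample keys, the function names `matching_keys` and `key_ranges`, and the use of list indices as stand-ins for positions in the tag file are all assumptions made for the example.

```python
import re
from bisect import bisect_left

# Hypothetical sample of the plain key list, kept sorted like the tag file.
KEYS = sorted([
    "addr:city", "addr:housenumber", "addr:street",
    "amenity", "name", "note", "note:de", "shop", "wikipedia:en",
])

def matching_keys(pattern, keys=KEYS):
    """Return every key the regular expression matches (unanchored search)."""
    rx = re.compile(pattern)
    return [k for k in keys if rx.search(k)]

def key_ranges(pattern, keys=KEYS):
    """Collapse matching keys into (first_index, last_index) runs.

    Keys that are neighbours in the sorted list end up in one range, so a
    prefix-like expression needs only a single contiguous read.
    """
    hits = [bisect_left(keys, k) for k in matching_keys(pattern, keys)]
    ranges = []
    for i in hits:
        if ranges and ranges[-1][1] == i - 1:
            ranges[-1] = (ranges[-1][0], i)   # extend the current run
        else:
            ranges.append((i, i))             # start a new run
    return ranges
```

With this sample list, `key_ranges("^addr:")` gives one range covering the three addr keys, while `key_ranges("^(amenity|shop)$")` gives two single-key ranges. Only those ranges would then have to be read from the large tag file.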
However, it makes sense to do such a database extension along with other database extensions, so this will become available together with the history feature (which is a huge database extension in itself).
Hi! I'd also be interested in using regex or any kind of wildcard to look for certain groups of keys. It would be helpful for quality assurance, IMHO. addr:* comes to mind, or note:, wikipedia:,... Also, regex could be used to find wrongly spelled keys easily.
I've been shown that JOSM can filter keys with regex: "addr:."="." gives everything that has a key that starts with "addr:". But of course JOSM only works with the data that has been downloaded, so it's never too much.
The list of keys that @drolbr suggests sounds like a good thing for most use cases. But what if you are looking for unknown / undocumented keys or typos -- would a list be any help then?
And in case the list will take too long to implement, how about a strict time-out for regex-queries like that? Or maybe a limit for the area that is being queried?
I would like to "vote" for this issue, as I also need the multiple keys search.
A first version of the feature has been online for a few days: http://overpass-turbo.eu/s/4Ox Please put a tilde in front of the key term to make it search for a regular expression:
[~KEY_REGULAR_EXPRESSION~VALUE_REGULAR_EXPRESSION]
Some restrictions so far:
These are the reasons why the feature has not been announced publicly yet.
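As a rough illustration of what the `[~KEY_REGULAR_EXPRESSION~VALUE_REGULAR_EXPRESSION]` filter selects, here is a small Python emulation on a plain tag dictionary. It assumes that an element matches when at least one of its tags has a key matching the first expression and a value matching the second, both as unanchored searches. Python's `re` is not identical to the regex engine on the server, and the function name and sample tags are made up for the example.

```python
import re

def matches_key_value_regex(tags, key_regex, value_regex):
    """True if one tag has a key matching key_regex and a value matching value_regex."""
    key_rx = re.compile(key_regex)
    value_rx = re.compile(value_regex)
    return any(key_rx.search(k) and value_rx.search(v) for k, v in tags.items())

# Hypothetical elements, given only by their tags.
house = {"addr:street": "Main Street", "addr:housenumber": "12"}
cafe = {"amenity": "cafe", "name": "Corner Cafe"}

print(matches_key_value_regex(house, "^addr:", "."))  # key regex hits, any non-empty value
print(matches_key_value_regex(cafe, "^addr:", "."))   # no addr key at all
```

Using "." as the value expression accepts any non-empty value, which is the usual way to say that only the key matters.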
@drolbr Also, a regular expression for JUST the key is not allowed (it has to be paired with a regular expression for the value).
(A regular expression on JUST the key is what I'm looking for.)
@bluesm: Is it just any value (value doesn't matter) or rather some specific value you want to specify in the query? Can you provide some examples?
Values don't matter, and I search for multiple tags in the polygon.
(Right now I'm getting all nodes and ways and parsing them on the client side, which kind of sucks, especially with a lot of data.)
@mmd-osm
Hi
This feature also comes in handy for my use case, which is to find all objects that have no opening_hours tag but where one could probably be added easily based on info on their website.
I tried it both ways:
Without regular expressions for the key:
[out:json][bbox:{{bbox}}];
( /* tag website is not so often used as amenity */
node["website"];
node["contact:website"];
node["opening_hours:url"];
way["website"];
way["contact:website"];
way["opening_hours:url"];
)->.req_info;
(
node.req_info["amenity"];
node.req_info["shop"];
way.req_info["amenity"];
way.req_info["shop"];
)->.facilities;
(
node.facilities["opening_hours"!~"."];
way.facilities["opening_hours"!~"."];
)->.not_oh_value;
.not_oh_value out body center;
See overpass-turbo, request took 3.844s (3.118s the second time).
And with regular expressions for the key:
[out:json][bbox:{{bbox}}];
(
node[~"^(website|contact:website|opening_hours:url)$"~"."][~"^(amenity|shop)$"~"."]["opening_hours"!~"."];
way[~"^(website|contact:website|opening_hours:url)$"~"."][~"^(amenity|shop)$"~"."]["opening_hours"!~"."];
);
out body center;
See overpass-turbo, request took 2.220s (0.978s the second time).
So it works great so far. Any recommendation what I should use?
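To compare what the two queries above select, their filter logic can be emulated on plain tag dictionaries. This is a sketch under assumptions: it treats `["opening_hours"!~"."]` as also matching elements without that tag (which is how both queries use it), it ignores the bounding box and element types, and the function names and sample data are invented for the example.

```python
import re

WEBSITE_KEYS = ("website", "contact:website", "opening_hours:url")
FACILITY_KEYS = ("amenity", "shop")

def lacks_opening_hours(tags):
    """Emulates ["opening_hours"!~"."]: tag missing or empty."""
    return not re.search(".", tags.get("opening_hours", ""))

def union_approach(tags):
    """First query: unions of plain key filters, narrowed step by step."""
    req_info = any(k in tags for k in WEBSITE_KEYS)
    facility = any(k in tags for k in FACILITY_KEYS)
    return req_info and facility and lacks_opening_hours(tags)

def key_regex_approach(tags):
    """Second query: two key regex filters plus the negated value filter."""
    def kv(key_regex, value_regex):
        return any(re.search(key_regex, k) and re.search(value_regex, v)
                   for k, v in tags.items())
    return (kv("^(website|contact:website|opening_hours:url)$", ".")
            and kv("^(amenity|shop)$", ".")
            and lacks_opening_hours(tags))

# Hypothetical elements for the comparison.
samples = [
    {"amenity": "cafe", "website": "https://example.org"},
    {"shop": "bakery", "contact:website": "https://example.org",
     "opening_hours": "Mo-Fr 08:00-18:00"},
    {"amenity": "bench"},
]
```

On these samples both functions agree. The one difference in this emulation is a tag with an empty value: a plain key filter accepts it, while a value expression of "." does not.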
Thank you! It works really well, IMHO. Is it already supported in the wizard?
@bluesm If values don't matter, why not use a "." as value?
Is it already supported in the wizard?
It is now (see commit above). It's not live yet, but you can test it out here. The syntax is similar to the QL version, e.g. ~"^name" ~ "…".
Nice! Thank you!
Wow! very nice feature!
After some tweaking of the query in Wizard mode, I was able to find all OSM objects with addr:*=* and place=*:
enter in wizard: ~"^addr:.*$" ~"." and place=*
this has to be the result after wizard:
Is this all correct?
And how stable is this new feature in overpass-turbo and overpass-API itself?
Can I make an announcement in the weekly OSM news "Wochennotiz"?
After some tweaking of the query in Wizard mode, I was able to find all OSM objects with addr:*=* and place=*:
enter in wizard: ~"^addr:.*$" ~"." and place=*
this has to be the result after wizard:
Is this all correct?
In the end it should be
[~"^addr:.*$"~"."][place]
And how stable is this new feature in overpass-turbo and overpass-API itself?
The feature is currently in a feature branch (called "regex_on_keys"). The syntax will remain as it is. However, it is not yet fast. The second restriction is that negation doesn't work at the moment, or searching for the key only or searching for a particular value. I plan to add negation quite soon, but the other features are likely to be postponed.
Can I make an announcement in the weekly OSM news "Wochennotiz"?
Yes, please. I think the best approach at the moment is to give examples like
http://overpass-turbo.eu/s/4YQ
and
http://overpass-turbo.eu/s/4YR
to explain what works.
The second restriction is that negation doesn't work at the moment, or searching for the key only or searching for a particular value. I plan to add negation quite soon, but the other features are likely to be postponed.
Pardon a bit of necroposting. Did negation go in at some point? Attempting to use it currently fails with "regular expressions on keys cannot be combined with negation".
Edit: ah, that seems to be what #589 is about, right?
way[!~"."~"."], whereas you're probably looking for way[~"."!~"."]. Both variants aren't available in this repo as of today. By the way, I've reactivated the overpass link mentioned in #589, where you can try out both variants.
Conditional restrictions and similar constructs create a plethora of different tagging alternatives which are quite difficult to catch via normal Overpass API calls.
Example: http://overpass-turbo.eu/s/1jl
I'd appreciate a way to specify all these alternatives via a compact regular expression instead of an ever-growing (and never really complete) list.
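A minimal sketch of the idea, in Python on a plain tag dictionary: one expression on the key suffix stands in for the whole family of alternatives. The sample tags and the name `conditional_tags` are illustrative assumptions, not taken from the linked example.

```python
import re

# One expression instead of a list of maxspeed:conditional, access:conditional, ...
CONDITIONAL_KEY = re.compile(r"^.+:conditional$")

def conditional_tags(tags):
    """Return only the tags whose key ends in ':conditional'."""
    return {k: v for k, v in tags.items() if CONDITIONAL_KEY.search(k)}

# Hypothetical way, given only by its tags.
way = {
    "highway": "residential",
    "maxspeed": "50",
    "maxspeed:conditional": "30 @ (Mo-Fr 07:00-17:00)",
    "access:conditional": "no @ (Sa,Su)",
}

print(sorted(conditional_tags(way)))
```

Written in the key regex syntax discussed in this thread, the same selection would presumably look like way[~":conditional$"~"."].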
Is that technically feasible with today's data model?
Best, mmd