drolbr / Overpass-API

A database engine to query the OpenStreetMap data.
http://overpass-api.de
GNU Affero General Public License v3.0

Support regex expression also for keys #59

Closed mmd-osm closed 7 years ago

mmd-osm commented 10 years ago

Conditional restrictions and similar constructs create a plethora of different tagging alternatives which are quite difficult to catch via normal Overpass API calls.

Example: http://overpass-turbo.eu/s/1jl

I'd appreciate a way to specify all these alternatives via a compact regular expression instead of an ever-growing (and never really complete) list.

Is that technically feasible with today's data model?

Best, mmd

drolbr commented 10 years ago

For performance reasons, I would consider an additional database mapping.

Nowadays, the tags are essentially saved as a very long sequence of the form (key value id+)* in alphabetical order on keys, then values.

Because an arbitrary regular expression doesn't let us derive an alphabetical range (or a few ranges), the full file, several GB in size, must be read for each sequence. That's too slow to be useful.

To overcome this restriction, a simple plain list of keys could be maintained to check which keys fit the regular expression. For reasonable regular expressions this works quite fast: there are about 50,000 different keys, on average about 20 bytes long. Reading a megabyte of data doesn't hurt.
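The idea can be sketched in Python. This is only an illustration, not the actual Overpass implementation: `TAG_INDEX`, `KEY_LIST`, and all function names are hypothetical, and the tiny in-memory list stands in for the on-disk `(key value id+)*` sequence. The regular expression is applied to the small key list only (roughly 50,000 keys × 20 bytes ≈ 1 MB), and each matching key is then looked up as an ordinary alphabetical range.

```python
import bisect
import re

# Hypothetical miniature of the tag file: (key, value, ids), sorted by key, then value.
TAG_INDEX = sorted([
    ("addr:city", "Berlin", [1, 2]),
    ("addr:street", "Hauptstrasse", [2]),
    ("maxspeed", "50", [3]),
    ("maxspeed:conditional", "30 @ (22:00-06:00)", [4]),
    ("name", "Cafe", [1]),
])

# The proposed extra mapping: a plain list of all distinct keys.
KEY_LIST = sorted({key for key, _value, _ids in TAG_INDEX})


def keys_matching(pattern):
    """Scan the small key list and return the keys that fit the regex."""
    regex = re.compile(pattern)
    return [key for key in KEY_LIST if regex.search(key)]


def ids_for_key(key):
    """Read the contiguous alphabetical range belonging to one exact key."""
    start = bisect.bisect_left(TAG_INDEX, (key,))
    ids = set()
    for entry_key, _value, element_ids in TAG_INDEX[start:]:
        if entry_key != key:
            break  # left the range for this key
        ids.update(element_ids)
    return ids


def ids_for_key_regex(pattern):
    """Resolve the regex to exact keys first, then do one range read per key."""
    ids = set()
    for key in keys_matching(pattern):
        ids |= ids_for_key(key)
    return ids
```

With this layout, `ids_for_key_regex(r"^addr:")` touches only the two `addr:` ranges instead of scanning the whole index.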

However, such a database extension is best done along with other database extensions, so this will become available together with the history feature (which is a huge database extension in itself).

daganzdaanda commented 10 years ago

Hi! I'd also be interested in using regex or any kind of wildcard to look for certain groups of keys. It would be helpful for quality assurance, IMHO. addr:* comes to mind, or note:*, wikipedia:*,... Also, regex could be used to find wrongly spelled keys easily.

daganzdaanda commented 10 years ago

I've been shown that JOSM can filter keys with regex: "addr:.*"=".*" gives everything that has a key that starts with "addr:". But of course JOSM only works with the data that has been downloaded, so it's never too much.

The list of keys that @drolbr suggests sounds like a good thing for most use cases. But what if you are looking for unknown / undocumented keys or typos -- would a list be any help then?

And in case the list will take too long to implement, how about a strict time-out for regex-queries like that? Or maybe a limit for the area that is being queried?

pbrewczynski commented 9 years ago

I would like to "vote" for this issue, as I also need the multiple keys search.

drolbr commented 9 years ago

> I would like to "vote" for this issue, as I also need the multiple keys search.

A first version of the feature has already been online for a few days: http://overpass-turbo.eu/s/4Ox. Please put a tilde in front of the key term to make it search for a regular expression:

 [~KEY_REGULAR_EXPRESSION~VALUE_REGULAR_EXPRESSION]

Some restrictions so far:

These are the reasons why the feature has not been announced publicly yet.
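As a rough reading aid for the `[~KEY_REGULAR_EXPRESSION~VALUE_REGULAR_EXPRESSION]` filter, here is a minimal Python sketch of its matching rule: an element passes if at least one of its tags has a key matching the first expression and a value matching the second. The function name and the dict-based tag model are assumptions made for illustration, and Python's `re` is not identical to the regex dialect the server uses, so treat this as an approximation only.

```python
import re


def matches_key_value_regex(tags, key_pattern, value_pattern):
    """True if any single tag matches both the key regex and the value regex.

    `tags` is a plain dict of key -> value (hypothetical model of an OSM element).
    Patterns match anywhere in the string unless anchored with ^ and $.
    """
    key_re = re.compile(key_pattern)
    value_re = re.compile(value_pattern)
    return any(
        key_re.search(key) is not None and value_re.search(value) is not None
        for key, value in tags.items()
    )
```

For example, `matches_key_value_regex({"maxspeed:conditional": "30 @ wet"}, r":conditional$", ".")` holds for any key ending in `:conditional` with a non-empty value, which is the kind of tagging family the original request was about.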

pbrewczynski commented 9 years ago

@drolbr Also, a regular expression for JUST the key is not allowed (it has to be paired with a regular expression for the value).

(A regular expression on JUST the key is what I'm looking for.)

mmd-osm commented 9 years ago

@bluesm: Is it just any value (value doesn't matter) or rather some specific value you want to specify in the query? Can you provide some examples?

pbrewczynski commented 9 years ago

Values don't matter, and I search for multiple tags in the polygon.

(Right now I'm fetching all nodes and ways and parsing them on the client side, which kind of sucks, especially with a lot of data.)

pbrewczynski commented 9 years ago

@mmd-osm

ypid commented 9 years ago

Hi

This feature also comes in handy for my use case, which is to find all objects that have no opening_hours tag but where one could probably be added easily based on information on their website.

I tried it both ways:

Without regular expressions for the key:

[out:json][bbox:{{bbox}}];
(   /* tag website is not so often used as amenity */
    node["website"];
    node["contact:website"];
    node["opening_hours:url"];
    way["website"];
    way["contact:website"];
    way["opening_hours:url"];
)->.req_info;
(
    node.req_info["amenity"];
    node.req_info["shop"];
    way.req_info["amenity"];
    way.req_info["shop"];
)->.facilities;
(
    node.facilities["opening_hours"!~"."];
    way.facilities["opening_hours"!~"."];
)->.not_oh_value;
.not_oh_value out body center;

See overpass-turbo, request took 3.844s (3.118s the second time).

And with regular expressions for the key:

[out:json][bbox:{{bbox}}];
(
    node[~"^(website|contact:website|opening_hours:url)$"~"."][~"^(amenity|shop)$"~"."]["opening_hours"!~"."];
    way[~"^(website|contact:website|opening_hours:url)$"~"."][~"^(amenity|shop)$"~"."]["opening_hours"!~"."];
);
out body center;

See overpass-turbo, request took 2.220s (0.978s the second time).

So it works great so far. Any recommendation on which one I should use?
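To check that the two query styles above select the same objects, their logic can be mirrored on in-memory data. The Python sketch below is only a model under stated assumptions: elements are plain tag dicts, all names and the sample data are hypothetical, tag values are assumed to be non-empty, and `["opening_hours"!~"."]` is modelled as "key missing or value empty". It says nothing about server-side speed.

```python
import re

INFO_KEYS = ("website", "contact:website", "opening_hours:url")
FACILITY_KEYS = ("amenity", "shop")


def has_opening_hours(tags):
    """Opposite of ["opening_hours"!~"."]: the key exists with a non-empty value."""
    return re.search(".", tags.get("opening_hours", "")) is not None


def select_with_key_lists(elements):
    """First variant: one plain key filter per alternative, combined step by step."""
    req_info = [t for t in elements if any(k in t for k in INFO_KEYS)]
    facilities = [t for t in req_info if any(k in t for k in FACILITY_KEYS)]
    return [t for t in facilities if not has_opening_hours(t)]


def any_tag_matches(tags, key_pattern, value_pattern):
    """Model of [~key~value]: some tag matches both regular expressions."""
    return any(
        re.search(key_pattern, k) and re.search(value_pattern, v)
        for k, v in tags.items()
    )


def select_with_key_regex(elements):
    """Second variant: the alternatives folded into two key regexes."""
    return [
        t for t in elements
        if any_tag_matches(t, r"^(website|contact:website|opening_hours:url)$", ".")
        and any_tag_matches(t, r"^(amenity|shop)$", ".")
        and not has_opening_hours(t)
    ]


# Hypothetical sample elements, reduced to their tags.
SAMPLE = [
    {"amenity": "cafe", "website": "https://example.org"},
    {"shop": "bakery", "contact:website": "https://example.org",
     "opening_hours": "Mo-Fr 08:00-18:00"},
    {"amenity": "bank"},
    {"website": "https://example.org"},
    {"shop": "kiosk", "opening_hours:url": "https://example.org/hours"},
]
```

On this sample both functions return the cafe and the kiosk, so under these assumptions the choice between the two styles comes down to readability and measured run time.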

daganzdaanda commented 9 years ago

Thank you! It works really well, IMHO. Is it already supported in the wizard?

@bluesm If values don't matter, why not use a "." as value?

tyrasd commented 9 years ago

> Is it already supported in the wizard?

It is now (see commit above). It's not live yet, but you can test it out here. The syntax is similar to the QL version, e.g. ~"^name" ~ "…".

daganzdaanda commented 9 years ago

Nice! Thank you!

stephan75 commented 9 years ago

Wow! very nice feature!

After some tweaking of the query in Wizard mode, I was able to find all OSM objects with addr:*=* and place=*:

enter in wizard: ~"^addr:.*$" ~"." and place=*

this has to be the result after wizard:

Is this all correct?

And how stable is this new feature in overpass-turbo and overpass-API itself?

Can I make an announcement in the weekly OSM news "Wochennotiz"?

drolbr commented 9 years ago

> After some tweaking of the query in Wizard mode, I was able to find all OSM objects with addr:*=* and place=*:
>
> enter in wizard: ~"^addr:.*$" ~"." and place=*
>
> this has to be the result after wizard:
>
> Is this all correct?

In the end it should be

[~"^addr:.*$"~"."][place]

> And how stable is this new feature in overpass-turbo and overpass-API itself?

The feature is currently in a feature branch (called "regex_on_keys"). The syntax will remain as it is. However, it is not yet fast. The second restriction is that negation doesn't work at the moment, or searching for the key only or searching for a particular value. I plan to add negation quite soon, but the other features are likely to be postponed.

> Can I make an announcement in the weekly OSM news "Wochennotiz"?

Yes, please. I think the best approach at the moment is to use examples like

http://overpass-turbo.eu/s/4YQ

and

http://overpass-turbo.eu/s/4YR

to explain what works.

richlv commented 1 year ago

> The second restriction is that negation doesn't work at the moment, or searching for the key only or searching for a particular value. I plan to add negation quite soon, but the other features are likely to be postponed.

Pardon a bit of necroposting. Did negation go in at some point? Attempting to use it currently fails with "regular expressions on keys cannot be combined with negation".

Edit: ah, that seems to be what #589 is about, right?

mmd-osm commented 1 year ago

#589 is about way[!~"."~"."], whereas you're probably looking for way[~"."!~"."]. Neither variant is available in this repo as of today.

By the way, I've reactivated the overpass link mentioned in #589, where you can try out both variants.